[Mauiusers] Re: Suspended jobs resume execution
Josh Butikofer
josh at clusterresources.com
Wed Jul 5 11:24:00 MDT 2006
Robin,
We are looking at having this fix in by July 19th. Would you be willing to test out the fix when it
is in place?
--
Joshua Butikofer
Cluster Resources, Inc.
josh at clusterresources.com
Voice: (801) 717-3707
Fax: (801) 717-3738
--------------------------
Robin Humble wrote:
> Hi,
>
> On Thu, Apr 27, 2006 at 10:32:40AM -0600, Josh Butikofer wrote:
>> We've confirmed that this behavior is happening in Maui. Moab Workload
>> Manager currently has the desired behavior with suspended jobs accruing
>> priority (and also correctly handles different classes involved). We
>> hope that over the next few weeks we will be able to make these
>> improvements in Maui as well. We will keep the list posted on our progress.
>
> any updates?
>
> in case you were looking for a simpler test case, the below 2 queue
> system seems to have the same behaviour as the previous bug report -
> ie. the suspended PREEMPTEE job has a hard time resuming.
>
> in other words after a PREEMPTOR job steams through (correctly) we end
> up with a previously queued PREEMPTEE job then being chosen to run over
> the top of the suspended PREEMPTEE job.
>
> I don't think this is correct behaviour as only PREEMPTOR jobs should
> be able to run over the top of PREEMPTEE jobs.
>
> versions are:
> torque 2.1.1-3 (rebuilt on AS4 i686 from the fc5 .src.rpm), maui 3.2.6p16
>
> relevant part of maui.cfg:
>
> PREEMPTPOLICY SUSPEND
> CLASSCFG[debug] QDEF=high
> CLASSCFG[workq] QDEF=low
> QOSCFG[high] PRIORITY=500 QFLAGS=PREEMPTOR
> QOSCFG[low] PRIORITY=100 QFLAGS=PREEMPTEE
> QOSWEIGHT 1
>
> cheers,
> robin
>
>> --
>> Joshua Butikofer
>> Cluster Resources, Inc.
>>
>> josh at clusterresources.com
>> (801) 798-7488
>> --------------------------
>>
>>
>> David Corredor wrote:
>>> The problem is not just that the suspended job gets preempted once again
>>> by a job of its own class from the IDLE queue; this happens regardless
>>> of the class of the new job.
>>>
>>> Ex. 3 queues (1 verylong, 1 long, 1 fast. Fast preempts long and
>>> verylong, and long preempts verylong, verylong should not preempt).
>>> - Submit 1 long job so that it takes all resources in cluster.
>>> - Submit a verylong job so that it waits in the IDLE queue.
>>> - Submit a fast job.
>>>
>>> The fast job preempts the long one, and once it finishes, instead of the
>>> long one resuming execution, the verylong job kicks in and preempts it
>>> once again (and it shouldn't).
>>>
>>> <quote who="Ronny T. Lampert">
>>>
>>>> .....
>>>> However I experience the very same problem as you do (I need the
>>>> QUEUETIMEWEIGHT set to 1) - the preempted ones stay suspended and instead
>>>> a NEW job from the batch queue is started :-(
>>>>
>>>> I think this is a bug: suspended jobs *should age*, too.
>>>> Or automatically get a slightly higher priority than the highest in the
>>>> same class to prevent it from staying suspended and interrupted by jobs
>>>> from the same class.
>>>>
>>>> Could some developer briefly comment on that issue?
>>>>
>>>> Thanks!
>>>> Ronny
>>>>
>>>>
>>>
>>>
>>> _______________________________________________
>>> mauiusers mailing list
>>> mauiusers at supercluster.org
>>> http://www.supercluster.org/mailman/listinfo/mauiusers
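The behaviour discussed in this thread (a suspended PREEMPTEE job losing out to an idle job of the same QOS because it does not accrue queue-time priority while suspended) can be illustrated with a toy model. This is only a sketch and not Maui's actual priority calculation: the `Job` class, the simplified `priority` formula and the `age_suspended` switch are all assumptions made up for illustration. Only the QOS priority value 100 is taken from the quoted maui.cfg; the waiting times are invented.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    qos_priority: int       # e.g. 100 for the "low" QOS in the quoted maui.cfg
    waited_minutes: int     # time since submission spent not running (hypothetical)
    suspended: bool = False


def priority(job, qos_weight=1, queuetime_weight=1, age_suspended=True):
    """Toy priority: weighted QOS priority plus weighted waiting time.

    With age_suspended=False a suspended job gets no credit for its
    waiting time, which mimics the behaviour reported in the thread.
    """
    aging = job.waited_minutes
    if job.suspended and not age_suspended:
        aging = 0
    return qos_weight * job.qos_priority + queuetime_weight * aging


def pick_next(jobs, **kwargs):
    """Return the job with the highest toy priority."""
    return max(jobs, key=lambda j: priority(j, **kwargs))


# Two PREEMPTEE jobs in the same QOS: one was suspended by a PREEMPTOR,
# the other has been sitting idle in the queue for a shorter time.
suspended_job = Job("suspended-preemptee", 100, 30, suspended=True)
idle_job = Job("idle-preemptee", 100, 20)

# Without aging, the idle job outranks the suspended one (the reported problem).
print(pick_next([suspended_job, idle_job], age_suspended=False).name)

# With aging, the suspended job resumes first (the behaviour asked for).
print(pick_next([suspended_job, idle_job], age_suspended=True).name)
```

In this model the only difference between the two runs is whether the suspended job's waiting time counts towards its priority, which is the change Ronny suggests in the quoted message.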