[Mauiusers] Re: Suspended jobs resume execution

Josh Butikofer josh at clusterresources.com
Wed Jul 5 11:24:00 MDT 2006


Robin,

We are looking at having this fix in by July 19th. Would you be willing to test
out the fix when it is in place?

-- 
Joshua Butikofer
Cluster Resources, Inc.

josh at clusterresources.com
Voice: (801) 717-3707
Fax:   (801) 717-3738
--------------------------


Robin Humble wrote:
> Hi,
> 
> On Thu, Apr 27, 2006 at 10:32:40AM -0600, Josh Butikofer wrote:
>> We've confirmed that this behavior is happening in Maui. Moab Workload 
>> Manager currently has the desired behavior with suspended jobs accruing 
>> priority (and also correctly handles different classes involved). We 
>> hope that over the next few weeks we will be able to make these 
>> improvements in Maui as well. We will keep the list posted on our progress.
> 
> any updates?
> 
> in case you were looking for a simpler test case, the below 2 queue
> system seems to have the same behaviour as the previous bug report -
> ie. the suspended PREEMPTEE job has a hard time resuming.
> 
> in other words after a PREEMPTOR job steams through (correctly) we end
> up with a previously queued PREEMPTEE job then being chosen to run over
> the top of the suspended PREEMPTEE job.
> 
> I don't think this is correct behaviour as only PREEMPTOR jobs should
> be able to run over the top of PREEMPTEE jobs.
> 
> versions are:
> torque 2.1.1-3 (rebuilt on AS4 i686 from the fc5 .src.rpm), maui 3.2.6p16
> 
> relevant part of maui.cfg:
> 
> PREEMPTPOLICY SUSPEND
> CLASSCFG[debug]      QDEF=high
> CLASSCFG[workq]      QDEF=low
> QOSCFG[high] PRIORITY=500 QFLAGS=PREEMPTOR
> QOSCFG[low] PRIORITY=100 QFLAGS=PREEMPTEE
> QOSWEIGHT       1
> 
> cheers,
> robin
> 
>> -- 
>> Joshua Butikofer
>> Cluster Resources, Inc.
>>
>> josh at clusterresources.com
>> (801) 798-7488
>> --------------------------
>>
>>
>> David Corredor wrote:
>>> The problem is not just that the suspended job is preempted once again
>>> by a job of its own class from the IDLE queue; this happens regardless
>>> of the class of the new job.
>>>
>>>  Ex.  3 queues (1 verylong, 1 long, 1 fast).  Fast preempts long and
>>> verylong, long preempts verylong, and verylong should not preempt anything.
>>>    - Submit 1 long job so that it takes all resources in cluster.
>>>    - Submit a verylong job so that it waits in the IDLE queue.
>>>    - Submit a fast job.
>>>
>>>  The fast job preempts the long one, and once it finishes, instead of the
>>> long one resuming execution, the verylong job kicks in and preempts it once
>>> again (and it shouldn't).
>>>
>>> <quote who="Ronny T. Lampert">
>>>
>>>> .....
>>>> However I experience the very same problem as you do (I need the
>>>> QUEUETIMEWEIGHT set to 1) - the preempted ones stay suspended and instead
>>>> a NEW job from the batch queue is started :-(
>>>>
>>>> I think this is a bug: suspended jobs *should age*, too.
>>>> Or automatically get a slightly higher priority than the highest in the
>>>> same class to prevent it from staying suspended and interrupted by jobs
>>>> from the same class.
>>>>
>>>> Could some developer briefly comment on that issue?
>>>>
>>>> Thanks!
>>>> Ronny
>>>>
>>>>
>>>
>>>
>>> _______________________________________________
>>> mauiusers mailing list
>>> mauiusers at supercluster.org
>>> http://www.supercluster.org/mailman/listinfo/mauiusers
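
For anyone trying to reason about the behaviour described in this thread, here
is a toy Python sketch of the selection step. It is not Maui code and does not
claim to reflect how Maui computes priority internally. It assumes that
priority is simply QOSWEIGHT * QOS priority + QUEUETIMEWEIGHT * queue time, and
that the suspended job has accrued no queue time while the idle job has. The
names Job, priority, pick_by_priority and pick_resume_first are hypothetical.
The values 100 and 500 come from the quoted maui.cfg; the queue time of 30 is
arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    qos_priority: int  # QOSCFG PRIORITY value, e.g. 500 (high) or 100 (low)
    preemptor: bool    # True if the job's QOS carries QFLAGS=PREEMPTOR
    state: str         # "idle" or "suspended"
    queue_time: int    # accrued queue time (assumed frozen while suspended)


def priority(job, qos_weight=1, queuetime_weight=1):
    # Assumed toy formula, not Maui's actual priority calculation.
    return qos_weight * job.qos_priority + queuetime_weight * job.queue_time


def pick_by_priority(jobs):
    """Reported behaviour: the highest-priority waiting job wins."""
    return max(jobs, key=priority)


def pick_resume_first(jobs):
    """Expected behaviour: only a PREEMPTOR may run ahead of a suspended job."""
    suspended = [j for j in jobs if j.state == "suspended"]
    preemptors = [j for j in jobs if j.state == "idle" and j.preemptor]
    if preemptors:
        return max(preemptors, key=priority)
    if suspended:
        return max(suspended, key=priority)
    return max(jobs, key=priority)


waiting = [
    Job("suspended_workq_job", 100, False, "suspended", queue_time=0),
    Job("queued_workq_job", 100, False, "idle", queue_time=30),
]
print(pick_by_priority(waiting).name)   # queued_workq_job
print(pick_resume_first(waiting).name)  # suspended_workq_job
```

Under these assumptions the first function reproduces the symptom (the queued
PREEMPTEE job runs over the suspended one), while the second resumes the
suspended job unless an idle PREEMPTOR job is waiting.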

