[Mauiusers] Suspended jobs not being resumed
Edgar Leon
edgar at mathcs.emory.edu
Tue Apr 15 17:11:34 MDT 2008
Ronny,
Thank you for the information you provided.
> I seem to vaguely remember a problem I had a while ago: suspended jobs
> would not age, and so their priority would never increase again.
The suspended jobs had been in that state for a week and were not
resumed even when there were no other jobs in the batch system.
As you suspected, their priority did not increase at all during that week.
I tried to increase the priority manually with setspri. checkjob output
before and after the change:
EState 'Running' does not match current state 'Suspended'
Reservation '3372' (-6:05:11:18 -> 93:18:48:41 Duration: 99:23:59:59)
PE: 1.00 StartPriority: 552
cannot select job 3372 for partition DEFAULT (non-idle expected state 'Running')
--------------------------------------------------------------------
# /usr/local/maui/bin/setspri 1000 3372
job system priority adjusted
--------------------------------------------------------------------
EState 'Running' does not match current state 'Suspended'
Reservation '3372' (-6:05:11:49 -> 93:18:48:10 Duration: 99:23:59:59)
PE: 1.00 StartPriority: 1000001000 SystemPriority: 1000
cannot select job 3372 for partition DEFAULT (non-idle expected state 'Running')
--------------------------------------------------------------------
However, the job remained suspended and did not run.
I also tried to force the job to run, but it stayed suspended:
# /usr/local/maui/bin/runjob -c 3372
INFO: successfully set hostlist for job '3372' to '1'
# /usr/local/maui/bin/runjob -f 3372
job '3372' is in state 'Suspended' (state must be idle)
# /usr/local/maui/bin/runjob -x 3372
job '3372' is in state 'Suspended' (state must be idle)
Is there a command to force a suspended job to run?
The only solution that I found was to restart maui.
qstat showed this state for many days:
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
3304.head job0328 eleon 00:04:46 S batch2
3311.head job0328 eleon 00:02:19 S batch2
3335.head job0328 eleon 00:02:19 S batch2
3336.head job0328 eleon 00:02:22 S batch2
3340.head job0328 eleon 00:51:01 S batch2
3345.head job0328 eleon 00:02:17 S batch2
3346.head job0328 eleon 00:02:13 S batch2
3371.head job0328 eleon 00:02:30 S batch2
After restarting maui without making changes to maui.cfg:
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
3304.head job0328 eleon 00:04:46 R batch2
3311.head job0328 eleon 00:02:19 R batch2
3335.head job0328 eleon 00:02:19 R batch2
3336.head job0328 eleon 00:02:22 R batch2
3340.head job0328 eleon 00:51:01 R batch2
3345.head job0328 eleon 00:02:17 R batch2
3346.head job0328 eleon 00:02:13 R batch2
3371.head job0328 eleon 00:02:30 R batch2
=========================================================================
> To work around this, you will have to change your config as detailed in
I then modified maui.cfg and restarted maui; these variables are now
enabled:
# /usr/local/maui/bin/showconfig -v | grep USAGE
USAGEWEIGHT[0] 1
USAGEEXECUTIONTIMEWEIGHT[0] 1
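For anyone following the same workaround: the showconfig output above should correspond to maui.cfg entries roughly like the ones below. This is a sketch that assumes the usual maui.cfg layout of one parameter name followed by its value per line. The value 1 is simply what this site used, not a tuned recommendation.

```
# maui.cfg (excerpt): enable the usage priority component and its
# execution-time subcomponent so suspended jobs' priority keeps growing
USAGEWEIGHT               1
USAGEEXECUTIONTIMEWEIGHT  1
```

After editing maui.cfg, restart maui and confirm the new values with `showconfig -v | grep USAGE`, as shown above.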
The priority of suspended jobs is now increasing.
Thanks for the help.
Edgar
Ronny T. Lampert wrote, On 04/14/08 09:08:
>> Could someone please help me resolve a problem where suspended jobs
>> are not being resumed?
>
> I hope I've understood your problem.
> I seem to vaguely remember a problem I had a while ago: suspended jobs
> would not age, and so their priority would never increase again.
> As a result, other jobs that were merely queued (never started) would
> have a higher priority and would run before any suspended jobs.
>
>
> To work around this, you will have to change your config as detailed in
> the message below (my original problem report):
>
> http://osdir.com/ml/clustering.maui.user/2006-08/msg00021.html
>
>
> Hope this helps,
> Ronny