[torquedev] Jobs remain in queue after process completion in Torque 2.2
Steve Snelgrove
ssnelgrove at clusterresources.com
Wed Nov 7 14:10:02 MST 2007
Here is a test case that reproduces this problem. It submits 1000 jobs, each
running "sleep 1" with a one-second walltime limit. In this particular run,
job 57 appears to be stuck.
cmd:~$ for i in `seq 1 1000`; do echo sleep 1|qsub -lwalltime=1;done
cmd:~$ qstat -r
makua.cridomain:
                                                                   Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
57.makua.cridomain   ssnelgro batch    STDIN        9250     1  --     -- 00:00 R    --
122.makua.cridomain  ssnelgro batch    STDIN        9722     1  --     -- 00:00 R    --
147.makua.cridomain  ssnelgro batch    STDIN        9910     1  --     -- 00:00 R    --
151.makua.cridomain  ssnelgro batch    STDIN        9939     1  --     -- 00:00 R    --
251.makua.cridomain  ssnelgro batch    STDIN       10721     1  --     -- 00:00 R    --
252.makua.cridomain  ssnelgro batch    STDIN       10725     1  --     -- 00:00 R    --
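When rerunning the loop, a small filter makes it easier to see which jobs are still reported as running long after their one-second walltime. This is a minimal sketch of a hypothetical helper (running_jobs is not part of TORQUE). It assumes qstat's default one-line-per-job layout, where the state (S) column is the second-to-last field and job IDs look like 57.makua.cridomain; array job IDs such as 57[1] are not matched.

```shell
#!/bin/sh
# Hypothetical helper, not part of TORQUE: read `qstat -r` output on stdin
# and print the ID of every job whose state column is R.
# Assumes one line per job, with the state (S) as the second-to-last field.
running_jobs() {
    # Job rows start with "<number>."; header, separator and server lines
    # ("Job ID ...", "-----", "makua.cridomain:") are skipped by the regex.
    awk '$1 ~ /^[0-9]+\./ && $(NF-1) == "R" { print $1 }'
}
```

Usage, once the submission loop has finished: "qstat -r | running_jobs". Piping that through "wc -l" gives a quick count of jobs that are still listed as running.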
cmd:/var/spool/torque$ tracejob 57
Job: 57.makua.cridomain

11/07/2007 11:16:57  S   enqueuing into batch, state 1 hop 1
11/07/2007 11:16:57  S   Job Queued at request of ssnelgrove@makua.cridomain,
                         owner = ssnelgrove@makua.cridomain, job name = STDIN,
                         queue = batch
11/07/2007 11:16:57  S   Job Modified at request of Scheduler@makua.cridomain
11/07/2007 11:16:57  L   Not enough of the right type of nodes available
11/07/2007 11:16:57  A   queue=batch
11/07/2007 11:17:10  S   Job Modified at request of Scheduler@makua.cridomain
11/07/2007 11:17:10  L   Job Run
11/07/2007 11:17:10  S   Job Run at request of Scheduler@makua.cridomain
11/07/2007 11:17:10  A   user=ssnelgrove group=ssnelgrove jobname=STDIN
                         queue=batch ctime=1194459417
                         qtime=1194459417 etime=1194459417
                         start=1194459430 owner=ssnelgrove@makua.cridomain
                         exec_host=makua/4 Resource_List.neednodes=1
                         Resource_List.nodect=1
                         Resource_List.nodes=1
                         Resource_List.walltime=00:00:01
11/07/2007 11:17:11  M   scan_for_terminated: job 57.makua.cridomain
                         task 1 terminated, sid=9250
11/07/2007 11:17:11  M   job was terminated
11/07/2007 11:32:22  M   EOF? received attempting to process obit reply
11/07/2007 11:32:22  S   Exit_status=0 resources_used.cput=00:00:00
                         resources_used.mem=3112kb
                         resources_used.vmem=11496kb
                         resources_used.walltime=00:15:12
11/07/2007 11:32:22  M   obit sent to server
11/07/2007 11:32:22  A   user=ssnelgrove group=ssnelgrove jobname=STDIN
                         queue=batch ctime=1194459417
                         qtime=1194459417 etime=1194459417
                         start=1194459430 owner=ssnelgrove@makua.cridomain
                         exec_host=makua/4 Resource_List.neednodes=1
                         Resource_List.nodect=1
                         Resource_List.nodes=1
                         Resource_List.walltime=00:00:01 session=9250 end=1194460342
                         Exit_status=0 resources_used.cput=00:00:00
                         resources_used.mem=3112kb
                         resources_used.vmem=11496kb
                         resources_used.walltime=00:15:12
11/07/2007 11:37:22  S   dequeuing from batch, state COMPLETE
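The timestamps in this trace can be cross-checked with plain shell arithmetic. In the sketch below, hms is a hypothetical formatting helper and the epoch values are copied from the accounting (A) records for job 57. It suggests that the 00:15:12 of recorded walltime, for a job that only ran "sleep 1", corresponds almost exactly to the gap between the MOM logging task termination (11:17:11) and the obit being sent to the server (11:32:22).

```shell
#!/bin/sh
# Hypothetical helper: format a number of seconds as HH:MM:SS,
# the layout tracejob uses for resources_used.walltime.
hms() {
    printf '%02d:%02d:%02d\n' $(( $1 / 3600 )) $(( $1 % 3600 / 60 )) $(( $1 % 60 ))
}

# Epoch seconds copied from the A records for job 57.
ctime=1194459417   # job created
start=1194459430   # job started on makua/4
end=1194460342     # end time from the final accounting record

queued=$(( start - ctime ))    # time spent waiting in the queue
recorded=$(( end - start ))    # span charged as walltime

# Clock times from the M lines: task terminated at 11:17:11,
# obit sent to the server at 11:32:22 (same day).
obit_delay=$(( (11*3600 + 32*60 + 22) - (11*3600 + 17*60 + 11) ))

echo "queued:        $(hms "$queued")"
echo "recorded wall: $(hms "$recorded")"
echo "obit delay:    $(hms "$obit_delay")"
```

Run as-is, it prints 00:00:13, 00:15:12 and 00:15:11 for the three lines, so nearly all of the recorded walltime falls after the task had already terminated.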