[torqueusers] Torque 4.0 and job arrays
Ken Nielson
knielson at adaptivecomputing.com
Tue Apr 24 09:07:24 MDT 2012
On Tue, Apr 24, 2012 at 12:23 AM, Rhys Hill <rhys.hill at adelaide.edu.au> wrote:
> Hi David,
>
> Thanks for that. I've just found and fixed some other bugs, which I've added
> to Bugzilla. The one issue that remains is odd. It seems that we have a
> situation where an array is stuck when all of its component jobs are finished.
>
> For instance, qstat -f says this:
>
> Job Id: 678[].moby.cs.adelaide.edu.au
> Job_Name = YZ_Oxford_group
> Job_Owner = yanzhichen at moby.cs.adelaide.edu.au
> job_state = Q
> queue = batch
> server = moby.cs.adelaide.edu.au
> Checkpoint = u
> ctime = Tue Apr 24 09:26:10 2012
> Error_Path = moby.cs.adelaide.edu.au:/home/yanzhichen/moby/oxbuilding_vocabulary/out.e.txt
> Hold_Types = n
> Join_Path = n
> Keep_Files = n
> Mail_Points = a
> mtime = Tue Apr 24 09:26:10 2012
> Output_Path = moby.cs.adelaide.edu.au:/home/yanzhichen/moby/oxbuilding_vocabulary/out.o.txt
> Priority = 0
> qtime = Tue Apr 24 09:26:10 2012
> Rerunable = True
> Resource_List.mem = 5gb
> Resource_List.nodect = 1
> Resource_List.nodes = 1:ppn=1
> Resource_List.pmem = 5gb
> Resource_List.pvmem = 8gb
> Resource_List.walltime = 48:00:00
> etime = Tue Apr 24 09:26:10 2012
> submit_args = -t 2-11 ./job_dogroup
> job_array_request = 2-11
> fault_tolerant = False
> job_radix = 0
> submit_host = moby.cs.adelaide.edu.au
> init_work_dir = /home/yanzhichen/moby/oxbuilding_vocabulary
>
> whereas qstat -ft has no mention of 678[x] at all. qdel and qdel -p have no
> effect on jobs like these. I think I've submitted a fix for the problem that
> leads to the job getting into this state, but it would be handy if qdel could
> remove it.
>
Rhys,

To delete an element of an array, or to list all of the elements in an
array, you need to use the -t option. For example, qstat -t lists not only
the array master but also every job in the array and its current state.
qdel works the same way: use qdel -t to delete an individual job in the
array.
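
As a rough illustration of the -t usage described above, here is a small
dry-run sketch in Python. It only builds and prints command lines; it never
runs them. The helper names (expand_array_range, element_ids, qstat_array_cmd,
qdel_element_cmds) are made up for this example, and the argument order shown
for qdel -t (index first, then the array id) is an assumption that should be
checked against the qdel man page for your Torque version. The array number
678 and the range 2-11 come from the qstat output quoted above.

```python
# Dry-run helper: builds Torque command lines but never executes them.
# All function names here are hypothetical, chosen for illustration only.


def expand_array_range(spec):
    """Expand a range spec such as "2-11" or "2,5,7-10" into a list of ints."""
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = part.split("-", 1)
            indices.extend(range(int(first), int(last) + 1))
        else:
            indices.append(int(part))
    return indices


def element_ids(array_number, spec):
    """Name each array element in the NNN[i] form, e.g. 678[2]."""
    return ["%s[%d]" % (array_number, i) for i in expand_array_range(spec)]


def qstat_array_cmd(array_number):
    # qstat -t lists the array master plus every element and its state.
    return ["qstat", "-t", "%s[]" % array_number]


def qdel_element_cmds(array_number, spec):
    # ASSUMPTION: qdel -t takes the index before the array id.
    # Confirm with `man qdel` on your installation before relying on this.
    return [["qdel", "-t", str(i), "%s[]" % array_number]
            for i in expand_array_range(spec)]


if __name__ == "__main__":
    print(" ".join(qstat_array_cmd("678")))
    for cmd in qdel_element_cmds("678", "2-11"):
        print(" ".join(cmd))
```

Printing the commands first makes it easy to review exactly which array
elements would be touched before anything is deleted.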
Ken