[Moabusers] Invalid job running in SR
Matthew Britt
msbritt at umich.edu
Fri Aug 25 12:33:13 MDT 2006
As an additional note, mdiag does pick up the user restrictions from
Torque:
[root@cac-admin02 log]# mdiag -v -c violi
Class/Queue Status
ClassID  Priority  Flags  QDef  QOSList*  PartitionList  Target  Limits
violi    0         ---    ---   ---       ---            0.00    ---
    REQUIREDUSERLIST=choesh,ckcumaa,fiedler
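The owner of the offending job isn't shown in this thread, so one quick way to check a given owner against this class restriction is to parse the REQUIREDUSERLIST line directly. A minimal Python sketch, assuming the line is copied verbatim from the mdiag output above; parse_required_users, user_may_use_class and the user name "someoneelse" are hypothetical and not part of Moab or Torque.

```python
def parse_required_users(line: str) -> set[str]:
    """Split a 'REQUIREDUSERLIST=a,b,c' line (hypothetical helper) into user names."""
    key, _, value = line.strip().partition("=")
    if key != "REQUIREDUSERLIST":
        raise ValueError(f"unexpected attribute: {key!r}")
    # Drop empty items so a trailing comma does not add "" to the set.
    return {user for user in value.split(",") if user}


def user_may_use_class(line: str, user: str) -> bool:
    """True if `user` appears in the class's required-user list (hypothetical helper)."""
    return user in parse_required_users(line)


acl = "REQUIREDUSERLIST=choesh,ckcumaa,fiedler"
print(sorted(parse_required_users(acl)))        # ['choesh', 'ckcumaa', 'fiedler']
print(user_may_use_class(acl, "fiedler"))       # True
print(user_may_use_class(acl, "someoneelse"))   # False ("someoneelse" is hypothetical)
```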
Thanks,
- matt
On Aug 25, 2006, at 10:09 AM, Matthew Britt wrote:
> We're trying to figure out how certain jobs are running in SRs when
> they aren't supposed to be. We've defined queue-based ACLs in
> Torque to limit which users can submit jobs, then defined SRs that
> require that class. In this case, the queue/class is called violi.
>
> [root@cac-admin02 log]# mdiag -r violi.2254
> Diagnosing Reservations
> RsvID       Type  Par  StartTime    EndTime     Duration    Node  Task  Proc
> -----       ----  ---  ---------    -------     --------    ----  ----  ----
> violi.2254  User  nyx  -1:17:47:48  1:14:18:55  3:08:06:43  16    16    32
>     Flags: STANDINGRSV,SPACEFLEX,DEDICATEDRESOURCE,ISACTIVE
>     ACL: RSV==violi.2254= CLASS==violi+
>     CL: RSV==violi.2254
>     FLIST=cac
>     Task Resources: PROCS: 2
>     Active PH: 1996.66/4126.19 (48.39%)
>     SRAttributes (TaskCount: 16 StartTime: 00:00:00 EndTime: 1:00:00:00 Days: ALL)
>     Rsv-Group: violi
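The StartTime and EndTime columns are relative to the moment mdiag ran. A small Python sketch (the to_seconds helper is hypothetical, and assumes the [-]DD:HH:MM:SS layout shown above) confirms the line is self-consistent: StartTime plus Duration equals EndTime, so the reservation had been active for about 1 day 17 hours 48 minutes when mdiag was run.

```python
def to_seconds(value: str) -> int:
    """Convert a relative time like '-1:17:47:48' ([-]DD:HH:MM:SS) to seconds."""
    sign = -1 if value.startswith("-") else 1
    fields = [int(f) for f in value.lstrip("-").split(":")]
    # Pad missing leading fields so HH:MM:SS is also accepted (days = 0).
    while len(fields) < 4:
        fields.insert(0, 0)
    days, hours, minutes, seconds = fields
    return sign * (((days * 24 + hours) * 60 + minutes) * 60 + seconds)


start = to_seconds("-1:17:47:48")    # negative: began before mdiag ran
end = to_seconds("1:14:18:55")
duration = to_seconds("3:08:06:43")

print(start, end, duration)          # -150468 137935 288403
print(start + duration == end)       # True
```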
>
> Class definition in moab:
> SRCFG[violi] CLASSLIST=violi
> SRCFG[violi] NODEFEATURES=cac
> SRCFG[violi] RESOURCES=PROCS:2
> SRCFG[violi] TASKCOUNT=16
> SRCFG[violi] PERIOD=WEEK
> SRCFG[violi] DEPTH=2
> SRCFG[violi] ACCESS=DEDICATED
> SRCFG[violi] FLAGS=SPACEFLEX,DEDICATEDRESOURCE
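The SRCFG block can also be checked against the mdiag output mechanically. A minimal Python sketch, assuming one KEY=VALUE pair per line and a single resource in RESOURCES (parse_srcfg is a hypothetical helper, not a Moab tool): TASKCOUNT=16 times PROCS:2 gives 32 processors, matching the Proc column for violi.2254, and CLASSLIST is the only user/class-style list in the configuration.

```python
import re
from collections import defaultdict

# Matches lines such as "SRCFG[violi] TASKCOUNT=16".
SRCFG_LINE = re.compile(r"^SRCFG\[(?P<name>[^\]]+)\]\s+(?P<key>\w+)=(?P<value>\S+)$")


def parse_srcfg(text: str) -> dict:
    """Group SRCFG[name] KEY=VALUE lines by reservation name (hypothetical helper)."""
    config = defaultdict(dict)
    for line in text.splitlines():
        match = SRCFG_LINE.match(line.strip())
        if match:
            config[match["name"]][match["key"]] = match["value"]
    return dict(config)


moab_cfg = """\
SRCFG[violi] CLASSLIST=violi
SRCFG[violi] NODEFEATURES=cac
SRCFG[violi] RESOURCES=PROCS:2
SRCFG[violi] TASKCOUNT=16
SRCFG[violi] PERIOD=WEEK
SRCFG[violi] DEPTH=2
SRCFG[violi] ACCESS=DEDICATED
SRCFG[violi] FLAGS=SPACEFLEX,DEDICATEDRESOURCE
"""

violi = parse_srcfg(moab_cfg)["violi"]
procs_per_task = int(violi["RESOURCES"].split(":")[1])   # "PROCS:2" -> 2
print(int(violi["TASKCOUNT"]) * procs_per_task)          # 32, same as the Proc column
print(violi["CLASSLIST"])                                # violi
```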
>
>
> The job in question was submitted after the SR started, so it
> couldn't have had a reservation prior to the SR being created.
> Job Id: 14978.nyx.engin.umich.edu
> Job_Name = wh8x90y30
> [snip]
> qtime = Thu Aug 24 18:37:29 2006
>
> This problem only seems to happen on SRs that are free-floating.
> We use the cac feature to mark nodes we own (as opposed to
> privately owned nodes), so reservations such as these can slide
> around and keep their task counts if a node goes down. Our
> node-locked reservations (using either a HOSTLIST or a feature
> limited to a set of privately held nodes) are not violated.
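To make the floating versus node-locked distinction concrete, here is a toy model in Python. It is not Moab's node-selection algorithm, only an illustration under simplifying assumptions (one task per node; Node, pick_locked, pick_floating and the node names are hypothetical): a node-locked reservation loses capacity when one of its hosts goes down, while a feature-based one refills the slot from other cac nodes, so the set of nodes it covers can change over time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    features: frozenset
    up: bool = True


def pick_locked(nodes, hostlist, taskcount):
    """Node-locked (toy model): only HOSTLIST members qualify; a down host is not replaced."""
    return [n.name for n in nodes if n.name in hostlist and n.up][:taskcount]


def pick_floating(nodes, feature, taskcount):
    """Feature-based (toy model): any up node with `feature` qualifies, so a down node's slot is refilled."""
    return [n.name for n in nodes if feature in n.features and n.up][:taskcount]


# Hypothetical four-node pool, all tagged cac, with node01 down.
cluster = [Node(f"node{i:02d}", frozenset({"cac"})) for i in range(4)]
cluster[1] = Node("node01", frozenset({"cac"}), up=False)

print(pick_locked(cluster, {"node00", "node01"}, 2))   # ['node00']
print(pick_floating(cluster, "cac", 2))                # ['node00', 'node02']
```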
>
> Any ideas on how to debug why certain jobs are allowed to use these
> reserved nodes or why this would be happening?
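One way to narrow this down is to list every job running on the nodes the reservation currently holds and check each one against the reservation's class and user restrictions. A minimal Python sketch with hypothetical job records, node names and class names; gathering the real data (for example from qstat or checkjob) is left out, and find_acl_violations is not an existing Moab or Torque tool.

```python
def find_acl_violations(jobs, rsv_nodes, allowed_class, allowed_users):
    """Return (job_id, node, reasons) for jobs on reserved nodes that fail the ACL.

    `jobs` is an iterable of hypothetical (job_id, owner, job_class, node) tuples.
    """
    violations = []
    for job_id, owner, job_class, node in jobs:
        if node not in rsv_nodes:
            continue  # job is outside the reservation; ignore it
        reasons = []
        if job_class != allowed_class:
            reasons.append(f"class {job_class!r} != {allowed_class!r}")
        if owner not in allowed_users:
            reasons.append(f"user {owner!r} not in REQUIREDUSERLIST")
        if reasons:
            violations.append((job_id, node, reasons))
    return violations


# Hypothetical snapshot: two jobs on reserved nodes, one elsewhere.
jobs = [
    ("1001.example", "fiedler", "violi", "nodeA"),
    ("1002.example", "otheruser", "batch", "nodeA"),
    ("1003.example", "otheruser", "batch", "nodeZ"),
]
result = find_acl_violations(jobs, {"nodeA", "nodeB"}, "violi",
                             {"choesh", "ckcumaa", "fiedler"})
for job_id, node, reasons in result:
    print(job_id, node, "; ".join(reasons))
# 1002.example nodeA class 'batch' != 'violi'; user 'otheruser' not in REQUIREDUSERLIST
```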
>
> Thanks,
> - matt
>
>
>
> _______________________________________________
> moabusers mailing list
> moabusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/moabusers