[Moabusers] Invalid job running in SR
Matthew Britt
msbritt at umich.edu
Fri Aug 25 08:09:02 MDT 2006
We're trying to figure out how certain jobs are running in standing
reservations (SRs) when they aren't supposed to be. We've defined
queue-based ACLs in TORQUE to limit which users can submit jobs, then
defined SRs that require that class. In this case, the queue/class is
called violi.
[root@cac-admin02 log]# mdiag -r violi.2254
Diagnosing Reservations
RsvID       Type  Par  StartTime     EndTime     Duration    Node  Task  Proc
-----       ----  ---  ---------     -------     --------    ----  ----  ----
violi.2254  User  nyx  -1:17:47:48   1:14:18:55  3:08:06:43  16    16    32
Flags: STANDINGRSV,SPACEFLEX,DEDICATEDRESOURCE,ISACTIVE
ACL: RSV==violi.2254= CLASS==violi+
CL: RSV==violi.2254
FLIST=cac
Task Resources: PROCS: 2
Active PH: 1996.66/4126.19 (48.39%)
SRAttributes (TaskCount: 16 StartTime: 00:00:00 EndTime: 1:00:00:00 Days: ALL)
Rsv-Group: violi
Standing reservation definition in Moab:
SRCFG[violi] CLASSLIST=violi
SRCFG[violi] NODEFEATURES=cac
SRCFG[violi] RESOURCES=PROCS:2
SRCFG[violi] TASKCOUNT=16
SRCFG[violi] PERIOD=WEEK
SRCFG[violi] DEPTH=2
SRCFG[violi] ACCESS=DEDICATED
SRCFG[violi] FLAGS=SPACEFLEX,DEDICATEDRESOURCE
The job in question was submitted after the SR started, so it
couldn't have had a reservation prior to the SR being created.
Job Id: 14978.nyx.engin.umich.edu
Job_Name = wh8x90y30
[snip]
qtime = Thu Aug 24 18:37:29 2006
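The timing claim above can be sanity-checked with a little arithmetic. The following is only a rough sketch: it assumes that mdiag was run at about the time this message was sent (the actual run time is not recorded here), and that the relative times follow the usual [[[DD:]HH:]MM:]SS layout. The helper name parse_moab_offset is made up for illustration.

```python
from datetime import datetime, timedelta

def parse_moab_offset(text: str) -> timedelta:
    """Parse a relative time such as '-1:17:47:48' ([[[DD:]HH:]MM:]SS)."""
    sign = -1 if text.startswith("-") else 1
    parts = [int(p) for p in text.lstrip("+-").split(":")]
    # Left-pad to four fields so that 'HH:MM:SS' means zero days, and so on.
    days, hours, minutes, seconds = [0] * (4 - len(parts)) + parts
    return sign * timedelta(days=days, hours=hours,
                            minutes=minutes, seconds=seconds)

# ASSUMPTION: mdiag ran at roughly the time the message was sent.
mdiag_time = datetime(2006, 8, 25, 8, 9, 2)

# StartTime, EndTime and Duration columns from the mdiag output.
start_offset = parse_moab_offset("-1:17:47:48")
end_offset = parse_moab_offset("1:14:18:55")
duration = parse_moab_offset("3:08:06:43")

rsv_start = mdiag_time + start_offset
job_qtime = datetime.strptime("Thu Aug 24 18:37:29 2006",
                              "%a %b %d %H:%M:%S %Y")

print("reservation start (approx):", rsv_start)
print("job queued:                ", job_qtime)
print("queued after rsv start:    ", job_qtime > rsv_start)
print("columns self-consistent:   ", end_offset - start_offset == duration)
```

Under that assumption the reservation instance began on the afternoon of Aug 23, more than a day before the job's qtime, which is consistent with the job having been submitted after the SR started. The exact start depends on when mdiag was actually run.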
This problem only seems to happen on SRs that are free-floating. We
use the cac feature to define nodes we own (as opposed to privately
owned), so reservations such as these can slide around and maintain
their task counts if a node goes down. Our node-locked
reservations (using either a HOSTLIST or a feature limited to a set
of privately-held nodes) are not violated.
Any ideas on how to debug why certain jobs are allowed to use these
reserved nodes or why this would be happening?
Thanks,
- matt