[Moabusers] Major problem with SR
wightman at clusterresources.com
Fri Jul 21 11:23:59 MDT 2006
Although checknode shows that the reservation started 14 hours ago and the job only 3 hours ago, it is still possible that the reservation moved onto the node after the job started (Moab will sometimes move a reservation that has no explicit hostlist). We will try to reproduce this issue locally. Which version of Moab are you currently running?
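For reference, the two start offsets quoted from checknode can be compared with a short script. This is only a sketch: the helper name parse_offset is made up for illustration, and it assumes the offsets follow a [DD:]HH:MM:SS layout relative to the current time. It compares what checknode reports now; it cannot show which node the reservation occupied when the job started.

```python
def parse_offset(text):
    """Convert a relative offset like '-14:10:55' or '1:00:00:00' to seconds.

    Assumes a [DD:]HH:MM:SS layout; parse_offset is a hypothetical helper,
    not part of Moab.
    """
    sign = -1 if text.startswith("-") else 1
    parts = [int(p) for p in text.lstrip("+-").split(":")]
    # Pad on the left so the fields are always [days, hours, minutes, seconds].
    parts = [0] * (4 - len(parts)) + parts
    days, hours, minutes, seconds = parts
    return sign * (((days * 24 + hours) * 60 + minutes) * 60 + seconds)


# Start offsets copied from the checknode output for nyx337.
reservation_start = parse_offset("-14:10:55")  # csem.1029x1
job_start = parse_offset("-2:56:30")           # job 1617

# A more negative offset means an earlier start.
reservation_first = reservation_start < job_start
gap_hours = (job_start - reservation_start) / 3600

print("reservation reported as starting first:", reservation_first)
print("gap in hours:", round(gap_hours, 2))
```

If the reservation was moved onto the node later, the reported start offset would still show the reservation's own start time, so an earlier offset alone does not settle the question.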
Thanks,
- Douglas
On Fri, 21 Jul 2006, Brock Palen wrote:
> We have a standing reservation (SR) defined like so:
>
> SRCFG[csem] NODEFEATURES=csem
> SRCFG[csem] CLASSLIST=csem
> SRCFG[csem] TASKCOUNT=28
> SRCFG[csem] PERIOD=DAY
> SRCFG[csem] DEPTH=10
> SRCFG[csem] ACCESS=DEDICATED
> SRCFG[csem] FLAGS=IGNSTATE
>
> The problem is that jobs from a class other than csem are being placed on
> the nodes, making those nodes unavailable. We are paired with
> torque-2.1.1. The checkjob output for one of the jobs that should not be
> on the nodes is below:
>
> job 1617
>
> AName: R2_12x36
> State: Running
> Creds: user:hcarlo group:ioe class:short
> WallTime: 2:53:45 of 10:55:00
> SubmitTime: Fri Jul 21 11:14:32
> (Time Queued Total: 00:01:30 Eligible: -00:00:01)
>
> StartTime: Fri Jul 21 11:16:02
> Total Requested Tasks: 1
>
> Req[0] TaskCount: 1 Partition: nyx
> Memory >= 0 Disk >= 0 Swap >= 0
> Opsys: --- Arch: --- Features: ---
>
> Allocated Nodes:
> [nyx337:1]
>
>
> StartCount: 1
> Flags: BACKFILL,RESTARTABLE
> Attr: BACKFILL,checkpoint
> StartPriority: 10451
> Reservation '1617' (-2:56:00 -> 7:59:00 Duration: 10:55:00)
>
>
> checknode also shows that the currently active csem reservation was
> active on the node before the job started on it!
> Output:
>
> checknode nyx337
>
> snip....
> csem.1029x1 User -14:10:55 -> 9:49:05 (1:00:00:00)
> Blocked Resources at -00:00:44 Procs: 4/4 (100.00%) Mem: 0/3901 (0.00%)
> Swap: 0/7710 (0.00%) Disk: 0
> snip...
> 1617x1 Job:Running -2:56:30 -> 7:58:30 (10:55:00)
>
> This job should not be on this node. What could be causing this problem?
>
> Brock Palen
> Center for Advanced Computing
> brockp at umich.edu
> (734)936-1985
>
>
> _______________________________________________
> moabusers mailing list
> moabusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/moabusers