[Moabusers] Major problem with SR

wightman at clusterresources.com
Fri Jul 21 11:23:59 MDT 2006


Although checknode shows that the reservation started 14 hours ago and the job started only 3 hours ago, it is still possible that the reservation moved onto the node after the job started, since Moab will sometimes move a reservation that has no explicit host list.  We will try to reproduce this issue locally.  Which version of Moab are you currently running?
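As a quick sanity check of the timeline in the quoted checknode output, here is a small Python sketch (the helper names `parse_offset` and `overlap` are hypothetical) that converts the relative times into seconds and compares the two windows. It assumes Moab prints relative times as `[-][[DD:]HH:]MM:SS`, which matches the values in this thread but is not taken from Moab documentation. It only restates what checknode currently reports; it cannot show whether the reservation was relocated onto nyx337 after job 1617 started.

```python
def parse_offset(text: str) -> int:
    """Convert a relative time such as '-14:10:55' or '1:00:00:00'
    (assumed format [-][[DD:]HH:]MM:SS) into signed seconds."""
    sign = -1 if text.startswith("-") else 1
    parts = [int(p) for p in text.lstrip("-").split(":")]
    # Left-pad so we always have [days, hours, minutes, seconds].
    parts = [0] * (4 - len(parts)) + parts
    days, hours, minutes, seconds = parts
    return sign * (((days * 24 + hours) * 60 + minutes) * 60 + seconds)


def overlap(start_a: int, end_a: int, start_b: int, end_b: int) -> bool:
    """True if the half-open windows [start_a, end_a) and [start_b, end_b) intersect."""
    return start_a < end_b and start_b < end_a


# Values copied from the 'checknode nyx337' output in this thread.
sr_start, sr_end = parse_offset("-14:10:55"), parse_offset("9:49:05")
job_start, job_end = parse_offset("-2:56:30"), parse_offset("7:58:30")

print("SR window starts before job:", sr_start < job_start)
print("Windows overlap:", overlap(sr_start, sr_end, job_start, job_end))
print("SR duration is 1 day:", sr_end - sr_start == parse_offset("1:00:00:00"))
print("Job duration is 10:55:00:", job_end - job_start == parse_offset("10:55:00"))
```

With these values every check prints True: the windows reported now do overlap on nyx337, which is the symptom, but not evidence of where the reservation sat when the job was scheduled.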

Thanks,

- Douglas


On Fri, 21 Jul 2006, Brock Palen wrote:

> We have a standing reservation (SR) defined like so:
>
> SRCFG[csem]     NODEFEATURES=csem
> SRCFG[csem]     CLASSLIST=csem
> SRCFG[csem]     TASKCOUNT=28
> SRCFG[csem]     PERIOD=DAY
> SRCFG[csem]     DEPTH=10
> SRCFG[csem]     ACCESS=DEDICATED
> SRCFG[csem]     FLAGS=IGNSTATE
>
> The problem, though, is that jobs in classes other than csem are being placed
> on these nodes, making those nodes unavailable.  We are paired with
> torque-2.1.1.  A checkjob on one of the jobs that should not be on these nodes
> is below:
>
> job 1617
>
> AName: R2_12x36
> State: Running
> Creds:  user:hcarlo  group:ioe  class:short
> WallTime:   2:53:45 of 10:55:00
> SubmitTime: Fri Jul 21 11:14:32
> (Time Queued  Total: 00:01:30  Eligible: -00:00:01)
>
> StartTime: Fri Jul 21 11:16:02
> Total Requested Tasks: 1
>
> Req[0]  TaskCount: 1  Partition: nyx
> Memory >= 0  Disk >= 0  Swap >= 0
> Opsys:   ---  Arch: ---  Features: ---
>
> Allocated Nodes:
> [nyx337:1]
>
>
> StartCount:     1
> Flags:          BACKFILL,RESTARTABLE
> Attr:           BACKFILL,checkpoint
> StartPriority:  10451
> Reservation '1617' (-2:56:00 -> 7:59:00  Duration: 10:55:00)
>
>
> Checknode also shows that the currently active csem reservation was active
> on the node before the job started on it!
> Output:
>
> checknode nyx337
>
> snip....
> csem.1029x1  User  -14:10:55 -> 9:49:05 (1:00:00:00)
>   Blocked Resources at -00:00:44   Procs: 4/4 (100.00%)  Mem: 0/3901 (0.00%)  Swap: 0/7710 (0.00%)  Disk: 0
> snip...
> 1617x1  Job:Running  -2:56:30 -> 7:58:30 (10:55:00)
>
> This job should not be on this node.  What could be causing this problem?
>
> Brock Palen
> Center for Advanced Computing
> brockp at umich.edu
> (734)936-1985
>
>
> _______________________________________________
> moabusers mailing list
> moabusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/moabusers

