Hello,<br><br>I have a 10-node cluster and 3 jobs: the first needs 2 nodes (1 task per node), the second needs 4 nodes (1 task per node), and the third needs 4 nodes (2 tasks on 1 node and 1 task on each of the other 3 nodes).<br>
<br>Additional configuration in maui.cfg:<br><br>BACKFILLPOLICY FIRSTFIT<br>RESERVATIONPOLICY CURRENTHIGHEST<br><br>ENABLEMULTIREQJOBS TRUE<br>NODEALLOCATIONPOLICY MINRESOURCE<br>NODEACCESSPOLICY SINGLEJOB<br>
JOBNODEMATCHPOLICY EXACTNODE<br><br>When the first 2 jobs are running, the third one does not start (even though 4 nodes are available) until one of the running jobs completes. checkjob -v <job_id> shows the following output:<br>
<br>------------------<br><br>checking job 5791 (RM job '5791.fire16.csa.local')<br><br>State: Idle<br>Creds: user:kunal group:kunal class:batch qos:DEFAULT<br>WallTime: 00:00:00 of 00:04:51<br>SubmitTime: Wed May 23 11:52:04<br>
(Time Queued Total: 00:48:52 Eligible: 00:48:52)<br><br>StartDate: 00:00:01 Wed May 23 12:40:57<br>Total Tasks: 2<br><br>Req[0] TaskCount: 2 Partition: ALL<br>Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0<br>
Opsys: [NONE] Arch: [NONE] Features: [NONE]<br>Exec: '' ExecSize: 0 ImageSize: 0<br>Dedicated Resources Per Task: PROCS: 1<br>NodeAccess: SINGLEJOB<br>TasksPerNode: 2 NodeCount: 1<br><br>Req[1] TaskCount: 3 Partition: ALL<br>
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0<br>Opsys: [NONE] Arch: [NONE] Features: [NONE]<br>Exec: '' ExecSize: 0 ImageSize: 0<br>Dedicated Resources Per Task: PROCS: 1<br>NodeAccess: SINGLEJOB<br>
NodeCount: 3<br><br><br>IWD: [NONE] Executable: [NONE]<br>Bypass: 5 StartCount: 0<br>PartitionMask: [ALL]<br>Flags: RESTARTABLE<br><br>Reservation '5791' (00:00:01 -> 00:04:52 Duration: 00:04:51)<br>
PE: 5.00 StartPriority: 48<br>
cannot select job 5791 for partition DEFAULT (startdate in '00:00:01')<br><br>------------<br><br>What could be the reason this job does not start? How do I resolve this?<br><br>Thanks,<br>Kunal<br>