<div>What does diagnose -n give you?</div>
<div>Any errors?</div>
<div><br><br> </div>
<div class="gmail_quote">On Thu, Feb 28, 2008 at 1:47 PM, Arnau Bria &lt;<a href="mailto:arnau@emergetux.net">arnau@emergetux.net</a>&gt; wrote:<br>
<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">Hi,<br><br>our maui server (maui-server-3.2.6p19_20.snap.1182974819-4.slc3) is not<br>scheduling fine. We have several queues and each one looks for special<br>
WN resources. For example:<br><br>[root@pbs01 sbin]# qmgr -c "l q ifae"|grep resources_default.neednodes<br> resources_default.neednodes = ifae<br>[root@pbs01 sbin]# qmgr -c "l q gshort"|grep resources_default.neednodes<br>
 resources_default.neednodes = slc4<br><br>We have no free ifae WNs and many free slc4 slots:<br><br># pbsnodes -a|grep -B2 ifae|grep -c free<br>0<br># pbsnodes -a|grep -B2 slc4|grep -c free<br>
46<br><br>So jobs submitted to ifae cannot run, but jobs in other queues should.<br><br><br>The queue looks like:<br>IDLE JOBS----------------------<br>JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME<br>
<br>3862162 ops002 Idle 1 3:00:00:00 Thu Feb 28 12:09:05<br>3862186 ops002 Idle 1 3:00:00:00 Thu Feb 28 12:11:11<br>3862201 dteam004 Idle 1 1:00:00:00 Thu Feb 28 12:12:36<br>3862202 dteam004 Idle 1 1:00:00:00 Thu Feb 28 12:12:38<br>
3862203 dteam004 Idle 1 1:00:00:00 Thu Feb 28 12:13:28<br><br>If we check the first job:<br># checkjob 3862162<br><br><br>checking job 3862162<br><br>State: Idle<br>Creds: user:ops002 group:ops class:ifae qos:DEFAULT<br>
WallTime: 00:00:00 of 3:00:00:00<br>SubmitTime: Thu Feb 28 12:09:05<br> (Time Queued Total: 00:04:57 Eligible: 00:04:57)<br><br>StartDate: -00:04:02 Thu Feb 28 12:10:00<br>Total Tasks: 1<br><br>Req[0] TaskCount: 1 Partition: ALL<br>
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0<br>Opsys: [NONE] Arch: [NONE] Features: [ifae]<br><br><br>IWD: [NONE] Executable: [NONE]<br>Bypass: 16 StartCount: 0<br>PartitionMask: [ALL]<br>Reservation '3862162' (2:14:18:42 -> 5:14:18:42 Duration: 3:00:00:00)<br>
PE: 1.00 StartPriority: 10000<br>job cannot run in partition DEFAULT (idle procs do not meet<br>requirements : 0 of 1 procs found) idle procs: 203 feasible procs: 0<br><br>Rejection Reasons: [Features : 63][State : 6]<br>
<br>This job goes to ifae, so it cannot run.<br><br><br>But the first job in the gshort queue:<br><br># checkjob 3862206<br><br>checking job 3862206<br><br>State: Idle<br>Creds: user:dteam004 group:dteam class:gshort qos:DEFAULT<br>
WallTime: 00:00:00 of 1:00:00:00<br>SubmitTime: Thu Feb 28 12:13:34<br> (Time Queued Total: 00:10:33 Eligible: 00:10:33)<br><br>StartDate: -00:10:05 Thu Feb 28 12:14:02<br>Total Tasks: 1<br><br>Req[0] TaskCount: 1 Partition: ALL<br>
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0<br>Opsys: [NONE] Arch: [NONE] Features: [slc4]<br><br><br>IWD: [NONE] Executable: [NONE]<br>Bypass: 2 StartCount: 0<br>PartitionMask: [ALL]<br>PE: 1.00 StartPriority: 10000<br>
job can run in partition DEFAULT (48 procs available. 1 procs required)<br>^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^<br><br>It should run, as there are free slots, but it stays idle indefinitely.<br><br>
<br>Our cluster looks like:<br><br>Queue Memory CPU Time Walltime Node Run Que Lm State<br>---------------- ------ -------- -------- ---- --- --- -- -----<br>gshort -- 12:00:00 24:00:00 -- 115 106 -- E R<br>
ifae -- 48:00:00 72:00:00 -- 24 48 -- E R<br> ----- -----<br> 195 250<br><br>and the number of running jobs begins to decrease...<br>
<br><br>So, our solution is to set MAXPROC for ifae in maui.cfg. As we only<br>have 24 slots for ifae, we set this limit:<br><br>CLASSCFG[ifae] MAXPROC=24<br><br>We restarted maui, and then:<br><br># qstat -q<br><br>server: <a href="http://pbs01.pic.es/" target="_blank">pbs01.pic.es</a><br>
<br>Queue Memory CPU Time Walltime Node Run Que Lm State<br>---------------- ------ -------- -------- ---- --- --- -- -----<br>gshort -- 12:00:00 24:00:00 -- 163 58 -- E R<br>ifae -- 48:00:00 72:00:00 -- 24 48 -- E R<br>
----- -----<br> 244 202<br><br><br>So, our question is: why doesn't maui schedule jobs even though there are<br>available resources? And why does maui start behaving correctly once we set<br>
the MAXPROC limit?<br><br>Feel free to ask for any config parameter I forgot to send...<br><br><br>TIA,<br>Arnau<br>_______________________________________________<br>mauiusers mailing list<br><a href="mailto:mauiusers@supercluster.org">mauiusers@supercluster.org</a><br>
<a href="http://www.supercluster.org/mailman/listinfo/mauiusers" target="_blank">http://www.supercluster.org/mailman/listinfo/mauiusers</a><br></blockquote></div><br>
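<div>The free-node check in the quoted message (pbsnodes -a|grep -B2 ifae|grep -c free) depends on the state line sitting exactly two lines above the properties line. A minimal sketch of the same count done by parsing each node block is below. It assumes the usual pbsnodes -a layout (node name flush left, indented "key = value" attributes); the sample text, node names and function names are hypothetical and are not taken from the cluster above. Like the grep, it counts nodes, not slots.</div>

```python
from collections import Counter

# Hypothetical sample in the assumed `pbsnodes -a` layout: node name flush
# left, indented "key = value" attributes, blank line between nodes.
SAMPLE = """\
wn001
     state = free
     np = 2
     properties = slc4
     ntype = cluster

wn002
     state = job-exclusive
     np = 2
     properties = ifae,slc4
     ntype = cluster

wn003
     state = free
     np = 2
     properties = slc4
     ntype = cluster
"""


def parse_nodes(text):
    """Return {node_name: {attribute: value}} from pbsnodes-style text."""
    nodes, current = {}, None
    for line in text.splitlines():
        if not line.strip():
            current = None  # a blank line ends the current node block
        elif not line[0].isspace():
            current = nodes.setdefault(line.strip(), {})  # new node name
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return nodes


def free_nodes_by_property(text):
    """Count nodes whose state is exactly 'free', grouped by property."""
    counts = Counter()
    for attrs in parse_nodes(text).values():
        if attrs.get("state") == "free":
            for prop in attrs.get("properties", "").split(","):
                if prop.strip():
                    counts[prop.strip()] += 1
    return counts


if __name__ == "__main__":
    for prop, n in sorted(free_nodes_by_property(SAMPLE).items()):
        print(prop, n)
```

<div>To run it against a live server, one option is to replace SAMPLE with sys.stdin.read() and pipe pbsnodes -a into the script.</div>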