[Mauiusers] maui not scheduling even with available resources
Itay M
itaym.tau at gmail.com
Sun Mar 2 13:16:29 MST 2008
What does diagnose -n give you?
Any errors?
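
To cross-check the per-feature free counts quoted below without depending on the line offsets that grep -B2 assumes, here is a minimal sketch. The function name count_free_by_property is hypothetical. It assumes the usual pbsnodes -a layout: an unindented node name, indented "key = value" lines, and a comma-separated properties list with no spaces.

```shell
#!/bin/sh
# Hypothetical helper (not part of TORQUE or Maui): count nodes whose state is
# exactly "free" and whose properties list contains the requested feature.
# Reads `pbsnodes -a` style text on stdin; the record layout is an assumption.
count_free_by_property() {
    awk -v want="$1" '
        # Count the record collected so far if it qualifies, then reset.
        function tally() {
            if (state == "free" && has) n++
            state = ""; has = 0
        }
        /^[^ \t]/ { tally() }                      # unindented line starts a new node
        $1 == "state" && $2 == "=" { state = $3 }
        $1 == "properties" && $2 == "=" {
            k = split($3, p, ",")                  # assumed comma-separated, no spaces
            for (i = 1; i <= k; i++) if (p[i] == want) has = 1
        }
        END { tally(); print n + 0 }
    '
}
```

Usage would be along the lines of: pbsnodes -a | count_free_by_property slc4. Note that it only counts nodes whose state is exactly "free", so combined states are not counted.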
On Thu, Feb 28, 2008 at 1:47 PM, Arnau Bria <arnau at emergetux.net> wrote:
> Hi,
>
> our maui server (maui-server-3.2.6p19_20.snap.1182974819-4.slc3) is not
> scheduling properly. We have several queues, and each one requests specific
> WN resources. For example:
>
> [root@pbs01 sbin]# qmgr -c "l q ifae"|grep resources_default.neednodes
> resources_default.neednodes = ifae
> [root@pbs01 sbin]# qmgr -c "l q gshort"|grep resources_default.neednodes
> resources_default.neednodes = slc4
>
> We have no free ifae WNs and many free slc4 slots:
>
> # pbsnodes -a|grep -B2 ifae|grep -c free
> 0
> # pbsnodes -a|grep -B2 slc4|grep -c free
> 46
>
> So jobs sent to ifae cannot run, but jobs sent to other queues should.
>
>
> The queue looks like:
> IDLE JOBS----------------------
> JOBNAME   USERNAME  STATE  PROC  WCLIMIT     QUEUETIME
>
> 3862162   ops002    Idle   1     3:00:00:00  Thu Feb 28 12:09:05
> 3862186   ops002    Idle   1     3:00:00:00  Thu Feb 28 12:11:11
> 3862201   dteam004  Idle   1     1:00:00:00  Thu Feb 28 12:12:36
> 3862202   dteam004  Idle   1     1:00:00:00  Thu Feb 28 12:12:38
> 3862203   dteam004  Idle   1     1:00:00:00  Thu Feb 28 12:13:28
>
> If we check the first job:
> # checkjob 3862162
>
>
> checking job 3862162
>
> State: Idle
> Creds: user:ops002 group:ops class:ifae qos:DEFAULT
> WallTime: 00:00:00 of 3:00:00:00
> SubmitTime: Thu Feb 28 12:09:05
> (Time Queued Total: 00:04:57 Eligible: 00:04:57)
>
> StartDate: -00:04:02 Thu Feb 28 12:10:00
> Total Tasks: 1
>
> Req[0] TaskCount: 1 Partition: ALL
> Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
> Opsys: [NONE] Arch: [NONE] Features: [ifae]
>
>
> IWD: [NONE] Executable: [NONE]
> Bypass: 16 StartCount: 0
> PartitionMask: [ALL]
> Reservation '3862162' (2:14:18:42 -> 5:14:18:42 Duration: 3:00:00:00)
> PE: 1.00 StartPriority: 10000
> job cannot run in partition DEFAULT (idle procs do not meet requirements : 0 of 1 procs found)
> idle procs: 203 feasible procs: 0
>
> Rejection Reasons: [Features : 63][State : 6]
>
> This job goes to ifae, so it is not able to run.
>
>
> But the first job in the gshort queue:
>
> # checkjob 3862206
>
> checking job 3862206
>
> State: Idle
> Creds: user:dteam004 group:dteam class:gshort qos:DEFAULT
> WallTime: 00:00:00 of 1:00:00:00
> SubmitTime: Thu Feb 28 12:13:34
> (Time Queued Total: 00:10:33 Eligible: 00:10:33)
>
> StartDate: -00:10:05 Thu Feb 28 12:14:02
> Total Tasks: 1
>
> Req[0] TaskCount: 1 Partition: ALL
> Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
> Opsys: [NONE] Arch: [NONE] Features: [slc4]
>
>
> IWD: [NONE] Executable: [NONE]
> Bypass: 2 StartCount: 0
> PartitionMask: [ALL]
> PE: 1.00 StartPriority: 10000
> job can run in partition DEFAULT (48 procs available. 1 procs required)
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> It should run, since there are free slots, but it stays idle for a long time.
>
>
> Our cluster looks like:
>
> Queue            Memory CPU Time Walltime Node  Run Que Lm  State
> ---------------- ------ -------- -------- ----  --- --- --  -----
> gshort             --   12:00:00 24:00:00   --  115 106 --   E R
> ifae               --   48:00:00 72:00:00   --   24  48 --   E R
>                                                ----- -----
>                                                  195   250
>
> and the number of running jobs starts to decrease...
>
>
> So, our solution is setting MAXPROC for ifae in maui.cfg. As we only
> have 24 slots for ifae, we set this limit:
>
> CLASSCFG[ifae] MAXPROC=24
>
> We restart maui, and then:
>
> # qstat -q
>
> server: pbs01.pic.es
>
> Queue            Memory CPU Time Walltime Node  Run Que Lm  State
> ---------------- ------ -------- -------- ----  --- --- --  -----
> gshort             --   12:00:00 24:00:00   --  163  58 --   E R
> ifae               --   48:00:00 72:00:00   --   24  48 --   E R
>                                                ----- -----
>                                                  244   202
>
>
> So, our question is: why doesn't maui schedule jobs even though there are
> available resources? And why does maui start behaving properly once we set
> the MAXPROC limit?
>
> Feel free to ask for any configuration parameter I forgot to send...
>
>
> TIA,
> Arnau