[Mauiusers] maui not scheduling even with available resources
Arnau Bria
arnau at emergetux.net
Thu Feb 28 04:47:54 MST 2008
Hi,
our maui server (maui-server-3.2.6p19_20.snap.1182974819-4.slc3) is not
scheduling correctly. We have several queues, and each one requests specific
worker node (WN) resources. For example:
[root at pbs01 sbin]# qmgr -c "l q ifae"|grep resources_default.neednodes
resources_default.neednodes = ifae
[root at pbs01 sbin]# qmgr -c "l q gshort"|grep resources_default.neednodes
resources_default.neednodes = slc4
We have no free ifae WNs and many free slc4 slots:
# pbsnodes -a|grep -B2 ifae|grep -c free
0
# pbsnodes -a|grep -B2 slc4|grep -c free
46
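The grep pipeline above also matches any line that merely contains the property string (a node name, for instance), so the count can be off. A minimal sketch of a stricter count follows. The function name count_free_by_property is hypothetical, and the sketch assumes that `pbsnodes -a` prints each node as an unindented name followed by indented "key = value" lines. Like the grep version, it counts nodes, not individual slots.

```shell
#!/bin/sh
# Count nodes in state "free" that carry a given property.
# Reads pbsnodes -a style output on stdin; the property name is $1.
# Assumption: each node record starts with an unindented node name,
# followed by indented "key = value" lines (state, np, properties, ...).
count_free_by_property() {
    awk -v prop="$1" '
        # Close the current node record and count it if it qualifies.
        function tally() {
            if (state == "free" && has_prop) count++
            state = ""
            has_prop = 0
        }
        /^[^ \t]/ { tally() }                     # unindented line: new node
        $1 == "state" && $2 == "=" { state = $3 }
        $1 == "properties" && $2 == "=" {
            n = split($3, props, ",")             # comma-separated property list
            for (i = 1; i <= n; i++)
                if (props[i] == prop) has_prop = 1
        }
        END { tally(); print count + 0 }
    '
}
```

It would be used as `pbsnodes -a | count_free_by_property slc4`.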
So jobs submitted to ifae cannot run, but jobs submitted to other queues should.
The queue looks like:
IDLE JOBS----------------------
JOBNAME  USERNAME  STATE  PROC  WCLIMIT     QUEUETIME
3862162  ops002    Idle   1     3:00:00:00  Thu Feb 28 12:09:05
3862186  ops002    Idle   1     3:00:00:00  Thu Feb 28 12:11:11
3862201  dteam004  Idle   1     1:00:00:00  Thu Feb 28 12:12:36
3862202  dteam004  Idle   1     1:00:00:00  Thu Feb 28 12:12:38
3862203  dteam004  Idle   1     1:00:00:00  Thu Feb 28 12:13:28
If we check the first job:
# checkjob 3862162
checking job 3862162
State: Idle
Creds: user:ops002 group:ops class:ifae qos:DEFAULT
WallTime: 00:00:00 of 3:00:00:00
SubmitTime: Thu Feb 28 12:09:05
(Time Queued Total: 00:04:57 Eligible: 00:04:57)
StartDate: -00:04:02 Thu Feb 28 12:10:00
Total Tasks: 1
Req[0] TaskCount: 1 Partition: ALL
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [ifae]
IWD: [NONE] Executable: [NONE]
Bypass: 16 StartCount: 0
PartitionMask: [ALL]
Reservation '3862162' (2:14:18:42 -> 5:14:18:42 Duration: 3:00:00:00)
PE: 1.00 StartPriority: 10000
job cannot run in partition DEFAULT (idle procs do not meet requirements : 0 of 1 procs found)
idle procs: 203 feasible procs: 0
Rejection Reasons: [Features : 63][State : 6]
It is submitted to ifae, so it is not able to run.
But the first job in the gshort queue:
# checkjob 3862206
checking job 3862206
State: Idle
Creds: user:dteam004 group:dteam class:gshort qos:DEFAULT
WallTime: 00:00:00 of 1:00:00:00
SubmitTime: Thu Feb 28 12:13:34
(Time Queued Total: 00:10:33 Eligible: 00:10:33)
StartDate: -00:10:05 Thu Feb 28 12:14:02
Total Tasks: 1
Req[0] TaskCount: 1 Partition: ALL
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [slc4]
IWD: [NONE] Executable: [NONE]
Bypass: 2 StartCount: 0
PartitionMask: [ALL]
PE: 1.00 StartPriority: 10000
job can run in partition DEFAULT (48 procs available. 1 procs required)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
It should run, as there are free slots, but it stays idle indefinitely.
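To compare many idle jobs at once, the two checkjob fields used above (the class in the Creds line and the "job can run" / "job cannot run" verdict) can be condensed to one line per job. This is a minimal sketch: the function name summarize_checkjob is hypothetical, and it assumes the checkjob output keeps the wording shown in the two listings above.

```shell
#!/bin/sh
# Reduce one checkjob report (read on stdin) to "<class> <verdict>".
# Assumption: the report contains a "Creds:" line with a class:NAME token
# and a line with "job can run" or "job cannot run", as in the output above.
summarize_checkjob() {
    awk '
        $1 == "Creds:" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^class:/) class = substr($i, 7)   # text after "class:"
        }
        /job cannot run/ { verdict = "blocked" }
        /job can run/    { verdict = "runnable" }
        END { print class, (verdict == "" ? "unknown" : verdict) }
    '
}
```

It would be used as `checkjob 3862206 | summarize_checkjob`, or inside a loop over the idle job IDs.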
Our cluster looks like:
Queue Memory CPU Time Walltime Node Run Que Lm State
---------------- ------ -------- -------- ---- --- --- -- -----
gshort -- 12:00:00 24:00:00 -- 115 106 -- E R
ifae -- 48:00:00 72:00:00 -- 24 48 -- E R
----- -----
195 250
and the number of running jobs begins to decrease...
So, our workaround is to set MAXPROC for ifae in maui.cfg. As we only
have 24 slots for ifae, we set this limit:
CLASSCFG[ifae] MAXPROC=24
We restart maui, and then:
# qstat -q
server: pbs01.pic.es
Queue Memory CPU Time Walltime Node Run Que Lm State
---------------- ------ -------- -------- ---- --- --- -- -----
gshort -- 12:00:00 24:00:00 -- 163 58 -- E R
ifae -- 48:00:00 72:00:00 -- 24 48 -- E R
----- -----
244 202
So, our question is: why doesn't maui schedule jobs even when there are
available resources? And why does maui start behaving correctly once we
set the MAXPROC limit?
Feel free to ask for any configuration parameter I forgot to send...
TIA,
Arnau