[Mauiusers] Maui not scheduling valid jobs when nodes are available
Prakash Velayutham
prakash.velayutham at cchmc.org
Fri Dec 8 10:27:55 MST 2006
Hello,
I am a new Maui user (I previously used the Torque scheduler). I have
Maui-3.2.6-13 with Torque-2.1.6, and I run this same setup on 2 different
clusters. In both clusters, the Torque server and the Maui scheduler run
on the same machine, a 32-bit SuSE 9.3 server.
In one of the setups, everything is working flawlessly.
In the other cluster, I am able to submit and run jobs like
"qsub -l nodes=1 cpuload.sh".
But if I change the resource list to something like
"qsub -l nodes=1:opteron:ppn=2 cpuload.sh", Maui does not schedule the job.
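To make the failing request explicit, here is a minimal Python sketch of how the string "1:opteron:ppn=2" breaks down into a node count, a node property, and processors per node. The helper names parse_node_spec and total_tasks are hypothetical and not part of Torque or Maui. The sketch assumes the simple single-chunk form that starts with a count, and it ignores hostnames and "+"-joined chunks.

```python
def parse_node_spec(spec):
    """Split a simple node request such as '1:opteron:ppn=2'.

    Hypothetical helper: assumes the first field is a node count and the
    remaining fields are either 'ppn=N' or node properties (features).
    """
    parts = spec.split(":")
    nodes = int(parts[0])
    ppn = 1          # assumed default when ppn is not given
    features = []
    for part in parts[1:]:
        if part.startswith("ppn="):
            ppn = int(part[len("ppn="):])
        else:
            features.append(part)
    return {"nodes": nodes, "features": features, "ppn": ppn}


def total_tasks(request):
    """Number of processor slots the request asks for in total."""
    return request["nodes"] * request["ppn"]


request = parse_node_spec("1:opteron:ppn=2")
print(request, total_tasks(request))
```

Under these assumptions the request asks for 2 processors on one node carrying the "opteron" property, which appears to line up with the "(2 Needed)" figure in the log output.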
Here is some output from the Maui logs.
##############################################################################################
12/08 10:24:08 MPBSJobLoad(39158,39158.x.y.z,J,TaskList,0)
12/08 10:24:08 MReqCreate(39158,SrcRQ,DstRQ,DoCreate)
12/08 10:24:08 INFO: processing node request line '1:opteron:ppn=2'
12/08 10:24:08 MJobSetCreds(39158,xxx,users,)
12/08 10:24:08 INFO: default QOS for job 39158 set to DEFAULT(0)
(P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
12/08 10:24:08 INFO: default QOS for job 39158 set to DEFAULT(0)
(P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
12/08 10:24:08 INFO: default QOS for job 39158 set to DEFAULT(0)
(P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
12/08 10:24:08 INFO: job '39158' loaded: 2 litert users
0 Idle 0 1165591448 [NONE] [NONE] [NONE] >= 0 >= 0
[opteron][xeon][1] 1165591448
12/08 10:24:08 INFO: 1 PBS jobs detected on RM FRUCTOSE
12/08 10:24:08 INFO: jobs detected: 1
12/08 10:24:08 MStatClearUsage(node,Active)
12/08 10:24:08 MClusterUpdateNodeState()
12/08 10:24:08 MQueueSelectAllJobs(Q,HARD,ALL,JIList,DP,Msg)
12/08 10:24:08 ERROR: job '39158' has NULL WCLimit field
12/08 10:24:08 INFO: job '39158' Priority: 1
12/08 10:24:08 INFO: Cred: 0(00.0) FS: 0(00.0)
Attr: 0(00.0) Serv: 0(00.0) Targ: 0(00.0) Res:
0(00.0) Us: 0(00.0)
12/08 10:24:08 MStatClearUsage([NONE],Active)
12/08 10:24:08 INFO: total jobs selected (ALL): 1/1
12/08 10:24:08 MQueueSelectAllJobs(Q,SOFT,ALL,JIList,DP,Msg)
12/08 10:24:08 ERROR: job '39158' has NULL WCLimit field
12/08 10:24:08 INFO: job '39158' Priority: 1
12/08 10:24:08 INFO: Cred: 0(00.0) FS: 0(00.0)
Attr: 0(00.0) Serv: 0(00.0) Targ: 0(00.0) Res:
0(00.0) Us: 0(00.0)
12/08 10:24:08 MStatClearUsage([NONE],Idle)
12/08 10:24:08 INFO: total jobs selected (ALL): 1/1
12/08 10:24:08 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,EVERY,FReason,FALSE)
12/08 10:24:08 INFO: total jobs selected in partition ALL: 1/1
12/08 10:24:08 MQueueScheduleRJobs(Q)
12/08 10:24:08 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
12/08 10:24:08 INFO: total jobs selected in partition ALL: 1/1
12/08 10:24:08 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,DEFAULT,FReason,TRUE)
12/08 10:24:08 INFO: total jobs selected in partition DEFAULT: 1/1
12/08 10:24:08 MQueueScheduleIJobs(Q,DEFAULT)
12/08 10:24:08 INFO: 0 feasible tasks found for job 39158:0 in
partition DEFAULT (2 Needed)
12/08 10:24:08 MJobPReserve(39158,DEFAULT,ResCount,ResCountRej)
12/08 10:24:08 MJobReserve(39158,Priority)
12/08 10:24:08 INFO: 0 feasible tasks found for job 39158:0 in
partition DEFAULT (2 Needed)
12/08 10:24:08 ALERT: job 39158 cannot run in any partition
12/08 10:24:08 ALERT: cannot create new reservation for job 39158
(shape[1] 2)
12/08 10:24:08 ALERT: cannot create new reservation for job 39158
12/08 10:24:08 MJobSetHold(39158,16,1:00:00,NoResources,cannot create
reservation for job '39158' (intital reservation attempt))
12/08 10:24:08 ALERT: job '39158' cannot run (deferring job for 3600
seconds)
12/08 10:24:08 WARNING: cannot reserve priority job '39158'
#################################################################################################################
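Since most of the excerpt is INFO and function-trace output, a small filter can help pick out the lines that matter. The following is a minimal Python sketch that keeps only ERROR, ALERT, and WARNING entries. The function name flagged_lines and the SAMPLE text are illustrative. The timestamp pattern is an assumption based on the "MM/DD HH:MM:SS" prefix seen in the lines above, not a documented Maui log format.

```python
import re

# Assumed line shape: "MM/DD HH:MM:SS SEVERITY: message"
SEVERITY = re.compile(
    r"^\d{2}/\d{2} \d{2}:\d{2}:\d{2} (ERROR|ALERT|WARNING):\s*(.*)$"
)


def flagged_lines(log_text):
    """Return (severity, message) pairs for ERROR/ALERT/WARNING lines."""
    found = []
    for line in log_text.splitlines():
        match = SEVERITY.match(line.strip())
        if match:
            found.append((match.group(1), match.group(2)))
    return found


# A few lines copied from the log excerpt, used only as sample input.
SAMPLE = """\
12/08 10:24:08 MClusterUpdateNodeState()
12/08 10:24:08 ERROR: job '39158' has NULL WCLimit field
12/08 10:24:08 INFO: job '39158' Priority: 1
12/08 10:24:08 ALERT: job 39158 cannot run in any partition
12/08 10:24:08 WARNING: cannot reserve priority job '39158'
"""

for severity, message in flagged_lines(SAMPLE):
    print(severity, message)
```

Continuation lines that the mail client wrapped onto a second line have no timestamp, so this sketch skips them; only the first part of a wrapped message is reported.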
Here is maui.cfg:
#################################################################################################################
# maui.cfg 3.2.6p13
SERVERHOST fructose.cchmc.org
ADMIN1 root
RMCFG[X] TYPE=PBS HOST=x.y.z PORT=15001 EPORT=15003
AMCFG[bank] TYPE=NONE
JOBNODEMATCHPOLICY EXACTNODE
RMPOLLINTERVAL 00:00:30
SERVERPORT 42559
SERVERMODE TEST
LOGFILE maui.log
LOGFILEMAXSIZE 10000000
LOGLEVEL 3
QUEUETIMEWEIGHT 1
BACKFILLPOLICY FIRSTFIT
RESERVATIONPOLICY CURRENTHIGHEST
NODEALLOCATIONPOLICY MINRESOURCE
#################################################################################################################
Thanks for any help,
Prakash