[Mauiusers] Maui not scheduling valid jobs when nodes are available
Josh Butikofer
josh at clusterresources.com
Thu Dec 14 07:22:57 MST 2006
Prakash,
It is not clear to me from this log file why the job's reservation cannot be made. Do you have any
existing reservations on the system? (Use showres to check.) Also, can you raise the log level to
see whether the logs give more detail? (Set LOGLEVEL to 6 or 7 in maui.cfg, restart Maui, and rerun
the test case.)
Regards,
--
Joshua Butikofer
Cluster Resources, Inc.
josh at clusterresources.com
Voice: (801) 717-3707
Fax: (801) 717-3738
--------------------------
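The LOGLEVEL change suggested above is a one-line edit to maui.cfg (the posted config has `LOGLEVEL 3`). As a minimal sketch, assuming you want to script the edit, the hypothetical Python helper `set_loglevel` below rewrites an uncommented LOGLEVEL line, or appends one if none exists. It only changes the config text; Maui still has to be restarted for the new level to take effect.

```python
import re

# Matches an uncommented "LOGLEVEL <value>" line; lines such as
# "LOGFILEMAXSIZE ..." or "# LOGLEVEL ..." are deliberately not matched.
_LOGLEVEL_RE = re.compile(r"^([ \t]*)LOGLEVEL[ \t]+\S+.*$", re.MULTILINE)


def set_loglevel(cfg_text: str, level: int) -> str:
    """Return maui.cfg text with LOGLEVEL set to `level` (hypothetical helper).

    An existing LOGLEVEL line keeps its indentation but gets the new value;
    if there is no LOGLEVEL line, one is appended at the end.
    """
    new_line = f"LOGLEVEL {level}"
    if _LOGLEVEL_RE.search(cfg_text):
        return _LOGLEVEL_RE.sub(lambda m: m.group(1) + new_line, cfg_text)
    # No existing setting: make sure the file ends with a newline, then append.
    if cfg_text and not cfg_text.endswith("\n"):
        cfg_text += "\n"
    return cfg_text + new_line + "\n"


if __name__ == "__main__":
    cfg = "LOGFILE maui.log\nLOGFILEMAXSIZE 10000000\nLOGLEVEL 3\n"
    print(set_loglevel(cfg, 7), end="")
```

For the excerpt in the `__main__` block, only the `LOGLEVEL 3` line changes, to `LOGLEVEL 7`; `LOGFILEMAXSIZE` is left alone because the pattern requires whitespace right after `LOGLEVEL`.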
Prakash Velayutham wrote:
> Hello,
>
> I am a recent Maui user (I previously used the Torque scheduler). I have
> Maui-3.2.6-13 with Torque-2.1.6, and I have this same setup on two different
> clusters. In both clusters, the Torque server and the Maui scheduler run on
> the same 32-bit SuSE 9.3 server.
>
> In one of the setups, everything is working flawlessly.
>
> In the other cluster, I can submit jobs such as "qsub -l nodes=1 cpuload.sh",
> but if I change the resource list to something like
> "qsub -l nodes=1:opteron:ppn=2 cpuload.sh", Maui does not schedule the job.
>
> Here is some output from the Maui logs.
> ##############################################################################################
> 12/08 10:24:08 MPBSJobLoad(39158,39158.x.y.z,J,TaskList,0)
> 12/08 10:24:08 MReqCreate(39158,SrcRQ,DstRQ,DoCreate)
> 12/08 10:24:08 INFO: processing node request line '1:opteron:ppn=2'
> 12/08 10:24:08 MJobSetCreds(39158,xxx,users,)
> 12/08 10:24:08 INFO: default QOS for job 39158 set to DEFAULT(0) (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
> 12/08 10:24:08 INFO: default QOS for job 39158 set to DEFAULT(0) (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
> 12/08 10:24:08 INFO: default QOS for job 39158 set to DEFAULT(0) (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
> 12/08 10:24:08 INFO: job '39158' loaded: 2 litert users 0 Idle 0 1165591448 [NONE] [NONE] [NONE] >= 0 >= 0 [opteron][xeon][1] 1165591448
> 12/08 10:24:08 INFO: 1 PBS jobs detected on RM FRUCTOSE
> 12/08 10:24:08 INFO: jobs detected: 1
> 12/08 10:24:08 MStatClearUsage(node,Active)
> 12/08 10:24:08 MClusterUpdateNodeState()
> 12/08 10:24:08 MQueueSelectAllJobs(Q,HARD,ALL,JIList,DP,Msg)
> 12/08 10:24:08 ERROR: job '39158' has NULL WCLimit field
> 12/08 10:24:08 INFO: job '39158' Priority: 1
> 12/08 10:24:08 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 0(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
> 12/08 10:24:08 MStatClearUsage([NONE],Active)
> 12/08 10:24:08 INFO: total jobs selected (ALL): 1/1
> 12/08 10:24:08 MQueueSelectAllJobs(Q,SOFT,ALL,JIList,DP,Msg)
> 12/08 10:24:08 ERROR: job '39158' has NULL WCLimit field
> 12/08 10:24:08 INFO: job '39158' Priority: 1
> 12/08 10:24:08 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 0(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
> 12/08 10:24:08 MStatClearUsage([NONE],Idle)
> 12/08 10:24:08 INFO: total jobs selected (ALL): 1/1
> 12/08 10:24:08 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,EVERY,FReason,FALSE)
> 12/08 10:24:08 INFO: total jobs selected in partition ALL: 1/1
> 12/08 10:24:08 MQueueScheduleRJobs(Q)
> 12/08 10:24:08 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
> 12/08 10:24:08 INFO: total jobs selected in partition ALL: 1/1
> 12/08 10:24:08 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,DEFAULT,FReason,TRUE)
> 12/08 10:24:08 INFO: total jobs selected in partition DEFAULT: 1/1
> 12/08 10:24:08 MQueueScheduleIJobs(Q,DEFAULT)
> 12/08 10:24:08 INFO: 0 feasible tasks found for job 39158:0 in partition DEFAULT (2 Needed)
> 12/08 10:24:08 MJobPReserve(39158,DEFAULT,ResCount,ResCountRej)
> 12/08 10:24:08 MJobReserve(39158,Priority)
> 12/08 10:24:08 INFO: 0 feasible tasks found for job 39158:0 in partition DEFAULT (2 Needed)
> 12/08 10:24:08 ALERT: job 39158 cannot run in any partition
> 12/08 10:24:08 ALERT: cannot create new reservation for job 39158 (shape[1] 2)
> 12/08 10:24:08 ALERT: cannot create new reservation for job 39158
> 12/08 10:24:08 MJobSetHold(39158,16,1:00:00,NoResources,cannot create reservation for job '39158' (intital reservation attempt))
> 12/08 10:24:08 ALERT: job '39158' cannot run (deferring job for 3600 seconds)
> 12/08 10:24:08 WARNING: cannot reserve priority job '39158'
> #################################################################################################################
>
> Here is maui.cfg:
> #################################################################################################################
> # maui.cfg 3.2.6p13
>
> SERVERHOST fructose.cchmc.org
> ADMIN1 root
> RMCFG[X] TYPE=PBS HOST=x.y.z PORT=15001 EPORT=15003
> AMCFG[bank] TYPE=NONE
> JOBNODEMATCHPOLICY EXACTNODE
> RMPOLLINTERVAL 00:00:30
> SERVERPORT 42559
> SERVERMODE TEST
> LOGFILE maui.log
> LOGFILEMAXSIZE 10000000
> LOGLEVEL 3
> QUEUETIMEWEIGHT 1
> BACKFILLPOLICY FIRSTFIT
> RESERVATIONPOLICY CURRENTHIGHEST
> NODEALLOCATIONPOLICY MINRESOURCE
> #################################################################################################################
>
> Thanks for any help,
> Prakash
> _______________________________________________
> mauiusers mailing list
> mauiusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/mauiusers
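The failing request `nodes=1:opteron:ppn=2` asks for one node that carries the `opteron` property and supplies two processors, which is consistent with the "(2 Needed)" in the log. The hypothetical helper `parse_nodes_spec` below is a minimal Python sketch of how such a Torque `nodes=` string breaks down into node count, required properties, and processors per node. It is not Maui's or Torque's own parser and, as a simplifying assumption, handles only elements that start with a numeric node count (host-name requests and other key=value options are rejected or ignored).

```python
from dataclasses import dataclass, field


@dataclass
class NodeReq:
    """One '+'-separated element of a nodes= request (hypothetical type)."""
    count: int = 1                                       # number of nodes
    features: list[str] = field(default_factory=list)    # required properties, e.g. "opteron"
    ppn: int = 1                                         # processors per node

    @property
    def total_procs(self) -> int:
        # Processors this element asks for across all of its nodes.
        return self.count * self.ppn


def parse_nodes_spec(spec: str) -> list[NodeReq]:
    """Split a nodes= value such as '1:opteron:ppn=2' into NodeReq objects.

    Sketch only: each element must start with a numeric node count; any
    key=value option other than ppn raises ValueError.
    """
    reqs = []
    for element in spec.split("+"):
        first, *rest = element.split(":")
        req = NodeReq(count=int(first))
        for token in rest:
            if token.startswith("ppn="):
                req.ppn = int(token[len("ppn="):])
            elif "=" in token:
                raise ValueError(f"unsupported option: {token!r}")
            else:
                req.features.append(token)  # bare token = node property
        reqs.append(req)
    return reqs


if __name__ == "__main__":
    for req in parse_nodes_spec("1:opteron:ppn=2"):
        print(req, "total procs:", req.total_procs)
```

For example, `parse_nodes_spec("1:opteron:ppn=2")[0].total_procs` is 2. Under this reading, the job can only start if some node reports the `opteron` property and has two processors available. The working request `nodes=1` imposes neither condition.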