[Mauiusers] slurm + maui problem.
vesor at 163.com
Fri Dec 1 01:49:04 MST 2006
I use SLURM 1.1.19 and Maui 3.2.6p18.
I configured 'node10' with 2 processors and 'node7' with 4.
When I run "srun -n6 -t 2 hostname", there is no problem.
But when I run "srun -N2 -t 2 hostname", the job cannot get enough resources to run.
maui.cfg:
# maui.cfg 3.2.6p18
SERVERHOST node10
ADMIN1 root
RMCFG[node10] TYPE=WIKI
RMPORT 7321 # or whatever you choose as a port
RMHOST node10
RMAUTHTYPE[node10] NONE
PARTITIONMODE ON
NODECFG[node10] PARTITION=test
NODECFG[node7] PARTITION=test
AMCFG[bank] TYPE=NONE
RMPOLLINTERVAL 00:00:15
SERVERPORT 42559
LOGFILE maui.log
LOGFILEMAXSIZE 10000000
LOGLEVEL 7
QUEUETIMEWEIGHT 1
BACKFILLPOLICY FIRSTFIT
RESERVATIONPOLICY CURRENTHIGHEST
NODEALLOCATIONPOLICY MINRESOURCE
######################
[root@node10 root]# showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
0 Active Jobs 0 of 6 Processors Active (0.00%)
0 of 2 Nodes Active (0.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
14 root Idle 1 00:02:00 Fri Dec 1 15:20:48
1 Idle Job
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 1 Active Jobs: 0 Idle Jobs: 1 Blocked Jobs: 0
[root@node10 root]# checkjob 14
checking job 14
State: Idle
Creds: user:root group:root qos:DEFAULT
WallTime: 00:00:00 of 00:02:00
SubmitTime: Fri Dec 1 15:20:48
(Time Queued Total: 00:00:04 Eligible: 00:00:04)
StartDate: 00:00:01 Fri Dec 1 15:20:53
Total Tasks: 1
Req[0] TaskCount: 1 Partition: ALL
Network: [NONE] Memory >= 1M Disk >= 1M Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [NONE]
NodeCount: 2
IWD: [NONE] Executable: [NONE]
Bypass: 0 StartCount: 0
PartitionMask: [test]
Reservation '14' (00:00:01 -> 00:02:01 Duration: 00:02:00)
PE: 1.00 StartPriority: 1
cannot select job 14 for partition test (startdate in '00:00:01')
############################
maui.log:
12/01 15:21:11 INFO: nodelist[0] node10 2 6
12/01 15:21:11 INFO: nodelist[1] node7 4 6
12/01 15:21:11 INFO: ignoring pass 1 for job 14:0 (node set forced in feasible list)
12/01 15:21:11 INFO: evaluating nodes on alloc iteration 0 for job 14:0
12/01 15:21:11 INFO: evaluating nodes on alloc iteration 1 for job 14:0
12/01 15:21:11 INFO: evaluating nodes on alloc iteration 2 for job 14:0
12/01 15:21:11 INFO: evaluating nodes on alloc iteration 3 for job 14:0
12/01 15:21:11 INFO: evaluating nodes on alloc iteration 4 for job 14:0
12/01 15:21:11 INFO: evaluating nodes on alloc iteration 5 for job 14:0
12/01 15:21:11 INFO: tasks located for job 14: 2 of 1 required (6 feasible)
12/01 15:21:11 INFO: allocated MNode[000]x1 'node7' to 14:0
12/01 15:21:11 INFO: allocated MNode[001]x1 'node10' to 14:0
12/01 15:21:11 MJobStart(14)
12/01 15:21:11 MJobDistributeTasks(14,node10,NodeList,TaskMap)
12/01 15:21:11 INFO: 0 node(s)/0 task(s) added to 14:0
12/01 15:21:11 ALERT: inadequate tasks allocated to job
12/01 15:21:11 WARNING: cannot distribute allocated tasks for job '14'
12/01 15:21:11 ERROR: cannot start job '14' in partition test
12/01 15:21:11 MJobSetAttr(14,SysSMinTime,Value,0,3)
12/01 15:21:11 INFO: system min start time set on job 14 for 00:00:01
12/01 15:21:11 MJobPReserve(14,test,ResCount,ResCountRej)
12/01 15:21:11 MJobReserve(14,Priority)
12/01 15:21:11 MPolicyGetEStartTime(14,ALL,SOFT,Time)
12/01 15:21:11 INFO: policy start time found for job 14 in 00:00:01
12/01 15:21:11 MJobGetEStartTime(14,NULL,NodeCount,TaskCount,MNodeList,1164957672)
12/01 15:21:11 MParGetTC(test,Avl,Cfg,Ded,Req,2140000000)
12/01 15:21:11 MJobGetRange(14,RQ,test,00:00:01,GRange,NULL,NodeMap,1,TRange)
12/01 15:21:11 MReqGetFNL(14,0,test,NULL,DstNL,NC,TC,2140000000,0)
12/01 15:21:11 MReqCheckResourceMatch(14,0,node10,NULL)
12/01 15:21:11 INFO: node node10 can provide resources for job 14:0
12/01 15:21:11 MNodeCheckPolicies(14,node10,2)
12/01 15:21:11 MJobCheckNRes(14,node10,RQ[0], INFINITY,TCAvail,1.000,RIndex,NULL,FeasCheck)
12/01 15:21:11 MReqCheckResourceMatch(14,0,node10,RIndex)
12/01 15:21:11 INFO: node node10 can provide resources for job 14:0
12/01 15:21:11 INFO: node node10 added to feasible list (2 tasks)
12/01 15:21:11 MReqCheckResourceMatch(14,0,node7,NULL)
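At LOGLEVEL 7 the log is mostly function traces, so the lines that matter are easy to miss. A small filter like the one below may help pull out only the ALERT/WARNING/ERROR entries. This is just a sketch: it assumes the line layout seen in the excerpt above (date, time, then an optional level tag followed by a colon), and the name problem_lines is made up for illustration; it is not part of Maui or SLURM.

```python
import re

# Assumed layout, taken from the log excerpt: "MM/DD HH:MM:SS LEVEL: message".
# Trace lines such as "MJobStart(14)" carry no level tag and are skipped.
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}) (\d{2}:\d{2}:\d{2}) (INFO|ALERT|WARNING|ERROR):\s+(.*)$"
)


def problem_lines(log_text, levels=("ALERT", "WARNING", "ERROR")):
    """Return (time, level, message) tuples for log lines at the given levels."""
    found = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m and m.group(3) in levels:
            found.append((m.group(2), m.group(3), m.group(4)))
    return found


# A few lines copied from the maui.log excerpt in this post.
sample = """\
12/01 15:21:11 INFO: tasks located for job 14: 2 of 1 required (6 feasible)
12/01 15:21:11 MJobStart(14)
12/01 15:21:11 INFO: 0 node(s)/0 task(s) added to 14:0
12/01 15:21:11 ALERT: inadequate tasks allocated to job
12/01 15:21:11 WARNING: cannot distribute allocated tasks for job '14'
12/01 15:21:11 ERROR: cannot start job '14' in partition test
"""

for when, level, msg in problem_lines(sample):
    print(when, level, msg)
```

To run it against the real file, read maui.log into a string and pass it to problem_lines in place of sample.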