[Mauiusers] Problems with Maui + SLURM
Josh Butikofer
josh at clusterresources.com
Tue Nov 28 14:26:07 MST 2006
Could you run Maui under gdb (or another debugger) to see where it terminates when the job
completes? Set the environment variable MOABDEBUG=YES, start Maui with "gdb maui", and then
enter "run" to get the daemon running. If Maui crashes or exits, the gdb prompt will
reappear. Enter "where" to get a stack trace and e-mail it to us. This will help
us track down the crash.
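
The steps above can also be run non-interactively, which makes it easier to capture the trace in a file. This is only a sketch: it assumes the maui binary is on your PATH, and the names MAUI_BIN and OUT and the file name maui-backtrace.txt are placeholders, not anything Maui defines.

```shell
#!/bin/sh
# Sketch: run Maui under gdb in batch mode and save whatever gdb prints.
# Assumed names (placeholders, adjust for your install): MAUI_BIN, OUT.
MAUI_BIN="${MAUI_BIN:-maui}"
OUT="${OUT:-maui-backtrace.txt}"

# Environment variable requested in the message above.
export MOABDEBUG=YES

if command -v gdb >/dev/null 2>&1; then
    # -batch makes gdb exit once the listed commands are done.
    # Each -ex is one gdb command: "run" starts the daemon, and once it
    # crashes or exits, "where" prints the stack trace (if there is one).
    gdb -batch -ex run -ex where "$MAUI_BIN" > "$OUT" 2>&1
    echo "gdb output saved to $OUT"
else
    echo "gdb not found; install it or use another debugger" >&2
fi
```

Submit a job while this is running. If Maui exits cleanly rather than crashing, gdb will have no stack to print, but the saved output should still show how the process ended.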
--
Joshua Butikofer
Cluster Resources, Inc.
josh at clusterresources.com
Voice: (801) 717-3707
Fax: (801) 717-3738
--------------------------
vesor at 163.com wrote:
> I configured two slurm nodes, "node10" with 4 processors and "node7" with 2
> processors. When I submit a job, the job completes but maui also terminates.
> So I have to start maui again if I want to submit another job. :(
> Furthermore, when I use "srun -N2 hostname", it seems only one proc has been
> allocated to the job and the job can't run.
>
> slurm version is 1.1 and maui version is 3.2.6p18
>
> ############################
> [root@node10 root]# srun -n6 hostname
> srun: Warning: Requested partition configuration not available now
> srun: job 6 queued and waiting for resources
> srun: job 6 has been allocated resources
> node7
> node10
> node7
> node10
> node7
> node7
> [root@node10 root]# tail -100 /usr/local/maui/log/maui.log
> 11/28 10:05:51 MResAdjust(NULL,0,0)
> 11/28 10:05:51 MStatInitializeActiveSysUsage()
> 11/28 10:05:51 MStatClearUsage([NONE],Active)
> 11/28 10:05:51 ServerUpdate()
> 11/28 10:05:51 MSysUpdateTime()
> 11/28 10:05:51 INFO: starting iteration 1
> 11/28 10:05:51 MRMGetInfo()
> 11/28 10:05:51 MClusterClearUsage()
> 11/28 10:05:51 MRMClusterQuery()
> 11/28 10:05:51 MWikiClusterLoadInfo(node10,RCount,EMsg,SC)
> 11/28 10:05:51 MWikiDoCommand(node10,7321,9000000,NONE,CMD=GETNODES ARG=0:ALL,Data,DataSize,SC)
> 11/28 10:05:51 MSUSendData(S,9000000,FALSE,FALSE)
> 11/28 10:05:51 INFO: packet sent (31 bytes of 31)
> 11/28 10:05:51 INFO: command sent to server
> 11/28 10:05:51 INFO: message sent: 'CMD=GETNODES ARG=0:ALL'
> 11/28 10:05:51 MSURecvData(S,9000000,FALSE,SC,EMsg)
> 11/28 10:05:51 MSURecvPacket(7,BufP,9,NULL,9000000,SC)
> 11/28 10:05:51 MSURecvPacket(7,BufP,161,NULL,9000000,SC)
> 11/28 10:05:51 MSUDisconnect(S)
> 11/28 10:05:51 INFO: received node list through WIKI RM
> 11/28 10:05:51 INFO: loading 2 node(s)
> 11/28 10:05:51 MWikiNodeUpdate(AList,node10)
> 11/28 10:05:51 MWikiNodeUpdate(AList,node7)
> 11/28 10:05:51 INFO: 2 WIKI resources detected on RM node10
> 11/28 10:05:51 INFO: resources detected: 2
> 11/28 10:05:51 MRMWorkloadQuery()
> 11/28 10:05:51 MWikiWorkloadQuery(node10,JCount,SC)
> 11/28 10:05:51 MWikiDoCommand(node10,7321,9000000,NONE,CMD=GETJOBS ARG=0:ALL,Data,DataSize,SC)
> 11/28 10:05:51 MSUSendData(S,9000000,FALSE,FALSE)
> 11/28 10:05:51 INFO: packet sent (30 bytes of 30)
> 11/28 10:05:51 INFO: command sent to server
> 11/28 10:05:51 INFO: message sent: 'CMD=GETJOBS ARG=0:ALL'
> 11/28 10:05:51 MSURecvData(S,9000000,FALSE,SC,EMsg)
> 11/28 10:05:51 MSURecvPacket(7,BufP,9,NULL,9000000,SC)
> 11/28 10:05:51 MSURecvPacket(7,BufP,351,NULL,9000000,SC)
> 11/28 10:05:51 MSUDisconnect(S)
> 11/28 10:05:51 INFO: received job list through WIKI RM
> 11/28 10:05:51 INFO: loading 2 job(s)
> 11/28 10:05:51 MWikiJobLoad(6,UPDATETIME=1164679547;STATE=Idle;WCLIMIT=0;TASKS=1;QUEUETIME=1164679547;UNAME=root;GNAME=root;PARTITIONMASK=test;NODES=1;RMEM=1;RDISK=1;,J,TaskList,node10)
> 11/28 10:05:51 MReqCreate(6,SrcRQ,DstRQ,DoCreate)
> 11/28 10:05:51 MUGetIndex(UPDATETIME=1164679547,ValList,0)
> 11/28 10:05:51 MUGetIndex(STATE=Idle,ValList,0)
> 11/28 10:05:51 MUGetIndex(WCLIMIT=0,ValList,0)
> 11/28 10:05:51 MUGetIndex(TASKS=1,ValList,0)
> 11/28 10:05:51 MUGetIndex(QUEUETIME=1164679547,ValList,0)
> 11/28 10:05:51 MUGetIndex(UNAME=root,ValList,0)
> 11/28 10:05:51 MUGetIndex(GNAME=root,ValList,0)
> 11/28 10:05:51 MUGetIndex(PARTITIONMASK=test,ValList,0)
> 11/28 10:05:51 MUGetIndex(NODES=1,ValList,0)
> 11/28 10:05:51 MUGetIndex(RMEM=1,ValList,0)
> 11/28 10:05:51 MUGetIndex(RDISK=1,ValList,0)
> 11/28 10:05:51 MJobSetCreds(6,root,root,)
> 11/28 10:05:51 INFO: default QOS for job 6 set to DEFAULT(0) (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
> 11/28 10:05:51 INFO: default QOS for job 6 set to DEFAULT(0) (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
> 11/28 10:05:51 INFO: job '6' loaded: 1 root root 0 Idle 0 1164679547 [NONE] [NONE] [NONE] >= 1 >= 1 [NONE] 1164679547
> 11/28 10:05:51 INFO: 2 WIKI jobs detected on RM node10
> 11/28 10:05:51 INFO: jobs detected: 2
> 11/28 10:05:51 MStatClearUsage(node,Active)
> 11/28 10:05:51 MClusterUpdateNodeState()
> 11/28 10:05:51 MQueueSelectAllJobs(Q,HARD,ALL,JIList,DP,Msg)
> 11/28 10:05:51 ERROR: job '6' has NULL WCLimit field
> 11/28 10:05:51 INFO: job '6' Priority: 1
> 11/28 10:05:51 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 0(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
> 11/28 10:05:51 MStatClearUsage([NONE],Active)
> 11/28 10:05:51 INFO: total jobs selected (ALL): 1/1
> 11/28 10:05:51 MQueueSelectAllJobs(Q,SOFT,ALL,JIList,DP,Msg)
> 11/28 10:05:51 ERROR: job '6' has NULL WCLimit field
> 11/28 10:05:51 INFO: job '6' Priority: 1
> 11/28 10:05:51 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 0(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
> 11/28 10:05:51 MStatClearUsage([NONE],Idle)
> 11/28 10:05:51 INFO: total jobs selected (ALL): 1/1
> 11/28 10:05:51 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,EVERY,FReason,FALSE)
> 11/28 10:05:51 INFO: total jobs selected in partition ALL: 1/1
> 11/28 10:05:51 MQueueScheduleRJobs(Q)
> 11/28 10:05:51 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
> 11/28 10:05:51 INFO: total jobs selected in partition ALL: 1/1
> 11/28 10:05:51 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,ALL,FReason,TRUE)
> 11/28 10:05:51 INFO: job 6 not considered for spanning
> 11/28 10:05:51 INFO: total jobs selected in partition ALL: 0/1 [PartitionAccess: 1]
> 11/28 10:05:51 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,test,FReason,TRUE)
> 11/28 10:05:51 INFO: total jobs selected in partition test: 1/1
> 11/28 10:05:51 MQueueScheduleIJobs(Q,test)
> 11/28 10:05:51 INFO: 6 feasible tasks found for job 6:0 in partition test (1 Needed)
> 11/28 10:05:51 INFO: tasks located for job 6: 1 of 1 required (6 feasible)
> 11/28 10:05:51 MJobStart(6)
> 11/28 10:05:51 MJobDistributeTasks(6,node10,NodeList,TaskMap)
> 11/28 10:05:51 MAMAllocJReserve(6,RIndex,ErrMsg)
> 11/28 10:05:51 MRMJobStart(6,Msg,SC)
> 11/28 10:05:51 MWikiJobStart(6,node10,Msg,SC)
> 11/28 10:05:51 MWikiDoCommand(node10,7321,9000000,NONE,CMD=STARTJOB ARG=6 TASKLIST=node7,Data,DataSize,SC)
> 11/28 10:05:51 MSUSendData(S,9000000,FALSE,FALSE)
> 11/28 10:05:51 INFO: packet sent (42 bytes of 42)
> 11/28 10:05:51 INFO: command sent to server
> 11/28 10:05:51 INFO: message sent: 'CMD=STARTJOB ARG=6 TASKLIST=node7'
> 11/28 10:05:51 MSURecvData(S,9000000,FALSE,SC,EMsg)
> 11/28 10:05:51 MSURecvPacket(7,BufP,9,NULL,9000000,SC)
> 11/28 10:05:51 MSURecvPacket(7,BufP,77,NULL,9000000,SC)
> 11/28 10:05:51 MSUDisconnect(S)
> 11/28 10:05:51 INFO: job '6' started through WIKI RM on 1 procs
> 11/28 10:05:51 MStatUpdateActiveJobUsage(6)
> ##############################################
> [root@node10 root]# srun -N2 hostname
> srun: Warning: Requested partition configuration not available now
> srun: job 7 queued and waiting for resources
>
> [root@node10 root]# showq
> ACTIVE JOBS--------------------
> JOBNAME USERNAME STATE PROC REMAINING STARTTIME
>
>
> 0 Active Jobs 0 of 6 Processors Active (0.00%)
> 0 of 2 Nodes Active (0.00%)
>
> IDLE JOBS----------------------
> JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
>
> 7 root Idle 1 99:23:59:59 Tue Nov 28 10:09:17
>
> 1 Idle Job
>
> BLOCKED JOBS----------------
> JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
>
>
> Total Jobs: 1 Active Jobs: 0 Idle Jobs: 1 Blocked Jobs: 0
> [root@node10 root]# tail -100 /usr/local/maui/log/maui.log
> 11/28 10:09:43 INFO: message sent: 'CMD=GETJOBS ARG=0:ALL'
> 11/28 10:09:43 MSURecvData(S,9000000,FALSE,SC,EMsg)
> 11/28 10:09:43 MSURecvPacket(7,BufP,9,NULL,9000000,SC)
> 11/28 10:09:43 MSURecvPacket(7,BufP,395,NULL,9000000,SC)
> 11/28 10:09:43 MSUDisconnect(S)
> 11/28 10:09:43 INFO: received job list through WIKI RM
> 11/28 10:09:43 INFO: loading 2 job(s)
> 11/28 10:09:43 MWikiUpdateJob(AList,7,0)
> 11/28 10:09:43 MUGetIndex(UPDATETIME=1164679757,ValList,0)
> 11/28 10:09:43 MUGetIndex(STATE=Idle,ValList,0)
> 11/28 10:09:43 MUGetIndex(WCLIMIT=0,ValList,0)
> 11/28 10:09:43 MUGetIndex(TASKS=1,ValList,0)
> 11/28 10:09:43 MUGetIndex(QUEUETIME=1164679757,ValList,0)
> 11/28 10:09:43 MUGetIndex(UNAME=root,ValList,0)
> 11/28 10:09:43 MUGetIndex(GNAME=root,ValList,0)
> 11/28 10:09:43 MUGetIndex(PARTITIONMASK=test,ValList,0)
> 11/28 10:09:43 MUGetIndex(NODES=2,ValList,0)
> 11/28 10:09:43 MUGetIndex(RMEM=1,ValList,0)
> 11/28 10:09:43 MUGetIndex(RDISK=1,ValList,0)
> 11/28 10:09:43 INFO: 2 WIKI jobs detected on RM node10
> 11/28 10:09:43 INFO: jobs detected: 2
> 11/28 10:09:43 MStatClearUsage(node,Active)
> 11/28 10:09:43 MClusterUpdateNodeState()
> 11/28 10:09:43 MQueueSelectAllJobs(Q,HARD,ALL,JIList,DP,Msg)
> 11/28 10:09:43 INFO: job '7' Priority: 1
> 11/28 10:09:43 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 0(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
> 11/28 10:09:43 MStatClearUsage([NONE],Active)
> 11/28 10:09:43 INFO: total jobs selected (ALL): 1/1
> 11/28 10:09:43 MQueueSelectAllJobs(Q,SOFT,ALL,JIList,DP,Msg)
> 11/28 10:09:43 INFO: job '7' Priority: 1
> 11/28 10:09:43 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 0(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
> 11/28 10:09:43 MStatClearUsage([NONE],Idle)
> 11/28 10:09:43 INFO: total jobs selected (ALL): 1/1
> 11/28 10:09:43 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,EVERY,FReason,FALSE)
> 11/28 10:09:43 INFO: total jobs selected in partition ALL: 1/1
> 11/28 10:09:43 MQueueScheduleRJobs(Q)
> 11/28 10:09:43 MResDestroy(7)
> 11/28 10:09:43 MResChargeAllocation(7,2)
> 11/28 10:09:43 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
> 11/28 10:09:43 INFO: total jobs selected in partition ALL: 1/1
> 11/28 10:09:43 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,ALL,FReason,TRUE)
> 11/28 10:09:43 INFO: job 7 not considered for spanning
> 11/28 10:09:43 INFO: total jobs selected in partition ALL: 0/1 [PartitionAccess: 1]
> 11/28 10:09:43 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,test,FReason,TRUE)
> 11/28 10:09:43 INFO: total jobs selected in partition test: 1/1
> 11/28 10:09:43 MQueueScheduleIJobs(Q,test)
> 11/28 10:09:43 INFO: 6 feasible tasks found for job 7:0 in partition test (1 Needed)
> 11/28 10:09:43 INFO: tasks located for job 7: 2 of 1 required (6 feasible)
> 11/28 10:09:43 MJobStart(7)
> 11/28 10:09:43 MJobDistributeTasks(7,node10,NodeList,TaskMap)
> 11/28 10:09:43 ALERT: inadequate tasks allocated to job
> 11/28 10:09:43 WARNING: cannot distribute allocated tasks for job '7'
> 11/28 10:09:43 ERROR: cannot start job '7' in partition test
> 11/28 10:09:43 MJobPReserve(7,test,ResCount,ResCountRej)
> 11/28 10:09:43 MJobReserve(7,Priority)
> 11/28 10:09:43 INFO: 6 feasible tasks found for job 7:0 in partition test (1 Needed)
> 11/28 10:09:43 INFO: 6 feasible tasks found for job 7:0 in partition test (1 Needed)
> 11/28 10:09:43 INFO: located resources for 1 tasks (6) in best partition test for job 7 at time 00:00:01
> 11/28 10:09:43 INFO: tasks located for job 7: 2 of 1 required (6 feasible)
> 11/28 10:09:43 MJobDistributeTasks(7,node10,NodeList,TaskMap)
> 11/28 10:09:43 ALERT: inadequate tasks allocated to job
> 11/28 10:09:43 MResJCreate(7,MNodeList,00:00:01,Priority,Res)
> 11/28 10:09:43 INFO: job '7' reserved 1 tasks (partition test) to start in 00:00:01 on Tue Nov 28 10:09:44 (WC: 8639999)
> Active Jobs------
> ------------------
> 11/28 10:09:43 INFO: resources available after scheduling: N: 2 P: 6
> 11/28 10:09:43 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,EVERY,FReason,TRUE)
> 11/28 10:09:43 INFO: total jobs selected in partition ALL: 1/1
> 11/28 10:09:43 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,ALL,FReason,TRUE)
> 11/28 10:09:43 INFO: job 7 not considered for spanning
> 11/28 10:09:43 INFO: total jobs selected in partition ALL: 0/1 [PartitionAccess: 1]
> 11/28 10:09:43 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,test,FReason,TRUE)
> 11/28 10:09:43 INFO: total jobs selected in partition test: 1/1
> 11/28 10:09:43 MQueueBackFill(BFQueue,HARD,test)
> 11/28 10:09:43 MBFGetWindow(BFNodeCount,BFTaskCount,BFNodeList,BFTime,0,test,[ALL],[ALL],[ALL],'NC 0',1,DRes,NULL,NULL,NULL,NULL)
> 11/28 10:09:43 MJobSetCreds(BFWindow,[ALL],[ALL],[ALL])
> 11/28 10:09:43 INFO: default QOS for job BFWindow set to DEFAULT(0) (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
> 11/28 10:09:43 INFO: backfill window: time: INFINITY nodes: 2 tasks: 6 mintime: 0 (idle nodes: 2)
> 11/28 10:09:43 INFO: backfill window obtained [2 nodes/6 procs : INFINITY]
> 11/28 10:09:43 MQueueSelectJobs(SrcQ,DstQ,HARD,2,6,975320217,test,FReason,FALSE)
> 11/28 10:09:43 INFO: total jobs selected in partition test: 1/1
> 11/28 10:09:43 MBFFirstFit(BFQueue,4,BFNodeList,975320217,2,6,test)
> 11/28 10:09:43 INFO: partition test nodes/procs available after MBFFirstFit: 2/6 (0 jobs examined)
> 11/28 10:09:43 MBFGetWindow(BFNodeCount,BFTaskCount,BFNodeList,BFTime,975320217,test,[ALL],[ALL],[ALL],'NC 0',1,DRes,NULL,NULL,NULL,NULL)
> 11/28 10:09:43 MJobSetCreds(BFWindow,[ALL],[ALL],[ALL])
> 11/28 10:09:43 INFO: default QOS for job BFWindow set to DEFAULT(0) (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
> 11/28 10:09:43 INFO: backfill window: time: INFINITY nodes: 0 tasks: 0 mintime: 975320217 (idle nodes: 2)
> 11/28 10:09:43 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
> 11/28 10:09:43 INFO: total jobs selected in partition ALL: 1/1
> 11/28 10:09:43 INFO: job '7' Priority: 1
> 11/28 10:09:43 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 0(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
> 11/28 10:09:43 MSchedUpdateStats()
> 11/28 10:09:43 INFO: iteration: 2 scheduling time: 0.005 seconds
> 11/28 10:09:43 MResUpdateStats()
> 11/28 10:09:43 INFO: current util[2]: 0/2 (0.00%) PH: 0.14% active jobs: 0 of 2 (completed: 0)
> 11/28 10:09:43 MQueueCheckStatus()
> 11/28 10:09:43 MNodeCheckStatus()
> 11/28 10:09:43 MUClearChild(PID)
> 11/28 10:09:43 INFO: scheduling complete. sleeping 15 seconds
> ##################################
> [root@node10 root]# cat /etc/slurm.conf
> # Slurm.conf file generated by configurator.html
> # See the slurm.conf man page for more information
> #
> ControlMachine=node10
> ControlAddr=202.117.10.21
> #BackupController=
> #BackupAddr=
> #
> SlurmUser=slurm
> SlurmctldPort=7010
> SlurmdPort=7011
> AuthType=auth/none
> JobCredentialPrivateKey=/etc/slurm.key
> JobCredentialPublicCertificate=/etc/slurm.cert
> StateSaveLocation=/tmp
> SlurmdSpoolDir=/tmp/slurmd
> SwitchType=switch/none
> MpiDefault=none
> SlurmctldPidFile=/var/run/slurmctld.pid
> SlurmdPidFile=/var/run/slurmd.pid
> ProctrackType=proctrack/linuxproc
> #PluginDir=
> CacheGroups=0
> #FirstJobId=
> ReturnToService=1
> #MaxJobCount=
> #PlugStackConfig=
> #PropagatePrioProcess=
> #PropagateResourceLimits=
> #PropagateResourceLimitsExcept=
> #Prolog=
> #Epilog=
> #SrunProlog=
> #SrunEpilog=
> #TaskProlog=
> #TaskEpilog=
> #TaskPlugin=
> #TmpFs=
> #UsePAM=
> #
> # TIMERS
> SlurmctldTimeout=300
> SlurmdTimeout=300
> InactiveLimit=0
> MinJobAge=300
> KillWait=30
> Waittime=0
> #
> # SCHEDULING
> #SchedulerType=sched/backfill
> SchedulerType=sched/wiki
> SchedulerAuth=42
> SchedulerPort=7321
> #SchedulerRootFilter=
> SelectType=select/cons_res
> FastSchedule=0
> #
> # LOGGING
> SlurmctldDebug=5
> SlurmctldLogFile=/var/log/slurmctld.log
> SlurmdDebug=5
> SlurmdLogFile=/var/log/slurmd.log.%h
> JobCompType=jobcomp/filetxt
> JobCompLoc=/var/log/slurm.job.log
> JobAcctType=jobacct/linux
> JobAcctLogfile=/var/log/slurm_jobacct.log
> JobAcctFrequency=30
> #
> # COMPUTE NODES
> NodeName=node10
> #NodeAddr=202.117.10.21
> Procs=2 State=UNKNOWN
> NodeName=node7
> #NodeAddr=202.17.10.18
> Procs=4 State=UNKNOWN
> PartitionName=test Nodes=node[10,7] Default=YES MaxTime=60 State=UP
>
> [root@node10 root]# cat /usr/local/maui/maui.cfg
> # maui.cfg 3.2.6p18
>
> SERVERHOST node10
> # primary admin must be first in list
> ADMIN1 root
>
> # Resource Manager Definition
>
> RMCFG[node10] TYPE=WIKI
> RMPORT 7321 # or whatever you choose as a port
> RMHOST node10
> RMAUTHTYPE[node10] NONE
>
> PARTITIONMODE ON
> NODECFG[node10] PARTITION=test
> NODECFG[node7] PARTITION=test
>
> # Allocation Manager Definition
>
> AMCFG[bank] TYPE=NONE
>
> # full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
> # use the 'schedctl -l' command to display current configuration
>
> RMPOLLINTERVAL 00:00:15
>
> SERVERPORT 42559
> SERVERMODE NORMAL
>
> # Admin: http://supercluster.org/mauidocs/a.esecurity.html
>
>
> LOGFILE maui.log
> LOGFILEMAXSIZE 10000000
> LOGLEVEL 3
>
> # Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html
>
> QUEUETIMEWEIGHT 1
>
> # FairShare: http://supercluster.org/mauidocs/6.3fairshare.html
>
> #FSPOLICY PSDEDICATED
> #FSDEPTH 7
> #FSINTERVAL 86400
> #FSDECAY 0.80
>
> # Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html
>
> # NONE SPECIFIED
>
> # Backfill: http://supercluster.org/mauidocs/8.2backfill.html
>
> BACKFILLPOLICY FIRSTFIT
> RESERVATIONPOLICY CURRENTHIGHEST
>
> # Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html
>
> NODEALLOCATIONPOLICY MINRESOURCE
>
> # QOS: http://supercluster.org/mauidocs/7.3qos.html
>
> # QOSCFG[hi] PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
> # QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE
>
> # Standing Reservations: http://supercluster.org/mauidocs/7.1.3standingreservations.html
>
> # SRSTARTTIME[test] 8:00:00
> # SRENDTIME[test] 17:00:00
> # SRDAYS[test] MON TUE WED THU FRI
> # SRTASKCOUNT[test] 20
> # SRMAXTIME[test] 0:30:00
>
> # Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html
>
> # USERCFG[DEFAULT] FSTARGET=25.0
> # USERCFG[john] PRIORITY=100 FSTARGET=10.0-
> # GROUPCFG[staff] PRIORITY=1000 QLIST=hi:low QDEF=hi
> # CLASSCFG[batch] FLAGS=PREEMPTEE
> # CLASSCFG[interactive] FLAGS=PREEMPTOR
>
>
>
>
> _______________________________________________
> mauiusers mailing list
> mauiusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/mauiusers