[Mauiusers] Torque/Maui one queue multiple QoS levels
James A. Peltier
jpeltier at cs.sfu.ca
Tue Apr 14 12:29:40 MDT 2009
Hi All,
I'm trying to implement a single queue in Torque with multiple QoS levels in
Maui so that I can manage priorities and such in one place, Maui. The idea is
that jobs are submitted with -l qos=<level>, where <level> is one of short,
long, high, low, or normal. Below are my maui.cfg file and the details for a
job submitted with qsub -I -l qos=long. Of particular note is that, regardless
of the -l qos value, the job's QOS ends up as normal. Please help; I seem to
be missing something rather obvious.
Please note that I did try QDEF=normal QLIST=normal,low,high,long,debug for
both USERCFG[DEFAULT] and CLASSCFG[DEFAULT], which, if I understand correctly,
should have let my users specify these QoS levels via #PBS -l qos=<level>.
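For reference, the QDEF/QLIST variant mentioned above would have looked
roughly like this in maui.cfg. This is reconstructed from the description, not
copied from the running config, so the exact line layout is an assumption:

```
# Assumed layout: replaces the USERCFG[DEFAULT] line in the config below
USERCFG[DEFAULT]  FSTARGET=5 QDEF=normal QLIST=normal,low,high,long,debug
CLASSCFG[DEFAULT] QDEF=normal QLIST=normal,low,high,long,debug
```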
######## MAUI.CFG #############
SERVERHOST queen
# primary admin must be first in list
ADMIN1 root
ADMIN3 ALL
# Resource Manager Definition
RMCFG[BASE] TYPE=PBS
AMCFG[bank] TYPE=NONE
# full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
# use the 'schedctl -l' command to display current configuration
JOBAGGREGATIONTIME 00:00:10
RMPOLLINTERVAL 00:00:30
SERVERPORT 42559
SERVERMODE NORMAL
# Admin: http://supercluster.org/mauidocs/a.esecurity.html
LOGFILE maui.log
LOGFILEMAXSIZE 10000000
LOGLEVEL 3
# Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html
SERVWEIGHT 1
QUEUETIMEWEIGHT 10
# FairShare: http://supercluster.org/mauidocs/6.3fairshare.html
FSPOLICY DEDICATEDPES
FSDEPTH 7
FSINTERVAL 24:00:00
FSDECAY 0.80
FSWEIGHT 5
FSUSERWEIGHT 10
FSGROUPWEIGHT 1
FSCLASSWEIGHT 1
FSACCOUNTWEIGHT 1
USERWEIGHT 10
GROUPWEIGHT 5
QOSWEIGHT 1
# These lines provide a way to avoid "job starvation": when a job is queued,
# it grows its priority.
XFACTORWEIGHT 3
XFWEIGHT 7
XFCAP 1000000
# Purge job information. Keep for 28 days
JOBPURGETIME 28:00:00:00
# Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html
#jobs exceeding limits don't increase their priority
MAXJOBQUEUEDPERUSERCOUNT 30
# Backfill: http://supercluster.org/mauidocs/8.2backfill.html
BACKFILLPOLICY FIRSTFIT
RESERVATIONPOLICY CURRENTHIGHEST
# Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html
NODEALLOCATIONPOLICY MINRESOURCE
#Increase priority for queued jobs
JOBPRIOACCRUALPOLICY
MAXJOBQUEUEDPERUSERPOLICY ON
# Allow users to specify multiple requirements for jobs
# resource specifications such as '-l nodes=3:fast+1:io'
ENABLEMULTIREQJOBS TRUE
# Job Preemption
# specifies how preemptible jobs will be preempted
# available options are REQUEUE, SUSPEND, CHECKPOINT
PREEMPTPOLICY SUSPEND
# How should Maui handle jobs that utilize more resources
# than they requested?
RESOURCELIMITPOLICY MEM:EXTENDEDVIOLATION:CANCEL
# Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html
USERCFG[DEFAULT] FSTARGET=5
GROUPCFG[cs_visitors] MAXJOB=2
#GROUPCFG[ensc_ugrad] MAXJOB=8,16
# SRCFG[administrative] PERIOD=INFINITY
# SRCFG[administrative] STARTTIME=0:00:00 ENDTIME=24:00:00
# SRCFG[administrative] USERLIST=jpeltier
# HOSTLIST=a02-nll,a03-nll,a04-nll,a05-nll,a06-nll,a07-nll,a08-nll
QOSCFG[high] PRIORITY=5000
QOSCFG[normal] PRIORITY=0
QOSCFG[low] PRIORITY=-5000
QOSCFG[long] MAXJOB=4
QOSCFG[debug] WALLTIME=01:00:00
# ensure that some nodes are still able to run
# within a 24 hour period
SHORTPOOLPOLICY ON
SHORTPOOLMAXTIME 86400
SHORTPOOLMINSIZE 128
#### JOB SUBMISSION ####
qsub -l qos=long -I
qsub: waiting for job 75564.queen to start
qsub: job 75564.queen ready
#### CHECKJOB 75564 ####
checking job 75564
State: Running
Creds: user:jpeltier group:staff class:batch qos:normal
WallTime: 00:00:00 of 1:00:00
SubmitTime: Tue Apr 14 11:05:32
(Time Queued Total: 00:00:01 Eligible: 00:00:01)
StartTime: Tue Apr 14 11:05:33
Total Tasks: 1
Req[0] TaskCount: 1 Partition: DEFAULT
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [NONE]
Dedicated Resources Per Task: PROCS: 1 MEM: 1024M
NodeCount: 1
Allocated Nodes:
[sdats1:1]
IWD: [NONE] Executable: [NONE]
Bypass: 0 StartCount: 1
PartitionMask: [ALL]
Reservation '75564' (00:00:00 -> 1:00:00 Duration: 1:00:00)
PE: 1.00 StartPriority: 251
#### qstat -f 75564 ####
Job Id: 75564.queen
Job_Name = STDIN
Job_Owner = jpeltier at queen
resources_used.cput = 00:00:00
resources_used.mem = 9532kb
resources_used.vmem = 114208kb
resources_used.walltime = 00:04:23
job_state = R
queue = batch
server = queen
Checkpoint = u
ctime = Tue Apr 14 11:05:32 2009
Error_Path = /dev/pts/1
exec_host = sdats1/0
Hold_Types = n
interactive = True
Join_Path = n
Keep_Files = n
Mail_Points = a
mtime = Tue Apr 14 11:05:33 2009
Output_Path = /dev/pts/1
Priority = 0
qtime = Tue Apr 14 11:05:32 2009
Rerunable = False
Resource_List.mem = 1gb
Resource_List.ncpus = 1
Resource_List.neednodes = 1
Resource_List.nodect = 1
Resource_List.nodes = 1
Resource_List.qos = long
Resource_List.walltime = 01:00:00
session_id = 30924
substate = 42
Variable_List = PBS_O_HOME=/home/fas3/jpeltier,PBS_O_LANG=en_US.UTF-8,
PBS_O_LOGNAME=jpeltier,
PBS_O_PATH=/usr/lib/qt-3.3/bin:/usr/kerberos/bin:/usr/local/bin:/bin:
/usr/bin:/usr/X11R6/bin,PBS_O_MAIL=/var/spool/mail/jpeltier,
PBS_O_SHELL=/bin/tcsh,PBS_SERVER=queen,PBS_O_HOST=queen,
PBS_O_WORKDIR=/home/fas3/jpeltier/testing,PBS_O_QUEUE=batch
euser = jpeltier
egroup = staff
hashname = 75564.queen
queue_rank = 81809
queue_type = E
etime = Tue Apr 14 11:05:32 2009
submit_args = -l qos=long -I
start_time = Tue Apr 14 11:05:33 2009
start_count = 1
--
James A. Peltier
Systems Analyst (FASNet), VIVARIUM Technical Director
Simon Fraser University - Burnaby Campus
Phone : 778-782-6573
Fax : 778-782-3045
E-Mail : jpeltier at sfu.ca
Website : http://www.fas.sfu.ca | http://vivarium.cs.sfu.ca
http://blogs.sfu.ca/people/jpeltier
MSN : subatomic_spam at hotmail.com
The point of the HPC scheduler is to
keep everyone equally unhappy.