[Mauiusers] Torque/Maui one queue multiple QoS levels

James A. Peltier jpeltier at cs.sfu.ca
Tue Apr 14 12:29:40 MDT 2009


Hi All,

I'm trying to implement a single queue in Torque with multiple QoS levels in 
Maui, so that priorities and limits are managed in one place: Maui.  The idea 
is that submitted jobs specify -l qos=<level> to mark themselves as short, 
long, high, low, or normal.  Below are my maui.cfg and the details of a job 
submitted with qsub -I -l qos=long.  Of particular note: regardless of the 
-l qos value, the job's QOS is set to normal.  Please help; I seem to be 
missing something rather obvious.  Note that I did try 
QDEF=normal QLIST=normal,low,high,long,debug on both USERCFG[DEFAULT] and 
CLASSCFG[DEFAULT], which, if I understand correctly, should have allowed my 
users to request these QoS levels via #PBS -l qos=<level>.
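
Concretely, the access lines I tried looked like the following (re-typed 
here rather than pasted, so treat it as a sketch of what was in the file):

   USERCFG[DEFAULT]   QDEF=normal QLIST=normal,low,high,long,debug
   CLASSCFG[DEFAULT]  QDEF=normal QLIST=normal,low,high,long,debug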



######## MAUI.CFG #############
SERVERHOST            queen

# primary admin must be first in list
ADMIN1                root
ADMIN3                ALL

# Resource Manager Definition

RMCFG[BASE] TYPE=PBS
AMCFG[bank]  TYPE=NONE

#  full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
#  use the 'schedctl -l' command to display current configuration
JOBAGGREGATIONTIME    00:00:10
RMPOLLINTERVAL        00:00:30
SERVERPORT            42559
SERVERMODE            NORMAL

# Admin: http://supercluster.org/mauidocs/a.esecurity.html

LOGFILE               maui.log
LOGFILEMAXSIZE        10000000
LOGLEVEL              3

# Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html

SERVWEIGHT            1
QUEUETIMEWEIGHT       10

# FairShare: http://supercluster.org/mauidocs/6.3fairshare.html

FSPOLICY              DEDICATEDPES
FSDEPTH               7
FSINTERVAL            24:00:00
FSDECAY               0.80
FSWEIGHT              5
FSUSERWEIGHT          10
FSGROUPWEIGHT         1
FSCLASSWEIGHT         1
FSACCOUNTWEIGHT       1
USERWEIGHT            10
GROUPWEIGHT           5
QOSWEIGHT             1

# These lines help avoid "job starvation": a job's priority grows the
# longer it waits in the queue.

XFACTORWEIGHT           3
XFWEIGHT                7
XFCAP                   1000000

# Purge job information.  Keep for 28 days
JOBPURGETIME 28:00:00:00

# Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html

#jobs exceeding limits don't increase their priority
MAXJOBQUEUEDPERUSERCOUNT 30

# Backfill: http://supercluster.org/mauidocs/8.2backfill.html

BACKFILLPOLICY        FIRSTFIT
RESERVATIONPOLICY     CURRENTHIGHEST

# Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html

NODEALLOCATIONPOLICY  MINRESOURCE

#Increase priority for queued jobs
JOBPRIOACCRUALPOLICY
MAXJOBQUEUEDPERUSERPOLICY ON

#  Allow users to specify multiple requirements for jobs
#  resource specifications such as '-l nodes=3:fast+1:io'
ENABLEMULTIREQJOBS   TRUE

#  Job Preemption
#  specifies how preemptible jobs will be preempted
#  available options are REQUEUE, SUSPEND, CHECKPOINT
PREEMPTPOLICY SUSPEND

#  How Maui should handle jobs that use more resources
#  than they requested.
RESOURCELIMITPOLICY MEM:EXTENDEDVIOLATION:CANCEL

# Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html

USERCFG[DEFAULT]      FSTARGET=5
GROUPCFG[cs_visitors] MAXJOB=2
#GROUPCFG[ensc_ugrad]  MAXJOB=8,16

# SRCFG[administrative] PERIOD=INFINITY
# SRCFG[administrative] STARTTIME=0:00:00 ENDTIME=24:00:00
# SRCFG[administrative] USERLIST=jpeltier
# SRCFG[administrative] HOSTLIST=a02-nll,a03-nll,a04-nll,a05-nll,a06-nll,a07-nll,a08-nll

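# QoS definitions: relative priority, plus per-QoS job-count and walltime limits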
QOSCFG[high]           PRIORITY=5000
QOSCFG[normal]         PRIORITY=0
QOSCFG[low]            PRIORITY=-5000
QOSCFG[long]           MAXJOB=4
QOSCFG[debug]          WALLTIME=01:00:00

#  ensure that some nodes remain available to run jobs
#  within a 24-hour period
SHORTPOOLPOLICY ON
SHORTPOOLMAXTIME 86400
SHORTPOOLMINSIZE 128

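For reference, this is the kind of batch script I expect users to submit 
(a hypothetical test script; the interactive session below exercises the 
same path from the command line):

   #!/bin/bash
   #PBS -l nodes=1
   #PBS -l walltime=01:00:00
   #PBS -l qos=long
   hostname
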

#### JOB SUBMISSION ####
qsub -l qos=long -I
qsub:  waiting for job 75564.queen to start
qsub:  job 75564.queen ready


#### CHECKJOB 75564 ####

checking job 75564

State: Running
Creds:  user:jpeltier  group:staff  class:batch  qos:normal
WallTime: 00:00:00 of 1:00:00
SubmitTime: Tue Apr 14 11:05:32
    (Time Queued  Total: 00:00:01  Eligible: 00:00:01)

StartTime: Tue Apr 14 11:05:33
Total Tasks: 1

Req[0]  TaskCount: 1  Partition: DEFAULT
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
Dedicated Resources Per Task: PROCS: 1  MEM: 1024M
NodeCount: 1
Allocated Nodes:
[sdats1:1]


IWD: [NONE]  Executable:  [NONE]
Bypass: 0  StartCount: 1
PartitionMask: [ALL]
Reservation '75564' (00:00:00 -> 1:00:00  Duration: 1:00:00)
PE:  1.00  StartPriority:  251


#### qstat -f 75564 ####

Job Id: 75564.queen
      Job_Name = STDIN
      Job_Owner = jpeltier at queen
      resources_used.cput = 00:00:00
      resources_used.mem = 9532kb
      resources_used.vmem = 114208kb
      resources_used.walltime = 00:04:23
      job_state = R
      queue = batch
      server = queen
      Checkpoint = u
      ctime = Tue Apr 14 11:05:32 2009
      Error_Path = /dev/pts/1
      exec_host = sdats1/0
      Hold_Types = n
      interactive = True
      Join_Path = n
      Keep_Files = n
      Mail_Points = a
      mtime = Tue Apr 14 11:05:33 2009
      Output_Path = /dev/pts/1
      Priority = 0
      qtime = Tue Apr 14 11:05:32 2009
      Rerunable = False
      Resource_List.mem = 1gb
      Resource_List.ncpus = 1
      Resource_List.neednodes = 1
      Resource_List.nodect = 1
      Resource_List.nodes = 1
      Resource_List.qos = long
      Resource_List.walltime = 01:00:00
      session_id = 30924
      substate = 42
      Variable_List = PBS_O_HOME=/home/fas3/jpeltier,PBS_O_LANG=en_US.UTF-8,
          PBS_O_LOGNAME=jpeltier,
          PBS_O_PATH=/usr/lib/qt-3.3/bin:/usr/kerberos/bin:/usr/local/bin:/bin:
          /usr/bin:/usr/X11R6/bin,PBS_O_MAIL=/var/spool/mail/jpeltier,
          PBS_O_SHELL=/bin/tcsh,PBS_SERVER=queen,PBS_O_HOST=queen,
          PBS_O_WORKDIR=/home/fas3/jpeltier/testing,PBS_O_QUEUE=batch
      euser = jpeltier
      egroup = staff
      hashname = 75564.queen
      queue_rank = 81809
      queue_type = E
      etime = Tue Apr 14 11:05:32 2009
      submit_args = -l qos=long -I
      start_time = Tue Apr 14 11:05:33 2009
      start_count = 1
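
For what it's worth, I have also been poking at the scheduler with 
diagnose (assuming I am reading the docs correctly about these flags):

   diagnose -Q    # per-QoS configuration and usage
   diagnose -c    # per-class (queue) configuration, including QDEF/QLIST

I can post that output as well if it would help.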

-- 
James A. Peltier
Systems Analyst (FASNet), VIVARIUM Technical Director
Simon Fraser University - Burnaby Campus
Phone   : 778-782-6573
Fax     : 778-782-3045
E-Mail  : jpeltier at sfu.ca
Website : http://www.fas.sfu.ca | http://vivarium.cs.sfu.ca
           http://blogs.sfu.ca/people/jpeltier
MSN     : subatomic_spam at hotmail.com

The point of the HPC scheduler is to
keep everyone equally unhappy.

