[Mauiusers] priority queue / suspend job
ג'רי (Jerry)
jerry.mersel at weizmann.ac.il
Sun Apr 13 12:49:37 MDT 2008
Hi:
I have set up 2 queues: a "normal" queue and a high-priority
queue.
Of the 3 machines I am experimenting on, all 3 can receive
jobs from the normal queue and 2 can receive jobs from the
high-priority queue. If there are not enough free CPUs, a
job from the normal queue should be suspended. Everything
works fine when I'm working with 1 node, but when I request
multiple nodes (nodes=N:ppn=M) things don't work so well.
Here is an example (workq is the normal queue, prio.q is the high-priority queue).
The high-priority nodes have the property Jerry.
pbsnodes gives:
node1
state = free
np = 2
properties = Jerry
ntype = cluster
status = opsys=linux,uname=Linux node1 2.6.9-55.ELsmp #1 SMP Fri Apr 20 16:36:54 EDT 2007 x86_64,sessions=6714 6734,nsessions=2,nusers=1,idletime=286409,totmem=5767200kb,availmem=5638060kb,physmem=3735592kb,ncpus=2,loadave=0.00,netload=1216146266,state=free,jobs=? 0,rectime=1208111732
node3
state = free
np = 4
ntype = cluster
jobs = 2/144.node4
status = opsys=linux,uname=Linux node3 2.6.9-55.ELsmp #1 SMP Fri Apr 20 16:36:54 EDT 2007 x86_64,sessions=3756 5071 27814 27834 27854 27874 28339,nsessions=7,nusers=3,idletime=289593,totmem=5825352kb,availmem=5645936kb,physmem=12182352kb,ncpus=4,loadave=5.00,netload=694744571,state=free,jobs=144.node4,rectime=1208111732
node4
state = free
np = 2
properties = Jerry
ntype = cluster
jobs = 0/169.node4
status = opsys=linux,uname=Linux node4 2.6.9-55.ELsmp #1 SMP Fri Apr 20 16:36:54 EDT 2007 x86_64,sessions=4982 269 3785 4900 29262 29285,nsessions=6,nusers=3,idletime=36361,totmem=5767048kb,availmem=4722596kb,physmem=3735440kb,ncpus=2,loadave=1.00,netload=2458734191,state=free,jobs=169.node4,rectime=1208111729
When I give this command:
qsub -q prio.q -l nodes=2:ppn=2 ./t.sh
I expect the 1 job on node4 to be suspended so that the high-priority job can run on node1 and node4, using 2 CPUs on each
machine. Instead, the new job just sits in the queue.
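To spell out the arithmetic behind that expectation, here is a minimal Python sketch. It is not anything Maui or TORQUE runs; the dictionary layout and the function name preemption_shortfall are made up for illustration. It assumes that prio.q jobs may only land on nodes with the property Jerry, and it counts busy slots from the "jobs =" lines of the pbsnodes output above (one slot each on node3 and node4, none on node1).

```python
# Slot counts copied by hand from the pbsnodes output above.
# The dict layout is hypothetical, chosen only for this illustration.
nodes = {
    "node1": {"np": 2, "busy": 0, "properties": {"Jerry"}},
    "node3": {"np": 4, "busy": 1, "properties": set()},
    "node4": {"np": 2, "busy": 1, "properties": {"Jerry"}},
}


def preemption_shortfall(nodes, required_property, n_nodes, ppn):
    """Return {node: slots that would have to be freed} for the placement
    needing the fewest freed slots, or None if too few nodes qualify.

    Assumption: a node qualifies only if it carries required_property
    and has at least ppn slots in total.
    """
    candidates = []
    for name, info in nodes.items():
        if required_property not in info["properties"] or info["np"] < ppn:
            continue
        free = info["np"] - info["busy"]
        # Slots that running jobs would have to give up on this node.
        candidates.append((max(0, ppn - free), name))
    if len(candidates) < n_nodes:
        return None
    candidates.sort()
    return {name: short for short, name in candidates[:n_nodes]}


# The request from the qsub line: nodes=2:ppn=2 on the "Jerry" nodes.
plan = preemption_shortfall(nodes, "Jerry", n_nodes=2, ppn=2)
print(plan)  # {'node1': 0, 'node4': 1}
```

Under those assumptions node1 needs nothing freed and node4 needs exactly one slot freed, which is the slot held by job 169.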
Here is my maui configuration file:
#
# MAUI configuration example
# @(#)maui.cfg David Groep 20031015.1
# for MAUI version 3.2.5
#
SERVERHOST node4
ADMIN1 root
ADMINHOST node4
#JOBNODEMATCHPOLICY EXACTNODE
PREEMPTPOLICY SUSPEND
#RESERVATIONPOLICY NEVER
ENABLEMULTINODEJOBS TRUE
#
RMTYPE[0] PBS
RMHOST[0] node4
RMSERVER[0] node4
SERVERPORT 40559
SERVERMODE NORMAL
# Set PBS server polling interval. Since we have many short jobs
# and want fast turn-around, set this to 10 seconds (default: 2 minutes)
RMPOLLINTERVAL 00:00:10
# a max. 10 MByte log file in a logical location
LOGFILE /var/log/maui.log
LOGFILEMAXSIZE 10000000
LOGLEVEL 3
#NODECFG[node4] PARTITION=Jerry
#NODECFG[node1] PARTITION=Jerry
CLASSCFG[DEFAULT] QDEF=low
CLASSCFG[prio.q] QDEF=high
QOSCFG[high] PRIORITY=50000 QFLAGS=PREEMPTOR
QOSCFG[DEFAULT] QFLAGS=PREEMPTEE
QOSWEIGHT 1
I appreciate any advice anyone can give.
Thanks,
Jerry