[Mauiusers] Bug in memory limit enforcement after maui restart
Martin Kleinschmidt
mk at theochem.uni-duesseldorf.de
Thu Dec 20 07:41:48 MST 2007
There seems to be a bug in the memory limit enforcement procedure. I
have been testing for a while now to find out why jobs sometimes die
when maui is restarted.
My example job was submitted (via torque) with
-l nodes=1:4
-l mem=5000mb
It runs without problems, but when maui is restarted it is killed. After
setting the log level to 255 I finally found:
12/20 15:31:06 INFO: job 3369 exceeds requested memory limit (3658 > 1250)
12/20 15:31:06 MSysRegEvent(JOBRESVIOLATION: job '3369' in state 'Running' has exceeded MEM resource limit (3658 > 1250) (action CANCEL will be taken) job start time: Thu Dec 20 15:29:34
,0,0,1)
So the total memory usage is reported as 3658 out of 5000 MB (which
agrees with what the job is really using), but this value is then
compared to 1250, which is the limit per task (5000/4 = 1250).
This leads to the cancellation of the job.
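To make the mismatch concrete, here is a minimal sketch of the arithmetic only, assuming the job counts as 4 tasks and (for the per-task variant) that usage is spread evenly across them. The function names are hypothetical and this is not Maui's actual code; it just shows that comparing job-wide usage against a per-task limit flags a violation that neither consistent comparison would.

```python
# Hypothetical illustration of the comparison described above.
# NOT Maui's source code; only the numbers come from the log.

def per_task_limit(total_mb: int, tasks: int) -> float:
    """Per-task share of a job-wide memory request (hypothetical helper)."""
    return total_mb / tasks

def violates(used_mb: float, limit_mb: float) -> bool:
    """True if usage exceeds the limit it is compared against."""
    return used_mb > limit_mb

requested_mb = 5000   # -l mem=5000mb
tasks = 4             # four tasks on one node (assumed)
used_mb = 3658        # job-wide usage reported in the log

limit = per_task_limit(requested_mb, tasks)   # 1250.0

# What the log shows after the restart: job-wide usage vs per-task limit.
mismatched = violates(used_mb, limit)         # True -> job would be cancelled

# Consistent alternatives: job-wide vs job-wide, or per-task vs per-task
# (the latter assumes usage is split evenly over the tasks).
consistent_total = violates(used_mb, requested_mb)       # 3658 > 5000 -> False
consistent_per_task = violates(used_mb / tasks, limit)   # 914.5 > 1250 -> False

print(limit, mismatched, consistent_total, consistent_per_task)
```

Running it prints `1250.0 True False False`: only the mixed comparison reports a violation.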
The maui version is maui-3.2.6p19
...martin
Our maui.cfg:
SERVERHOST suzi.theochem.uni-duesseldorf.de
ADMIN1 root
RMCFG[SUZI.THEOCHEM.UNI-DUESSELDORF.DE] TYPE=PBS
AMCFG[bank] TYPE=NONE
RMPOLLINTERVAL 00:00:30
SERVERPORT 42559
SERVERMODE NORMAL
LOGFILE maui.log
LOGFILEMAXSIZE 100000000
LOGLEVEL 1
QUEUETIMEWEIGHT 1
BACKFILLPOLICY BESTFIT
RESERVATIONPOLICY CURRENTHIGHEST
BACKFILLMETRIC PE
NODEALLOCATIONPOLICY MINRESOURCE
SRNAME[0] SRBIG
SRHOSTLIST[0] ^node[1-3]$
SRUSERLIST[0] cm mk susan
SRPERIOD[0] INFINITY
ENFORCERESOURCELIMITS ON
RESOURCELIMITPOLICY[0] MEM:ALWAYS:CANCEL
ENABLEMULTIREQJOBS TRUE
USERCFG[timo] MAXPE=07
USERCFG[stefan] MAXPE=07
USERCFG[mihajlo] MAXPE=07
USERCFG[lasse] MAXPE=07
USERCFG[mk] MAXPE=60 MAXPROC=20
USERCFG[cm] MAXPE=60 MAXPROC=20
USERCFG[susan] MAXPE=60 MAXPROC=20
CLASSCFG[fast] QDEF=unlimit
CLASSCFG[medium] QDEF=batch
CLASSCFG[long] QDEF=batch
CLASSCFG[verylong] QDEF=batch
QOSCFG[batch] MAXPROC=74
QOSCFG[unlimit] OMAXPE=200
QOSCFG[unlimit] OMAXPROC=200