[Mauiusers] MAXNODE limit
Lennart Karlsson
Lennart.Karlsson at nsc.liu.se
Thu Mar 29 09:59:20 MDT 2007
Josh,
Yes, your MAXPROC=4 configuration successfully blocks your "-l nodes=90:ppn=1"
job. I agree with that.
What I say in the Bugzilla post is that, on two-processor nodes, MAXPROC=100
does not block a "-l nodes=90:ppn=1" job, even though the job will allocate
90 nodes, i.e. 180 processors.
Because of that, MAXPROC is not the correct tool and I need MAXNODE
to work.
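
To make the arithmetic concrete, here is a minimal sketch in Python. It is not
Maui code: the function names are hypothetical, and it assumes (as my tests
suggest, but I have not confirmed in the source) that the MAXPROC check compares
the requested processor count, nodes x ppn, against the limit, while a
SINGLEJOB node is dedicated to the job with all of its processors.

```python
def requested_procs(nodes, ppn):
    """Processors the job asks for in '-l nodes=N:ppn=P' (assumed check input)."""
    return nodes * ppn


def dedicated_procs(nodes, procs_per_node):
    """Processors actually tied up when every node is held by a single job."""
    return nodes * procs_per_node


def passes_maxproc(nodes, ppn, maxproc):
    """Hypothetical limit check: only the requested count is compared."""
    return requested_procs(nodes, ppn) <= maxproc


if __name__ == "__main__":
    nodes, ppn, procs_per_node, maxproc = 90, 1, 2, 100
    print("requested:", requested_procs(nodes, ppn))
    print("passes MAXPROC check:", passes_maxproc(nodes, ppn, maxproc))
    print("dedicated:", dedicated_procs(nodes, procs_per_node))
```

Under these assumptions the job requests 90 processors, passes a limit of 100,
and still occupies 180 processors, which is why a node-based limit is needed.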
I am a little confused that you say "Moab" successfully does the blocking;
I presume that you actually used Maui.
Does your MAXPROC=4 configuration successfully block a "-l nodes=3:ppn=1"
job when JOBNODEMATCHPOLICY is set to EXACTNODE and NODEACCESSPOLICY is
set to SINGLEJOB? For me, on two-processor nodes, it does not, and I see no way
to use MAXPROC to emulate a non-working MAXNODE.
In less technical terms, it seems that Maui does not know how many
processors a job will allocate until the job is running.
So please, I would like MAXNODE to work in Maui.
Best regards,
-- Lennart Karlsson <Lennart.Karlsson at nsc.liu.se>
National Supercomputer Centre in Linkoping, Sweden
http://www.nsc.liu.se
Joshua Butikofer wrote:
> After investigating this bug (and its alternate description in Bugzilla) it appears that you need to
> use MAXPROC instead of MAXNODE when JOBNODEMATCHPOLICY is set to EXACTNODE. (The Maui documentation
> mentions this as well @
> http://www.clusterresources.com/products/maui/docs/6.2throttlingpolicies.shtml under MAXNODE.)
>
> The Bugzilla post mentions that you already tried MAXPROC and that specifying -l nodes=90:ppn=1
> still allows the job to run. In my tests, however, Moab successfully blocks the job with my
> MAXPROC=4 for the user/group 'josh':
>
> PE: 90.00 StartPriority: 11001
> cannot select job 81 for partition DEFAULT (job 81 violates active HARD MAXPROC limit of 4 for user
> josh (R: 90, U: 0)
> )
>
> I also tried setting the policy on a QoS and it too worked as expected. Could you please send me a
> scenario to show me how the MAXPROC was failing for you? If the job succeeds in running, could you
> also send me a "checkjob -v <JOB>" output?
>
> Thanks,
>
> --
> Joshua Butikofer
> Cluster Resources, Inc.
>
> josh at clusterresources.com
> Voice: (801) 717-3707
> Fax: (801) 717-3738
> --------------------------
>
>
> Lennart Karlsson wrote:
> > Josh,
> >
> > You wrote:
> >> I would recommend trying out the patch 19 snapshot and see if you
> >> experience any problems. We hope to get the official release out over
> >> the next few days, and this release would eradicate all known bugs.
> >
> >
> > My most critical Maui bug is logged in your bugzilla as number 141.
> > (There is also bug number 83, which looks similar.)
> >
> > Please include it among the "all known bugs" that you are fixing now! I would
> > really appreciate that.
> >
> > The MAXNODE configuration parameter does not work.
> >
> > It should be easy for you to repeat the problem on your systems:
> >
> > 1/ Start with a simple Maui configuration like (I skip the
> > SERVER*/ADMIN/RMCFG/RMPOLLINTERVAL/LOG* preambles):
> >
> > QUEUETIMEWEIGHT 10
> > XFACTORWEIGHT 1
> > QOSWEIGHT 1
> >
> > FSPOLICY [NONE]
> >
> > BACKFILLPOLICY BESTFIT
> > NODEALLOCATIONPOLICY LASTAVAILABLE
> > RESERVATIONPOLICY CURRENTHIGHEST
> > RESERVATIONDEPTH 10
> >
> > JOBPRIOACCRUALPOLICY FULLPOLICY
> >
> > NODEACCESSPOLICY SINGLEJOB
> > JOBNODEMATCHPOLICY EXACTNODE
> >
> > QOSCFG[DEFAULT] PRIORITY=10000 XFWEIGHT=1000 QTWEIGHT=4
> >
> > 2/ Add MAXNODE lines for a user and the group of that user, like:
> >
> > USERCFG[lka] MAXNODE=5
> > GROUPCFG[nsc] MAXNODE=5
> >
> > 3/ Submit a lot of jobs as that user and wait until her/his jobs run on
> > a total of at least five nodes.
> >
> > 4/ Run 'showq' and look at all the jobs of that user, which should be
> > 'blocked' but are actually 'idle' (the demonstration is done on a system
> > where each node has only one processor, so here MAXNODE could be
> > replaced with MAXPROC, but most of our systems have more than one
> > processor on each node):
> >
> > # showq
> > ACTIVE JOBS--------------------
> > JOBNAME   USERNAME   STATE     PROC    REMAINING   STARTTIME
> >
> > 55818     lka        Running      5     00:05:24   Thu Feb 15 13:26:04
> > 55819     lka        Running      1     00:06:01   Thu Feb 15 13:26:41
> > 55820     lka        Running      1     00:06:02   Thu Feb 15 13:26:42
> > 55821     lka        Running      1     00:06:33   Thu Feb 15 13:27:13
> > 55822     lka        Running      1     00:06:34   Thu Feb 15 13:27:14
> > 55823     lka        Running      1     00:06:35   Thu Feb 15 13:27:15
> > 55824     lka        Running      1     00:06:35   Thu Feb 15 13:27:15
> > 55807     andersb    Running     20  11:08:46:33   Wed Feb 14 11:07:13
> >
> > 8 Active Jobs   31 of 31 Processors Active (100.00%)
> >
> > IDLE JOBS----------------------
> > JOBNAME   USERNAME   STATE     PROC      WCLIMIT   QUEUETIME
> >
> > 55825     lka        Idle         1      1:00:00   Thu Feb 15 13:27:15
> > 55826     lka        Idle         1      1:00:00   Thu Feb 15 13:27:16
> > 55827     lka        Idle         1      1:00:00   Thu Feb 15 13:27:16
> > 55828     lka        Idle         1      1:00:00   Thu Feb 15 13:27:17
> > 55829     lka        Idle         1      1:00:00   Thu Feb 15 13:27:17
> > 55830     lka        Idle         1      1:00:00   Thu Feb 15 13:27:17
> >
> > 6 Idle Jobs
> >
> > BLOCKED JOBS----------------
> > JOBNAME   USERNAME   STATE     PROC      WCLIMIT   QUEUETIME
> >
> > 5/ Only job number 55818 should be running; the other 'lka' jobs should
> > be 'blocked', neither 'running' nor 'idle'.
> >
> >
> > The demo was run with Maui version 3.2.6p19-snap.1171482917.
> >
> > I would at least like the MAXNODE parameter to work for GROUP, QOS or
> > CLASS, but of course it would be nice to have it working also on USER,
> > please.
> >
> > Best regards,
> > -- Lennart Karlsson <Lennart.Karlsson at nsc.liu.se>
> > National Supercomputer Centre in Linkoping, Sweden
> > http://www.nsc.liu.se
> >
> >
> _______________________________________________
> mauiusers mailing list
> mauiusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/mauiusers
>