[Mauiusers] Re: How to configure some limits
Steve Traylen
steve.traylen at cern.ch
Tue Mar 18 06:55:12 MDT 2008
On Mar 18, 2008, at 1:41 PM, Rob Lines wrote:
> On Thu, Mar 13, 2008 at 4:46 AM, Steve Traylen
> <steve.traylen at cern.ch> wrote:
>
> >
> >
> > On Mar 12, 2008, at 9:44 PM, Rob Lines wrote:
> >
> > > I apologize for anyone that sees this twice. I somehow missed that
> > > there is a separate list for maui from the torque list.
> > >
> > > Hi everyone. We are new to Torque/Maui and we are still getting a
> > > feel for it. We would like to put into place some limits so that
> > > the cluster is more fairly shared.
> > >
> > > For us on our old clusters we had a limit that no one person could
> > > have more than 90% of the job slots used. This allowed us to have
> > > people submit thousands of jobs in a batch and let them go but still
> > > left a number of slots for other people to run jobs.
> > >
> > > With going to Torque/Maui we are looking to do something similar
> > > though as we have more nodes it would be nice to be able to adjust
> > > that a bit so that if there was only one person running jobs at that
> > > moment it would allow them to use all the slots but the moment
> > > anyone else were to submit a job it would become the next one to be
> > > run even if the first person had many more jobs waiting and that had
> > > been waiting longer.
> > >
> >
> > Have a look at the soft/hard limits here.
> >
> > http://www.clusterresources.com/products/maui/docs/6.2throttlingpolicies.shtml
> >
> > For, say, a 100-job cluster:
> >
> > USERCFG[DEFAULT] MAXJOB=90,110
> >
> > should do something similar to what you want.
> >
>
> We have 188 job slots so I added
>
> USERCFG[DEFAULT] MAXJOB=94,190
>
Hmm, it should work. There is a bug somewhere when using MAXPROC
http://scotgrid.blogspot.com/2007/11/maui-maxproc-vs-maxjobs.html
which I have not got around to confirming or finding yet.
Check two things:
diagnose -u
to check the limits are there, and
checkjob <StuckJobId>
to see why it won't run.
I've never used
USERCFG[DEFAULT] MAXJOB=90,110
only
GROUPCFG[groupA] MAXJOB=10,30
GROUPCFG[groupB] MAXJOB=34,23
which did work. The DEFAULT keyword is meant to work, though.
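
To make the intended soft,hard behaviour concrete, here is a rough
Python sketch. It is only a simplified model of one scheduling pass
starting from an empty cluster, not Maui's actual algorithm (which also
weighs priority, reservations and backfill). The function name
schedule and the user names are made up for illustration. The
assumption is that jobs within the soft limit are started first, and
any slots still free afterwards are handed out up to the hard limit.

```python
def schedule(idle, slots, soft, hard):
    """Simplified two-pass model of a soft,hard MAXJOB limit.

    idle  -- dict mapping user name to number of queued jobs
    slots -- total job slots in the cluster
    soft  -- per-user limit applied in the first pass
    hard  -- per-user limit applied in the second pass
    Returns a dict mapping user name to jobs started.
    """
    running = {user: 0 for user in idle}
    free = slots
    # Pass 1 honours the soft limit; pass 2 relaxes it to the hard limit
    # so that leftover slots are not wasted.
    for limit in (soft, hard):
        for user, queued in idle.items():
            startable = min(queued - running[user],
                            limit - running[user],
                            free)
            if startable > 0:
                running[user] += startable
                free -= startable
    return running


# One heavy user alone on a 188-slot cluster with MAXJOB=94,190:
print(schedule({"alice": 300}, 188, 94, 190))
# {'alice': 188}

# Two users competing for the same slots:
print(schedule({"alice": 300, "bob": 100}, 188, 94, 190))
# {'alice': 94, 'bob': 94}
```

If this model matches what Maui is supposed to do, a lone user stuck at
94 running jobs with the rest blocked suggests the second pass is not
happening, which is what checkjob should help explain.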
Steve
> With the goal that we would allow one person during heavy usage to
> only use at most half of the processors available. The problem we
> have run into is that it doesn't seem to be allowing those heavy
> users to take advantage of the free slots when no one else is using
> them. Any suggestions on where to look as to why it isn't?
>
> It worked out well for when we had a couple people all trying to use
> the same cluster as it pretty much shared the resource in a
> reasonable manner. One person had a few hundred jobs that only took
> about 4 hours to complete and another one had about 100 but his
> took in the neighborhood of 12 hours. It filled up to the limit
> with the longer jobs but the shorter jobs were able to keep rotating
> in and out without a problem. Today was the first time since then
> that I have seen only one person with jobs queued. In the short term
> I upped the max jobs first number to 150 and now that user has 150
> jobs running but it leaves us with 38 empty slots. Earlier he had
> 94 jobs running with the remainder in a blocked state.
>
> Thank you,
> Rob
--
Steve Traylen
steve.traylen at cern.ch