[Mauiusers] Re: How to configure some limits

Steve Traylen steve.traylen at cern.ch
Tue Mar 18 06:55:12 MDT 2008


On Mar 18, 2008, at 1:41 PM, Rob Lines wrote:

> On Thu, Mar 13, 2008 at 4:46 AM, Steve Traylen  
> <steve.traylen at cern.ch> wrote:
>
> >
> >
> > On Mar 12, 2008, at 9:44 PM, Rob Lines wrote:
> >
> > > I apologize to anyone who sees this twice. I somehow missed that
> > > there is a separate list for Maui, apart from the Torque list.
> > >
> > > Hi everyone.  We are new to Torque/Maui and we are still getting a
> > > feel for it.  We would like to put into place some limits so that
> > > the cluster is more fairly shared.
> > >
> > > On our old clusters we had a limit that no one person could use
> > > more than 90% of the job slots.  This let people submit thousands
> > > of jobs in a batch and let them go, but still left a number of
> > > slots for other people to run jobs.
> > >
> > > In moving to Torque/Maui we are looking to do something similar,
> > > though as we have more nodes it would be nice to adjust that a bit:
> > > if only one person is running jobs at that moment, they should be
> > > able to use all the slots, but the moment anyone else submits a
> > > job, that job should be the next one to run, even if the first
> > > person has many more jobs queued that have been waiting longer.
> > >
> >
> > Have a look at the soft/hard limits here.
> >
> > http://www.clusterresources.com/products/maui/docs/6.2throttlingpolicies.shtml
> >
> > For, say, a 100-slot cluster,
> >
> > USERCFG[DEFAULT]   MAXJOB=90,110
> >
> > should do something similar to what you want.
> >
>
> We have 188 job slots, so I added
>
> USERCFG[DEFAULT] MAXJOB=94,190
>

Hmm, it should work. There is a bug somewhere when using MAXPROCS

http://scotgrid.blogspot.com/2007/11/maui-maxproc-vs-maxjobs.html

which I have not got around to confirming or finding yet.

Check two things.

diagnose -u

to check the limits are there and

checkjob <StuckJobId>

to see why it won't run.

I've never used

  USERCFG[DEFAULT]   MAXJOB=90,110

only

GROUPCFG[groupA] MAXJOB=10,30
GROUPCFG[groupB] MAXJOB=34,23

which did work. The DEFAULT keyword is meant to work, though.
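The second GROUPCFG line above is reproduced as posted: its first (soft) value, 34, is larger than its second (hard) value, 23, which is probably not what was intended, since the hard value is meant to be the upper bound. A quick sanity check over Maui config lines can catch that kind of slip. This is a minimal sketch: `check_maxjob` is a hypothetical Python helper, not a Maui tool, and it only parses the simple single-attribute MAXJOB lines quoted in this thread.

```python
import re

# Hypothetical helper, not part of Maui. It only understands simple
# "USERCFG[name] MAXJOB=soft,hard" / "GROUPCFG[name] MAXJOB=soft,hard" lines
# like the ones quoted in this thread; lines with other attributes are ignored.
MAXJOB_RE = re.compile(
    r"^\s*(USERCFG|GROUPCFG)\[(\w+)\]\s+MAXJOB=(\d+)(?:,(\d+))?\s*$"
)


def check_maxjob(lines):
    """Return a warning for each MAXJOB pair whose soft value exceeds its hard value."""
    warnings = []
    for line in lines:
        m = MAXJOB_RE.match(line)
        if m is None or m.group(4) is None:
            continue  # not a simple MAXJOB line, or no soft,hard pair to compare
        kind, name = m.group(1), m.group(2)
        soft, hard = int(m.group(3)), int(m.group(4))
        if soft > hard:
            warnings.append(f"{kind}[{name}]: soft limit {soft} exceeds hard limit {hard}")
    return warnings


config = [
    "GROUPCFG[groupA] MAXJOB=10,30",
    "GROUPCFG[groupB] MAXJOB=34,23",
    "USERCFG[DEFAULT] MAXJOB=94,190",
]
print(check_maxjob(config))  # ['GROUPCFG[groupB]: soft limit 34 exceeds hard limit 23']
```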

  Steve




> The goal is that, during heavy usage, one person can use at most
> half of the available processors.  The problem we have run into is
> that it doesn't seem to let those heavy users take advantage of the
> free slots when no one else is using them.  Any suggestions on where
> to look to find out why?
>
> It worked out well when we had a couple of people trying to use the
> same cluster, as it shared the resource in a reasonable manner.  One
> person had a few hundred jobs that took only about 4 hours to
> complete, and another had about 100 that took in the neighborhood of
> 12 hours.  The longer jobs filled up to the limit, but the shorter
> jobs were able to keep rotating in and out without a problem.  Today
> was the first time since then that I have seen only one person with
> jobs queued.  As a short-term fix I raised the first (soft) MAXJOB
> value to 150; that user now has 150 jobs running, but that leaves
> 38 of our 188 slots empty.  Earlier he had 94 jobs running, with the
> remainder in a blocked state.
>
> Thanks,
> Rob
>
>
>
> _______________________________________________
> mauiusers mailing list
> mauiusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/mauiusers
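To make the soft/hard behaviour discussed in this thread concrete, here is a deliberately simplified Python model. It encodes our reading of the idea behind MAXJOB=soft,hard (jobs start up to the soft limit, and the room between soft and hard is used only when slots would otherwise sit idle); it is not Maui's actual scheduling code. The names `MaxJob` and `schedule_pass` are hypothetical, each job is assumed to occupy one slot, and only a single pass over an empty cluster is modelled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MaxJob:
    """Hypothetical stand-in for a MAXJOB=soft,hard pair (one job = one slot)."""
    soft: int
    hard: int


def schedule_pass(total_slots: int, limit: MaxJob, idle: dict[str, int]) -> dict[str, int]:
    """One simplified scheduling pass on an empty cluster.

    `idle` maps each user to their number of queued jobs; the result maps
    each user to the number of jobs running after the pass.
    """
    running = {user: 0 for user in idle}
    waiting = dict(idle)
    free = total_slots

    # Pass 1: every user may start jobs up to the soft limit.
    for user in waiting:
        n = min(waiting[user], limit.soft, free)
        running[user] += n
        waiting[user] -= n
        free -= n

    # Pass 2: slots that would otherwise stay idle may be filled up to the hard limit.
    for user in waiting:
        n = min(waiting[user], max(0, limit.hard - running[user]), free)
        running[user] += n
        waiting[user] -= n
        free -= n

    return running


print(schedule_pass(188, MaxJob(94, 190), {"alice": 1000}))  # {'alice': 188}
print(schedule_pass(188, MaxJob(94, 94), {"alice": 1000}))   # {'alice': 94}
print(schedule_pass(188, MaxJob(94, 190), {"alice": 1000, "bob": 20}))  # {'alice': 168, 'bob': 20}
```

Under these assumptions a lone user reaches all 188 slots with MAXJOB=94,190, while a cap at 94 (or at 150 after raising the soft value) is what the model produces when the hard value is effectively equal to the soft one. That is one possible reading of the symptom, not a diagnosis; `diagnose -u` and `checkjob` remain the way to see what Maui is actually doing.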

-- 
Steve Traylen
steve.traylen at cern.ch



