[Moabusers] large number of jobs queued
Justin Bronder
jsbronder at gmail.com
Wed Sep 27 09:14:57 MDT 2006
On 9/27/06, Thomas Raisor <thunder at et.byu.edu> wrote:
>
> Curious,
>
> is it possible by some other policy/or would there be interest in, a
> MAXJOBPerUser parameter?
I know you can specify the following on a per-class basis. If you are using
a single class, or a remap class, this might be effective:
CLASSCFG[linux-spool] MAXJOB[USER]=1
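
To make the intended effect concrete, here is a rough Python sketch of a
per-user cap applied under an overall cap. It only illustrates the idea; it is
not how Moab implements its limits. The function name select_jobs, its
parameters, and the sample queue are all made up for this example.

```python
from collections import Counter


def select_jobs(queue, per_user_cap, total_cap):
    """Walk the queue in order and keep at most per_user_cap jobs per user,
    stopping once total_cap jobs have been kept.

    queue is a list of (job_id, user) tuples. This is a hypothetical
    illustration, not Moab's scheduling code.
    """
    taken = Counter()   # jobs kept so far, per user
    selected = []
    for job_id, user in queue:
        if len(selected) >= total_cap:
            break       # overall cap reached
        if taken[user] >= per_user_cap:
            continue    # this user is at the cap; skip the job
        taken[user] += 1
        selected.append(job_id)
    return selected


# Hypothetical queue: one user has flooded it, two others have one job each.
queue = [("1", "alice"), ("2", "alice"), ("3", "alice"),
         ("4", "bob"), ("5", "carol")]
print(select_jobs(queue, per_user_cap=1, total_cap=4096))
```

With a cap of 1 per user, the jobs from the other users are still reached even
though a single user holds most of the queue.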
-Justin.
> When I set MAXJOB to 25000 the moab process was
> consuming a GB of RAM and was pegging the CPU. Really the large number
> of jobs came from just a few users, but newly submitted jobs from other
> users weren't even considered for execution even though much of the
> cluster was idling (due to fairness policies that won't allow single
> users to have more than a certain number of running jobs). If I could
> set a MAXJOBPERUSER that would let the scheduler consider up to a
> certain number of jobs per user, with a MAXJOB cap (as it is now), then
> I think my server would not be so overburdened.
>
> Thoughts?
>
> Tom
> --
>
> wightman wrote:
> > FYI, there are a few of these parameters that can be tweaked at
> > configuration time. I'm not sure why they aren't documented but open up
> > configure and search on "max".
> >
> > For jobs you can configure with "--with-maxjobs=<number>".
> >
> > - Douglas
> >
> > On Tue, 2006-09-26 at 10:53 -0400, Justin Bronder wrote:
> >
> >> On 9/25/06, Thomas G. Raisor <thunder at et.byu.edu> wrote:
> >> Hi,
> >>
> >> I have about 25,000 jobs in my torque queue right now, but
> >> moab is only
> >> seeing roughly the first 4100 (using showq). Jobs not shown
> >> with showq
> >> give the following error when I do a checkjob on them.
> >>
> >> ERROR: cannot locate job 'jobid'
> >>
> >> I can run the jobs with qrun with no problems. This is a
> >> vanilla install
> >> of moab - are there default parameters I need to increase? I
> >> was using
> >> an older patch release of moab 4.5, updated to the latest and
> >> get the
> >> same behavior. Could it be torque not communicating its jobs
> >> to moab?
> >> torque version is 2.1.0 - (yes, I know I am a little out of
> >> date, but I
> >> haven't had any problems until now.)
> >>
> >> Suggestions?
> >>
> >> Tom
> >>
> >> _______________________________________________
> >> moabusers mailing list
> >> moabusers at supercluster.org
> >> http://www.supercluster.org/mailman/listinfo/moabusers
> >>
> >>
> >>
> >> It should be seeing the first 4096 jobs actually. Anyways, you can
> >> adjust the
> >> default limits, which is something I had to do for partitions. Refer
> >> to the
> >> following page and the MAXJOB variable.
> >>
> >> http://www.clusterresources.com/products/mwm/docs/a.ddevelopment.shtml
> >>
> >> -Justin.
> >>
> >
> >
> >
>
>
>
> --
> Tom Raisor
> Director - Fulton Supercomputing Lab
> Brigham Young University
> 801 422 4267
> tom_raisor at byu.edu