[Moabusers] large number of jobs queued

Thomas Raisor thunder at et.byu.edu
Wed Sep 27 09:05:10 MDT 2006


Curious,

Is it possible, through some other policy, to get the effect of a
MAXJOBPERUSER parameter, or would there be interest in adding one? When
I set MAXJOB to 25000, the moab process was consuming a GB of RAM and
pegging the CPU. The large number of jobs really came from just a few
users, but newly submitted jobs from other users weren't even considered
for execution, even though much of the cluster was idling (due to
fairness policies that won't allow a single user to have more than a
certain number of running jobs). If I could set a MAXJOBPERUSER that
lets the scheduler consider up to a certain number of jobs per user,
with an overall MAXJOB cap (as it is now), then I think my server would
not be so overburdened.
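
To make the idea concrete, here is a rough sketch of the selection I have
in mind. This is only an illustration, not how Moab works internally:
select_jobs and its arguments are hypothetical names, and the queue is
assumed to be a simple list of (job_id, user) pairs in submission order.

```python
from collections import Counter


def select_jobs(queue, max_job, max_job_per_user):
    """Pick the jobs a scheduler would consider (hypothetical sketch).

    queue: list of (job_id, user) tuples in submission order.
    max_job: overall cap on considered jobs (like MAXJOB today).
    max_job_per_user: proposed per-user cap (the MAXJOBPERUSER idea).
    """
    per_user = Counter()  # jobs already selected for each user
    selected = []
    for job_id, user in queue:
        if len(selected) >= max_job:
            break  # overall cap reached, stop scanning the queue
        if per_user[user] >= max_job_per_user:
            continue  # this user is at the per-user cap, skip the job
        per_user[user] += 1
        selected.append(job_id)
    return selected


if __name__ == "__main__":
    # One heavy user ahead of a light user in the queue.
    queue = [(i, "alice") for i in range(10)] + [(10, "bob"), (11, "bob")]
    print(select_jobs(queue, max_job=5, max_job_per_user=3))
```

With a per-user cap, the jobs from the light user still get considered
even though they sit behind a long run of jobs from one heavy user, and
the scheduler never has to track more than the overall cap.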

Thoughts?

Tom
--

wightman wrote:
> FYI, there are a few of these parameters that can be tweaked at
> configuration time.  I'm not sure why they aren't documented, but open up
> configure and search on "max".
>
> For jobs you can configure with "--with-maxjobs=<number>".
>
> - Douglas
>
> On Tue, 2006-09-26 at 10:53 -0400, Justin Bronder wrote:
>   
>> On 9/25/06, Thomas G. Raisor <thunder at et.byu.edu> wrote:
>>         Hi,
>>         
>>         I have about 25,000 jobs in my torque queue right now, but
>>         moab is only
>>         seeing roughly the first 4100 (using showq).  Jobs not shown
>>         with showq
>>         give the following error when I do a checkjob on them.
>>         
>>         ERROR:  cannot locate job 'jobid'
>>         
>>         I can run the jobs with qrun with no problems. This is a
>>         vanilla install
>>         of moab - are there default parameters I need to increase? I
>>         was using
>>         an older patch release of moab 4.5, updated to the latest and
>>         get the
>>         same behavior. Could it be torque not communicating its jobs
>>         to moab?
>>         torque version is 2.1.0 - (yes, I know I am a little out of
>>         date, but I
>>         haven't had any problems until now.) 
>>         
>>         Suggestions?
>>         
>>         Tom
>>         
>>         _______________________________________________
>>         moabusers mailing list
>>         moabusers at supercluster.org
>>         http://www.supercluster.org/mailman/listinfo/moabusers
>>
>>
>>
>> It should be seeing the first 4096 jobs actually.  Anyways, you can
>> adjust the
>> default limits, which is something I had to do for partitions.  Refer
>> to the
>> following page and the MAXJOB variable.
>>
>> http://www.clusterresources.com/products/mwm/docs/a.ddevelopment.shtml
>>
>> -Justin. 
>>     
>
>
>   



-- 
Tom Raisor
Director - Fulton Supercomputing Lab
Brigham Young University
801 422 4267
tom_raisor at byu.edu

