abhig at princeton.edu
Thu Dec 16 09:32:43 MST 2010
It's not a case of increasing memory usage: even if I request mem=6gb or
pmem=6gb, the job still goes to a node with total memory less than 6GB. So
I thought that by setting NODEAVAILABILITYPOLICY, I would be able to define
availability on the basis of memory.
Just as we define np= in the nodes file, do we have to define memory
resources somewhere as well?
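For reference, here is roughly the setup I have in mind (the node name and
values are just examples; in Torque the nodes file only carries np and
properties, and memory is normally reported by pbs_mom automatically):

```
# maui.cfg -- consider available memory as well when placing jobs
NODEAVAILABILITYPOLICY  COMBINED:MEM

# torque server_priv/nodes -- np is defined here, memory is not
node01 np=8
```

and the job itself asks for memory at submission time, e.g.
qsub -l nodes=1,pmem=6gb job.sh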
Renato Borges wrote:
> Hi Abhi!
> On Wed, Dec 15, 2010 at 7:21 PM, Abhishek Gupta <abhig at princeton.edu>
> wrote:
> I am trying to find a way to ensure that memory usage does not exceed
> the available memory on a node. I was thinking that this parameter
> ( NODEAVAILABILITYPOLICY COMBINED:MEM ) would check the availability of
> a node on the basis of the memory available, but it does not.
> Is there anything else I need to add to make it work?
> NODEAVAILABILITYPOLICY COMBINED:MEM
> I've never used NODEAVAILABILITYPOLICY, but I have a similar problem,
> which is: the jobs we run at my site start out with a small memory
> footprint and end with large amounts of data in memory (in
> virtualization lingo, they "balloon"). Maybe this is also your case,
> and that is why setting this variable doesn't work?
> To avoid swapping, I have set a MAXJOBPERUSER variable for each
> compute node, because all of our jobs with an increasing memory
> footprint come from a single user (actually, a grid account).
> By tweaking the MAXJOBPERUSER variable, I have found a value for each
> node (we have a heterogeneous cluster) that runs the jobs without
> swapping.
> However, this is not ideal, because this setting applies to all jobs
> that run on a given node, and some local users have jobs that are
> small in memory but large in number of cores, so the limits I set
> for the grid jobs are too restrictive for them. Whereas the grid
> account can only run 4 jobs on an 8-core, 8GB RAM node, local users'
> jobs could merrily run on all 8 cores simultaneously.
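> For concreteness, the per-node limit looks roughly like this in
> maui.cfg (node names and values are examples, and whether NODECFG
> accepts MAXJOBPERUSER should be checked against your Maui version):
>
> ```
> # maui.cfg -- cap the number of jobs per user on each node
> NODECFG[node01] MAXJOBPERUSER=4
> NODECFG[node02] MAXJOBPERUSER=6
> ```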
> Trying to find a better solution, I found that one can set in Torque
> (supposing you use Torque):
> qmgr -c "set queue XXX resources_min.mem=2gb"
> And this would (theoretically) assign to waiting jobs in the XXX queue
> only nodes that have at least 2GB of free memory. I say "theoretically"
> because I have not had luck with this setting. As I said, our grid
> jobs balloon, so our nodes get one job per slot: initially (for the
> first few hours) the jobs are only downloading data, so there is
> always 2GB free. But once the memory balloons, we start swapping
> heavily.
> I guess you might have more luck with that if your jobs' memory
> footprint is more constant. And if some guru could teach us how to
> "reserve" some amount of memory per job, I know that would suit me
> perfectly.
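> One hedged idea for that (untested here, queue name XXX is just a
> placeholder): give the queue a default memory request in Torque, so
> every job "reserves" that much from the moment it starts:
>
> ```
> qmgr -c "set queue XXX resources_default.mem=2gb"
> ```
>
> Combined with NODEAVAILABILITYPOLICY COMBINED:MEM on the Maui side,
> the scheduler should then count that 2GB against the node even while
> the job is still only downloading data.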
> Renato Callado Borges
> Lab Specialist - DFN/IF/USP
> Email: rborges at dfn.ifusp.br
> Phone: +55 11 3091 7105
More information about the mauiusers mailing list