[torqueusers] Job Allocation on Nodes
Gareth.Williams at csiro.au
Thu Mar 8 13:33:31 MST 2012
> -----Original Message-----
> From: Svancara, Randall [mailto:rsvancara at wsu.edu]
> Sent: Thursday, 8 March 2012 11:41 AM
> To: Torque Users Mailing List
> Subject: Re: [torqueusers] Job Allocation on Nodes
>
> Hi,
>
> Basically for the reason you described: to prevent users from
> oversubscribing a node in terms of memory. I am still working to get a
> better handle on scheduling jobs. Perhaps I need to look at
> the -l mem flag? If I say I need five nodes with 24GB of RAM per
> node, will -l mem=24GB give me five nodes with 1 core and 24GB of
> RAM each? At this point I have been using nodes and ppn to regulate how
> much runs on each node, but I admit it is problematic, as there is no
> guarantee that someone else will not use the same node.
Hi Randall,
I'd look at -l vmem rather than mem. vmem is whole-of-job, so for
exclusive access to 24GB nodes (because all the memory would be
dedicated) you would request the node count times 24GB, for example
-l nodes=12:ppn=3,vmem=288GB (12 x 24GB) and
-l nodes=5:ppn=1,vmem=120GB (5 x 24GB).
Gareth
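
The arithmetic behind those two requests can be sketched as below. This is only an illustration: the helper name exclusive_request and the 24GB default are assumptions taken from the cluster described in this thread, not part of Torque, and the resulting string still has to be passed to qsub -l by hand.

```python
def exclusive_request(nodes: int, ppn: int, node_mem_gb: int = 24) -> str:
    """Build a -l resource string whose vmem equals all memory on every node.

    Hypothetical helper: vmem is treated as a whole-of-job total, so it is
    the node count multiplied by the memory installed per node.
    """
    if nodes < 1 or ppn < 1 or node_mem_gb < 1:
        raise ValueError("nodes, ppn and node_mem_gb must all be positive")
    total_vmem_gb = nodes * node_mem_gb
    return f"nodes={nodes}:ppn={ppn},vmem={total_vmem_gb}GB"


if __name__ == "__main__":
    # The two examples from the reply above.
    print("-l", exclusive_request(12, 3))
    print("-l", exclusive_request(5, 1))
```

Whether the scheduler then keeps other jobs off those nodes depends on it being configured to schedule on memory, which this sketch does not check.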
>
> Thanks,
>
> Randall Svancara
> High Performance Computing Systems Administrator
> Washington State University
> 509-335-3039
>
>
> -----Original Message-----
> From: torqueusers-bounces at supercluster.org [mailto:torqueusers-
> bounces at supercluster.org] On Behalf Of Gareth.Williams at csiro.au
> Sent: Wednesday, March 07, 2012 4:31 PM
> To: torqueusers at supercluster.org
> Subject: Re: [torqueusers] Job Allocation on Nodes
>
> > Perhaps this question has been answered before. I have users who
> > want to distribute jobs equally amongst nodes. What I am observing at
> > the moment is that when a user submits a job with nodes=12:ppn=3, the
> > job uses three nodes with 12 cores per node. Is there a way to make
> > the job use only three cores per node? How can I prevent this or set up
> > some kind of affinity for following the user's job requirements?
>
> Hi Randall,
>
> Why would you want to do such a thing? If the user submits four of these
> jobs they will align, and you will get worse contention. I would
> suggest: if you need to spread jobs to access memory, then you should
> schedule memory, and/or if you need to avoid contention, say for memory
> bandwidth, then get the users to request whole nodes (all the available
> ppn) and only run as many processes as their scaling permits (they may
> need custom mpirun options).
>
> Gareth