[torquedev] processor affinity
Toni L. Harbaugh-Blackford [Contr]
harbaugh at ncifcrf.gov
Sun Jun 3 09:45:56 MDT 2007
On Sun, 3 Jun 2007, Anton Menshutin wrote:
> Thanks Toni, now it is clear.
>
> I have always used (we have had Torque in production for only two months)
> the ppn syntax rather than ncpus, because in my opinion the behavior of
> ncpus is undefined. The rules for how ncpus is converted into nodes and
> number of processors are rather complicated, so I avoid ncpus myself and
> advise my users not to use it. With the ppn syntax everything just works
> as expected.
>
> I can forbid ncpus with a prologue script that filters it out or refuses
> such jobs. There are also no time-shared nodes in our cluster.
>
> Now, the next question is: where should I put a call to
> sched_setaffinity()? Could you tell me the name of the most suitable
> function?
>
Probably TMomFinalizeChild(), in start_exec.c, before the setuid() call where
the job takes on the new user's id.
Look for the text:
/*
* become the user, execv the shell and become the real job
*/
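
A minimal sketch of the affinity call itself, assuming Linux with glibc.
pin_to_cpus() is a hypothetical helper, not an existing Torque function, and
where the list of cpu indices comes from is left open here. The idea is only
that the forked child calls it at the point described above, before setuid()
and execv(), so that the job inherits the mask.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* needed for cpu_set_t and the CPU_* macros */
#endif
#include <sched.h>
#include <stddef.h>

/* Hypothetical helper (not part of Torque): restrict the calling process
 * to the given cpu indices. Returns 0 on success, -1 if an index is out
 * of range or if sched_setaffinity() fails. */
int pin_to_cpus(const int *cpus, size_t ncpus)
{
    cpu_set_t mask;
    size_t i;

    CPU_ZERO(&mask);
    for (i = 0; i < ncpus; i++) {
        if (cpus[i] < 0 || cpus[i] >= CPU_SETSIZE)
            return -1;
        CPU_SET(cpus[i], &mask);
    }

    /* pid 0 means "the calling process". The mask is inherited by
     * children created with fork() and preserved across execve(). */
    return sched_setaffinity(0, sizeof(mask), &mask);
}
```

In the child this would be called once, just ahead of the "become the user"
block, with whatever cpu list the mom has decided on for the job.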
Toni
>
>
>
> > -----Original Message-----
> > From: Toni L. Harbaugh-Blackford [Contr] [mailto:harbaugh at ncifcrf.gov]
> > Sent: Sunday, June 03, 2007 6:11 PM
> > To: Menshutin Anton
> > Cc: torquedev at supercluster.org
> > Subject: RE: [torquedev] processor affinity
> >
> > On Sun, 3 Jun 2007, Menshutin Anton wrote:
> >
> > >
> > > I still can't understand why you are saying that torque does not
> > > assign cpus on a node for a job. It does.
> >
> > It does not assign *SPECIFIC* cpus in time shared mode *OR* when jobs
> > are submitted using "ncpus=X" instead of "ppn=X".
> >
> > In either of these cases, on a 64 processor machine, if a 4 cpu job
> > comes in, Torque DOES NOT assign that job to cpus 7-10, for instance.
> > Torque only "allocates" four cpus to the job for accounting purposes,
> > to keep from oversubscribing the whole system.
> >
> > > First of all, I am not using nodes in shared mode. One process, one
> > > cpu. If I have an SMP with 4 cpus (which is my case) and I have set
> > > np=4, then this is equivalent to 4 single-CPU nodes. That way torque
> > > assigns jobs to these virtual nodes, whose names are 'node5/3',
> > > 'node4/1' and so on.
> > >
> > > Of course, if I want to use shared mode (more tasks than CPUs) I do
> > > need CPU sets (as far as I understand what this feature does). But in
> > > the case where different jobs do not share cpus, sched_setaffinity()
> > > is enough. This is also mentioned here -
> > > http://www.bullopensource.org/cpuset/.
> > >
> >
> > In timeshared mode or for jobs with "ncpus=X", you will need to decide
> > which cpus to do "sched_setaffinity()" on.
> >
> > > It seems that I have to parse the job attribute exec_host to find
> > > out the cpu numbers assigned to the job.
> > >
> >
> > You will need to do more than that. Even if a node is designated as
> > type "cluster", a user can submit a job using "ncpus=X" instead of
> > "nodes=1:ppn=X", and the exec_host will not appear with the individual
> > cpus broken out. For example:
> >
> > $ qstat -n -1
> >                                                                    Req'd  Req'd   Elap
> > Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
> > -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
> > 25446.mandark.ncifcr harbaugh small    STDIN       11422     1   4    --  24:00 R    --   dexter/0
> >
> > 25446 is a four cpu job started with "ncpus=4".
> >
> > If you know that users at your site will never use "ncpus=X", then it
> > doesn't matter for you, but for sites in general you cannot assume this.
> >
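
To make the limitation concrete, here is a rough sketch of the exec_host
parsing discussed above. It assumes the attribute is a '+'-separated list of
"host/index" entries (for example "node5/3+node5/2"), which matches the
values quoted in this thread but is an assumption about the general format.
exec_host_cpus() is a hypothetical helper, not an existing Torque function.

```c
#include <stdlib.h>
#include <string.h>

/* Collect the cpu indices listed for `host` in an exec_host string such
 * as "node5/3+node5/2+node4/1" (format assumed). Stores at most `max`
 * indices in out[] and returns the number of entries that matched. */
size_t exec_host_cpus(const char *exec_host, const char *host,
                      int *out, size_t max)
{
    size_t hostlen = strlen(host);
    size_t found = 0;
    const char *p = exec_host;

    while (*p != '\0') {
        const char *end = strchr(p, '+');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        const char *slash = memchr(p, '/', len);

        /* Accept only entries of the exact form "<host>/<index>". */
        if (slash && (size_t)(slash - p) == hostlen &&
            strncmp(p, host, hostlen) == 0) {
            char *stop;
            long idx = strtol(slash + 1, &stop, 10);

            if (stop != slash + 1 && stop == p + len && idx >= 0) {
                if (found < max)
                    out[found] = (int)idx;
                found++;
            }
        }

        p += len;
        if (*p == '+')
            p++;
    }
    return found;
}
```

For job 25446 above the string is just "dexter/0", so this returns a single
index even though the job holds four cpus. That is exactly why exec_host
alone is not enough for "ncpus=X" jobs.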
> > Toni
> >
> >
> > >
> > > -----Original Message-----
> > > From: Toni L. Harbaugh-Blackford [Contr] [mailto:harbaugh at ncifcrf.gov]
> > > Sent: Sunday, June 03, 2007 5:21 PM
> > > To: Menshutin Anton
> > > Cc: 'Sergio Gelato'; torquedev at supercluster.org
> > > Subject: RE: [torquedev] processor affinity
> > >
> > >
> > >
> > >
> > > On Sun, 3 Jun 2007, Menshutin Anton wrote:
> > >
> > > > Well, I found this code about cpusets in the latest snapshot of
> > > > torque (I was using 2.6.1). But it seems unfinished.
> > >
> > > The code is not unfinished, it is just system specific. It is for
> > > systems that have the libcpuset library.
> > >
> > > > Maybe cpusets are more powerful than the approach with
> > > > sched_setaffinity. Here is some info from the web page about
> > > > cpusets - Many applications (as it is often the case for HPC apps)
> > > > use to have a "one process on one processor" policy. They can use
> > > > sched_setaffinity() to do so...
> > > >
> > > > So sched_setaffinity() is my choice. The statement that Torque does
> > > > not assign specific cpus seems to be not absolutely correct.
> > >
> > > Torque does not assign cpus for systems that do not support libcpuset.
> > > Look at the functions in resmom/linux/cpuset.c. If your system has
> > > them then it is possible you could run a cpuset-aware mom.
> > >
> > > > If I have SMP nodes and I set np=4 for each node, for example, then
> > > > torque treats each node as a set of single-processor virtual nodes.
> > > > Here is an example line from 'qstat -f' output from my system.
> > > > exec_host = node5/3
> > > > I could treat this info as node5 cpu3. After that I could set the
> > > > cpu affinity.
> > > >
> > >
> > > You had best think about this in terms of big SMPs. To implement cpu
> > > affinity, you need to keep track of which cpus you've already assigned
> > > to which jobs. You don't want to accidentally assign two jobs to run
> > > on the same cpus.
> > >
> > > If you have a 128p system and a mix of 8, 4, and 1 cpu jobs come in,
> > > how do you manage where they run? How do you track which cpus are
> > > freed when the jobs exit, so you can reassign those cpus to another
> > > job?
> > >
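
One possible shape for that bookkeeping, purely as an illustration: a
per-node table that records which job owns each cpu. Nothing here exists in
Torque; MAX_CPUS, struct cpu_table and the three functions are hypothetical
names, and the first-fit policy is only the simplest choice, not a
recommendation.

```c
#include <stddef.h>

#define MAX_CPUS 128   /* hypothetical, sized for the 128p example */

/* owner[c] holds the id of the job that owns cpu c, or 0 if c is free. */
struct cpu_table {
    int owner[MAX_CPUS];
};

void cpu_table_init(struct cpu_table *t)
{
    int c;

    for (c = 0; c < MAX_CPUS; c++)
        t->owner[c] = 0;
}

/* Reserve n free cpus for job_id (which must be non-zero), lowest index
 * first. On success stores the chosen indices in out[] and returns 0.
 * If fewer than n cpus are free, reserves nothing and returns -1; the
 * contents of out[] are then unspecified. */
int cpu_table_alloc(struct cpu_table *t, int job_id, size_t n, int *out)
{
    size_t found = 0;
    size_t i;
    int c;

    if (job_id == 0)
        return -1;

    for (c = 0; c < MAX_CPUS && found < n; c++) {
        if (t->owner[c] == 0)
            out[found++] = c;
    }
    if (found < n)
        return -1;

    for (i = 0; i < n; i++)
        t->owner[out[i]] = job_id;
    return 0;
}

/* Free every cpu held by job_id, e.g. when the job exits.
 * Returns the number of cpus released. */
size_t cpu_table_release(struct cpu_table *t, int job_id)
{
    size_t freed = 0;
    int c;

    for (c = 0; c < MAX_CPUS; c++) {
        if (job_id != 0 && t->owner[c] == job_id) {
            t->owner[c] = 0;
            freed++;
        }
    }
    return freed;
}
```

With a table like this, an 8 cpu job followed by a 4 cpu job would get cpus
0-7 and 8-11, and cpus 0-7 become available again as soon as the first job
is released.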
> > > Toni
> > >
> > > >
> > > > -----Original Message-----
> > > > From: Sergio Gelato [mailto:Sergio.Gelato at astro.su.se]
> > > > Sent: Sunday, June 03, 2007 12:49 AM
> > > > To: Menshutin Anton
> > > > Cc: torquedev at supercluster.org
> > > > Subject: Re: [torquedev] processor affinity
> > > >
> > > > * Menshutin Anton [2007-06-02 17:02:12 +0400]:
> > > > > I found that there is no processor affinity in torque. Jobs
> > > > > assigned to run on some cpus selected by the scheduler could also
> > > > > run on other cpus on this node.
> > > >
> > > > Really? I see in the trunk's src/resmom/start_exec.c a few
> > > > #elif defined(PENABLE_LINUX26_CPUSETS)
> > > > which make me think that something related to what you are looking
> > > > for is already implemented.
> > > >
> > > > > This property is inherited by child processes. It is obvious that
> > > > > setting it after fork() and before exec() will be enough. The only
> > > > > thing I don't know is where I can get info about the cpus assigned
> > > > > to me by the scheduler.
> > > >
> > > > The code already uses
> > > > pattr = &pjob->ji_wattr[(int)JOB_ATR_resource];
> > > > prd = find_resc_def(svr_resc_def,"ncpus",svr_resc_size);
> > > > presc = find_resc_entry(pattr,prd);
> > > > to find the value of the job resource "ncpus".
> > > >
> > > > > Qstat shows this info in the exec_host attribute, and I suppose I
> > > > > can get this string, parse it, find the local hostname and get the
> > > > > CPU numbers. But maybe there is a better way of getting this info?
> > > > >
> > > > > I'm asking for help from the torque-dev mailing list :) Given some
> > > > > advice, I could try to implement and test it myself, or maybe
> > > > > somebody could send me a patch?
> > > >
> > >
> > > -------------------------------------------------------------------
> > > Toni Harbaugh-Blackford harbaugh at ncifcrf.gov
> > > System Administrator
> > > Advanced Biomedical Computing Center (ABCC)
> > > National Cancer Institute
> > > Contractor - SAIC/Frederick
> > >
> >
>