[torquedev] cpuset support
garrick at usc.edu
Fri Jan 11 19:18:23 MST 2008
On Mon, Nov 12, 2007 at 04:17:57PM -0800, Garrick Staples alleged:
> I just bumped into Chris Samuel at his (rather barren) booth here at SC07 and I think we just designed cpuset support.
> Here's what we came up with...
The first version is checked in! Almost 2 months to the day :)
I'll do the wiki docs this weekend or Monday.
> On startup, pbs_mom will create /dev/cpuset/torque (with all cpus) if it
> doesn't already exist and move itself to it. This allows the admin to stuff
> pbs_mom inside a smaller cpuset if desired by creating it in the initscript.
> We will call this the "torqueset".
Done, except that pbs_mom doesn't move itself into the torqueset.
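As a rough illustration of the torqueset step (not the code in cpuset.c), the sketch below creates the directory only if it is missing and seeds it with the parent's cpus and mems. The function name ensure_torqueset and its parameters are hypothetical; the root is a parameter so the logic can be tried against a scratch directory instead of a live /dev/cpuset mount.

```python
import os


def ensure_torqueset(cpuset_root="/dev/cpuset", name="torque"):
    """Create <cpuset_root>/<name> with all of the root's cpus and mems.

    Hypothetical helper for illustration only. If the directory already
    exists (for example, created by the admin in the initscript), it is
    left untouched so a smaller, pre-made torqueset is respected.
    """
    path = os.path.join(cpuset_root, name)
    if os.path.isdir(path):
        return path
    os.mkdir(path)
    # Copy the parent's full cpu and memory-node lists into the new cpuset.
    for fname in ("cpus", "mems"):
        with open(os.path.join(cpuset_root, fname)) as src:
            value = src.read().strip()
        with open(os.path.join(path, fname), "w") as dst:
            dst.write(value + "\n")
    return path
```

Moving pbs_mom itself into the torqueset would amount to writing its pid to the tasks file there; as noted above, the checked-in version does not do that yet.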
> When a job starts, pbs_mom will create a per-job cpuset under the torqueset
> with the correct cpus called the "jobset". It will do this after prologue,
> which allows the admin to pre-create it if desired. This happens on all nodes.
Done, but it happens before prologue, so the prologue can modify it if desired.
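A minimal sketch of the jobset step, assuming the torqueset already exists: make a child directory, write the job's cpus, and copy all mems from the torqueset (matching the "all mems in all cpusets" plan). The name create_jobset, the use of the job id as the directory name, and the comma-separated cpu list are assumptions for illustration, not what cpuset.c necessarily does.

```python
import os


def create_jobset(torqueset, job_id, cpus):
    """Create a per-job cpuset under the torqueset (hypothetical helper).

    torqueset: path of the parent cpuset, e.g. /dev/cpuset/torque
    job_id:    directory name for the jobset (naming is an assumption)
    cpus:      iterable of cpu numbers assigned to the job on this node
    """
    path = os.path.join(torqueset, job_id)
    os.mkdir(path)
    # The kernel accepts a comma-separated list such as "0,2,3".
    cpu_list = ",".join(str(c) for c in sorted(set(cpus)))
    with open(os.path.join(path, "cpus"), "w") as f:
        f.write(cpu_list + "\n")
    # All mems from the parent go into the jobset.
    with open(os.path.join(torqueset, "mems")) as src:
        mems = src.read().strip()
    with open(os.path.join(path, "mems"), "w") as f:
        f.write(mems + "\n")
    return path
```

Because this runs before prologue, a prologue script could rewrite the cpus file of the jobset afterwards.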
> Per-vnode cpusets will also be created under the jobset at job start.
> pbs_mom will run the batch script inside of the jobset and all TM spawn
> requests will run in the vnodeset.
> You end up with cpusets that look like:
Done, but they are slightly less self-descriptive:
Testers can inspect the cpus, mems, and tasks files in the various cpuset
> Job exit will consist of ensuring the cpusets are empty (killing processes)
> before removing them.
Done, though it's not particularly smart or reliable about it.
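One way the "empty it, then remove it" step could be made a little more robust is a bounded kill-and-recheck loop, sketched below. This is not the logic in cpuset.c; remove_cpuset, read_tasks, and the retries/delay parameters are hypothetical, and kill and rmdir are passed in as parameters only so the loop can be exercised without real processes.

```python
import os
import signal
import time


def read_tasks(cpuset_path):
    """Return the pids currently listed in the cpuset's tasks file."""
    with open(os.path.join(cpuset_path, "tasks")) as f:
        return [int(line) for line in f if line.strip()]


def remove_cpuset(cpuset_path, kill=os.kill, rmdir=os.rmdir,
                  retries=5, delay=0.1):
    """Kill remaining tasks, then remove the cpuset (hypothetical helper).

    Returns True once the cpuset was empty and rmdir was called, or
    False if tasks were still present after `retries` kill rounds.
    """
    for attempt in range(retries + 1):
        pids = read_tasks(cpuset_path)
        if not pids:
            rmdir(cpuset_path)
            return True
        if attempt == retries:
            break
        for pid in pids:
            try:
                kill(pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # the process exited between the read and the kill
        time.sleep(delay)  # give the kernel time to reap the tasks
    return False
```

A cpuset with child cpusets cannot be removed, so the per-vnode cpusets would have to go through this before the jobset that contains them.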
> Exclusive cpusets can't be used because of suspended jobs.
> All mems will be added to all cpusets unless someone comes up with another idea.
> This seems pretty simple to implement, doesn't require any build deps, and
> makes sense to me. Any thoughts?
All of the cpuset code is in src/resmom/linux/cpuset/cpuset.c. It has a bunch
of FIXME notes. Run configure with --enable-cpuset. All code outside of
cpuset.c must be wrapped in 'PENABLE_LINUX26_CPUSETS'.
The code is ugly, but works. It needs cleaning.
Now we need the "smarts". pbs_mom needs to discover the topology, export it to
pbs_server somehow, and then pbs_server/moab gets to schedule individual cpus.