[Moabusers] Standing Reservations
Justin Bronder
jsbronder at gmail.com
Mon Jun 5 14:20:57 MDT 2006
I would like to use two standing reservations across the entire cluster
to accomplish the following:
1.) Every job must request one of these reservations.
2.) When a job is being prepared to run, we'll use triggers, as in
"5.6 Resource Provisioning", to boot the nodes into either darwin or linux.
3.) By default, one reservation will begin scheduling jobs from the
top of the cluster, the other from the bottom. Ideally, this will
minimize unnecessary reimaging.
As a first round, I'm just trying to set up the two reservations. Here
are the relevant configuration details:
SRCFG[gentoo-linux] HOSTLIST=r:node[1-68]$
SRCFG[gentoo-linux] PERIOD=INFINITY
SRCFG[gentoo-linux] CLASSLIST=default
SRCFG[gentoo-linux] ACCESS=SHARED
SRCFG[gentoo-linux] PARTITION=ALL
SRCFG[darwin] HOSTLIST=r:node[1-68]$
SRCFG[darwin] PERIOD=INFINITY
SRCFG[darwin] CLASSLIST=default
SRCFG[darwin] ACCESS=SHARED
SRCFG[darwin] PARTITION=ALL
This works fine when scheduling jobs into gentoo-linux. However,
jobs scheduled for darwin will not run. I did note that I seem to have
to specify the classlist. Leaving PARTITION=ALL out seems to have
no effect. (It would be nice not to have to specify the classlist
and have it default to all classes, but that's a minor detail.)
jbronder at meldrew-linux ~/src $ checkjob 149
job 149
AName: go.round_robin
State: Idle
Creds: user:jbronder group:clusteradmin class:default
WallTime: 00:00:00 of 00:05:00
SubmitTime: Mon Jun 5 16:01:12
(Time Queued Total: 00:00:03 Eligible: 00:00:00)
Total Requested Tasks: 6
Req[0] TaskCount: 6 Partition: ALL
Memory >= 0 Disk >= 0 Swap >= 0
Opsys: linux Arch: --- Features: ---
Flags: ADVRES:darwin,RESTARTABLE
Attr: checkpoint
StartPriority: 1
NOTE: job cannot run in partition base (idle procs do not meet requirements: 0 of 6 procs found)
idle procs: 110 feasible procs: 0
Node Rejection Summary: [State: 1][Reserved: 55]
It seems that the reservations are not actually sharing the nodes, or
I've misconfigured something. Please let me know.
A second point is getting the reservations to schedule from opposite sides
of the cluster. I've achieved this before with queues using default
nodesets.
Unless there is a better way, I'd like to make a request for the following
functionality. I've set up the node features to basically reflect racks.
SRCFG[gentoo-linux] DEFAULT.NODESET=ANYOF:FEATURE:11,12,13,14
or SRCFG[gentoo-linux] FEATURES=ANYOF:11,12,13
This would allow me to set the other reservation as:
SRCFG[darwin] DEFAULT.NODESET=ANYOF:FEATURE:14,13,12,11
Which I believe would accomplish the task.
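
To make the intent concrete, here is a small Python sketch of the allocation order I am hoping for. It is only a model of the requested behaviour, not of anything Moab does today: it assumes the feature list would be treated as a preference order, and the function name pick_nodes and the rack-to-node mapping are made up for illustration.

```python
def pick_nodes(free_by_feature, feature_order, count):
    """Take free nodes rack by rack, in the given feature order, up to count."""
    picked = []
    for feature in feature_order:
        for node in free_by_feature.get(feature, []):
            if len(picked) == count:
                return picked
            picked.append(node)
    return picked


# Hypothetical layout: four racks (features 11-14) with two free nodes each.
racks = {
    "11": ["node1", "node2"],
    "12": ["node3", "node4"],
    "13": ["node5", "node6"],
    "14": ["node7", "node8"],
}

# gentoo-linux fills from rack 11 upward, darwin from rack 14 downward.
print(pick_nodes(racks, ["11", "12", "13", "14"], 3))  # ['node1', 'node2', 'node3']
print(pick_nodes(racks, ["14", "13", "12", "11"], 3))  # ['node7', 'node8', 'node5']
```

With the two orders reversed, the reservations only meet in the middle racks once the cluster is nearly full, which is where reimaging would become unavoidable anyway.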
Lastly, on using triggers as mentioned earlier: I could create a script to
boot the required nodes into the required operating system. This takes one to
five minutes, depending on the number of nodes and current network load.
The trigger would look like:
RSVPROFILE[gentoo-linux] TRIGGER=etype=start,atype=exec,adata='/usr/sbin/node_type $REQOS $HOSTLIST'
Where node_type would start the process to boot the required nodes into
the correct OS if necessary. The problem is that we may have to wait
a little while before we can actually talk to the pbs_mom processes and start
the job. Is there a way to tell Moab to hold off for a predetermined amount
of time or, even better, to wait until a message is sent?
If the trigger is forked from Moab and won't halt the entire process, it would
be trivial to have nodes send a message to node_type as they reboot and have
node_type wait on them before exiting.
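
For reference, a rough Python sketch of the waiting half of node_type follows. It polls the nodes instead of having them send a message, which is a simpler variant of the idea above. Several things in it are assumptions on my part: that $HOSTLIST arrives as a comma-separated list, that pbs_mom listens on port 15002, and that a successful TCP connect is a good enough readiness check. The step that actually reboots the nodes is site-specific and is left as a placeholder comment.

```python
import socket
import sys
import time

PBS_MOM_PORT = 15002  # assumption: the usual pbs_mom port; adjust for the site


def parse_hostlist(raw):
    """Split a comma-separated host list (assumed format of $HOSTLIST)."""
    return [h.strip() for h in raw.split(",") if h.strip()]


def mom_is_up(host, port=PBS_MOM_PORT, connect_timeout=2.0):
    """Return True if a TCP connection to the node's pbs_mom port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout):
            return True
    except OSError:
        return False


def wait_for_nodes(hosts, is_ready=mom_is_up, timeout=300.0, poll_interval=5.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until every host is ready or the timeout expires.

    Returns the set of hosts that were still not ready when polling stopped.
    """
    pending = set(hosts)
    deadline = clock() + timeout
    while pending:
        pending = {h for h in pending if not is_ready(h)}
        if not pending or clock() >= deadline:
            break
        sleep(poll_interval)
    return pending


def main(argv):
    """Entry point: node_type REQOS HOSTLIST. Exit code 0 means all nodes are up."""
    if len(argv) != 3:
        print("usage: node_type REQOS HOSTLIST", file=sys.stderr)
        return 2
    reqos, hosts = argv[1], parse_hostlist(argv[2])
    # Placeholder: reboot `hosts` into `reqos` here (site-specific, not shown).
    missing = wait_for_nodes(hosts)
    if missing:
        print("nodes not ready: " + ", ".join(sorted(missing)), file=sys.stderr)
        return 1
    return 0
```

Calling sys.exit(main(sys.argv)) under the usual __main__ guard would turn this into the executable named in the trigger. Whether Moab would actually wait for that process to exit before starting the job is exactly the open question above.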
I realize I just asked a number of questions and I hope they haven't been
answered in the documentation I've been reading all day. If so I apologize
in advance. When someone gets a free moment, I look forward to their
reply.
Thanks in advance,
Justin Bronder.