[torquedev] TORQUE 2.2.0 Defaults
Martin Siegert
siegert at sfu.ca
Thu Aug 16 17:17:48 MDT 2007
Hi Dave,
On Thu, Aug 16, 2007 at 05:17:50PM -0600, Dave Jackson wrote:
> Garrick,
>
> > > 3) set resources_available.nodect to automatically allow jobs up to the
> > > number of procs in the cluster
> >
> > "setting" resources_available.nodect would be incorrect because then it would
> > never be set again. The point of resources_available.nodect is override what
> > server thinks is correct.
> >
> > Can we make this depend on node_pack?
>
> I don't fully understand your comments about 'it would never be set
> again'. My main concern is a user of a new 32 quad core cluster
> submitting a job with 'qsub -l nodes=128' anticipating PBS's overly
> flexible definition of nodes, and not being able to run his job because
> of a 'mysterious' queue constraint. I believe sites should be able to
> force a tighter node definition but by default, this type of warning
> will be confusing to a novice.
You may expect this question from me, thus I ask anyway :-)
Can we get "-l procs=n" soon and get rid of this problem once and for all
in a way that is comprehensible to users?
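
To make the arithmetic in Dave's example concrete, here is a small Python
sketch. It is a toy model, not TORQUE code. It assumes that a 'nodes=N'
request is refused whenever N exceeds resources_available.nodect, and that
nodect would otherwise equal the physical node count. The function names
are hypothetical.

```python
# Toy model of the limit check discussed in this thread; not TORQUE source.
# Assumption: a request is rejected when its node count exceeds nodect.

def cluster_procs(nodes: int, cores_per_node: int) -> int:
    """Total processor count of a homogeneous cluster."""
    return nodes * cores_per_node


def request_fits(requested_nodes: int, nodect: int) -> bool:
    """True if a 'nodes=N' request passes a nodect limit of `nodect`."""
    return requested_nodes <= nodect


NODES, CORES = 32, 4  # the "32 quad core cluster" from Dave's example

# nodect equal to the physical node count: 'nodes=128' is refused.
fits_default = request_fits(128, nodect=NODES)

# nodect raised to the processor count (Dave's item 3): 'nodes=128' passes.
fits_raised = request_fits(128, nodect=cluster_procs(NODES, CORES))
```

Under these assumptions the same 'qsub -l nodes=128' is refused in the first
case and accepted in the second, which is the surprise a processor-count
request such as "-l procs=n" would avoid.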
Cheers,
Martin
--
Martin Siegert
Head, Research Computing
WestGrid Site Lead
Academic Computing Services phone: 778 782-4691
Simon Fraser University fax: 778 782-4242
Burnaby, British Columbia email: siegert at sfu.ca
Canada V5A 1S6
Note: SFU has new phone numbers!
Please use the new numbers listed above from now on.
> > > 4) modify configure to not build the GUI by default (configure
> > > --disable-gui)
> >
> > configure doesn't default to "on", it looks for the required deps and only
> > builds it if it can. What is wrong with that?
>
> Not a problem if it is working. I saw a problem yesterday in which a
> CentOS 4.4 system attempted to build the GUI by default then failed due
> to a TCL library issue. I take it your preference would be to improve
> the dependency auto-detect capability? What will you need? config.log,
> config.status? other?
>
>
> > > 5) modify pbs_mom to recover jobs by default (ie, default to 'pbs_mom
> > > -r')
> >
> > That would be incorrect. At boot, jobs can't be recovered.
>
> pbs_mom should be able to detect that quite easily since the process is
> gone. If the process is there, the most correct 'default' behavior should
> be to try to recover the job. What exceptions should there be to this?
> Again, this is default behavior and can be overridden by any advanced site.
>
> Dave
>
> On Thu, 2007-08-16 at 14:40 -0700, Garrick Staples wrote:
> > On Thu, Aug 16, 2007 at 03:41:23PM -0600, Dave Jackson alleged:
> > > Are there issues with these defaults? Are there additional defaults
> > > which should be set?
> > >
> > > Thanks,
> > > Dave
> > >
> > > _______________________________________________
> > > torquedev mailing list
> > > torquedev at supercluster.org
> > > http://www.supercluster.org/mailman/listinfo/torquedev