[Moabusers] moab not sleeping
wightman
wightman at clusterresources.com
Wed Aug 16 09:53:15 MDT 2006
I would start by running the command

    mdiag -R -v

and checking for any errors on the resource manager interface. Is Moab
having problems with TORQUE?
Also, you should be able to grep for ALERT and WARNING in the Moab log
files to check for anything out of the ordinary.
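
If grep output alone is hard to read, a short Python sketch along these
lines could pull those entries out and count them. The timestamp layout
is taken from the log excerpt quoted further down; that WARNING lines use
the same layout as ALERT lines is an assumption, and the function names
are made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   08/16 10:13:25 ALERT: node 'nyx180' sync from expected state ...
# Assumption: WARNING lines follow the same "MM/DD HH:MM:SS LEVEL: text" layout.
LEVEL_RE = re.compile(
    r"^(\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s+(ALERT|WARNING):\s*(.*)$"
)


def find_problems(lines):
    """Return a list of (timestamp, level, message) for ALERT/WARNING lines."""
    hits = []
    for line in lines:
        match = LEVEL_RE.match(line.strip())
        if match:
            hits.append(match.groups())
    return hits


def summarize(hits):
    """Count how many entries were seen at each level."""
    return Counter(level for _, level, _ in hits)


def scan_file(path):
    """Scan one log file; the caller supplies the path, none is assumed."""
    with open(path, errors="replace") as handle:
        return find_problems(handle)
```

Calling scan_file() on a log file and printing the result of summarize()
gives a quick count per level before reading the individual messages.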
Let us know what you find.
Thanks,
- Douglas
On Wed, 2006-08-16 at 11:32 -0400, Brock Palen wrote:
> Is there a way to find out why Moab decided to keep going? It
> happens so often, and lasts so long, that most of the Moab commands
> (mdiag, showres, etc.) don't work. If there is a problem with a job
> starting, I want to know which job and why it can't start.
>
> Brock Palen
> Center for Advanced Computing
> brockp at umich.edu
> (734)936-1985
>
>
> On Aug 16, 2006, at 11:06 AM, wightman wrote:
>
> > Moab does not always take a break between iterations. For instance,
> > if there is a job that is failing to start, Moab may push the next
> > iteration to start early and try starting the job again. If TORQUE
> > tells Moab that a new job has entered the queue, Moab will normally
> > schedule it immediately.
> >
> > If Moab is continually skipping its sleep cycle, and you can see Moab
> > chewing up lots of CPU, then there may be an issue.
> >
> > - Douglas
> >
> > On Wed, 2006-08-16 at 10:26 -0400, Brock Palen wrote:
> >> Every so often we see Moab go on a sprint and not sleep between
> >> iterations. Below is a snippet of a log:
> >>
> >> 08/16 10:13:25 INFO: total jobs selected in partition ALL: 28/28
> >> 08/16 10:13:25 INFO: iteration: 114 scheduling time: 1.046 seconds
> >> 08/16 10:13:25 INFO: current util[114]: 179/339 (52.80%) PH: 40.59% active jobs: 182 of 210 (completed: 20563)
> >> 08/16 10:13:25 ALERT: node 'nyx180' sync from expected state 'Idle' to state 'Running' at Wed Aug 16 10:13:24
> >> 08/16 10:13:25 INFO: scheduling complete. sleeping 90 seconds
> >> 08/16 10:13:25 INFO: starting iteration 115 (loglevel=2)
> >> 08/16 10:13:25 INFO: PBS data updated for iteration 115
> >> 08/16 10:13:25 INFO: 346 PBS resources detected on RM nyx
> >> 08/16 10:13:25 INFO: resources detected: 346
> >> 08/16 10:13:25 INFO: 0 PBS classes/queues detected on RM nyx
> >> 08/16 10:13:25 INFO: queues detected: 0
> >>
> >> Notice that no time passes between the "sleeping 90 seconds" message
> >> and the start of iteration 115.
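
The pattern in the excerpt above (a "sleeping N seconds" line followed at
once by "starting iteration") can be searched for mechanically. The Python
below is only a sketch under assumptions: the message wording is copied
from this one excerpt, the function names and the one-second tolerance are
made up for illustration, and a log that crosses a year boundary is not
handled.

```python
import re
from datetime import datetime

STAMP = r"(\d{2}/\d{2} \d{2}:\d{2}:\d{2})"
# Wording copied from the excerpt; other Moab versions may log differently.
SLEEP_RE = re.compile(
    "^" + STAMP + r"\s+INFO:\s+scheduling complete\.\s+sleeping (\d+) seconds"
)
START_RE = re.compile("^" + STAMP + r"\s+INFO:\s+starting iteration (\d+)")


def parse_stamp(stamp):
    """Parse 'MM/DD HH:MM:SS'; the log has no year, so a fixed leap year
    is supplied to keep 02/29 parseable."""
    return datetime.strptime("2000/" + stamp, "%Y/%m/%d %H:%M:%S")


def find_short_sleeps(lines, tolerance=1.0):
    """Return (iteration, planned_seconds, actual_seconds) for every
    iteration that started sooner than the announced sleep allowed."""
    results = []
    pending = None  # (time the sleep was announced, planned seconds)
    for line in lines:
        line = line.strip()
        match = SLEEP_RE.match(line)
        if match:
            pending = (parse_stamp(match.group(1)), int(match.group(2)))
            continue
        match = START_RE.match(line)
        if match and pending is not None:
            announced_at, planned = pending
            started_at = parse_stamp(match.group(1))
            actual = (started_at - announced_at).total_seconds()
            if actual < planned - tolerance:
                results.append((int(match.group(2)), planned, actual))
            pending = None
    return results
```

Run over a whole log, the returned iteration numbers show how often the
sleep is cut short and where to look for the job or event that triggered
the early start.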
> >>
> >>
> >> Brock Palen
> >> Center for Advanced Computing
> >> brockp at umich.edu
> >> (734)936-1985
> >>
> >>
> >> _______________________________________________
> >> moabusers mailing list
> >> moabusers at supercluster.org
> >> http://www.supercluster.org/mailman/listinfo/moabusers
> >
> >
> >
>