[Moabusers] problems with showstats -u
David Backeberg
david.backeberg at case.edu
Mon Jul 2 12:59:54 MDT 2007
I looked at the file, and it mostly made sense.
There are fields that describe stats for various users.
We had been running showstats -u weekly and using its output as the
basis for cluster usage reports by department, user, project, etc. We
saw that the raw numbers fell after a downtime.
So it seems the data in .moab.ck somehow didn't get written properly.
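To pin down which users were affected, two weekly snapshots can be compared once their per-user totals have been pulled into dictionaries. This is only a sketch in Python: the names find_regressions, before, and after, and the sample numbers, are made up for illustration, and parsing the actual showstats -u output is not shown here.

```python
def find_regressions(before, after):
    """Return (dropped, missing) given per-user cumulative totals.

    Cumulative counters should never decrease, so any user whose total
    fell, or who vanished from the later snapshot, is suspect.
    """
    dropped = {
        user: (before[user], after[user])
        for user in before
        if user in after and after[user] < before[user]
    }
    missing = sorted(user for user in before if user not in after)
    return dropped, missing


# Hypothetical totals (e.g. processor-hours) from two weekly runs.
before = {"alice": 120.0, "bob": 45.5, "carol": 10.0}
after = {"alice": 80.0, "bob": 50.0}

dropped, missing = find_regressions(before, after)
print("totals that fell:", dropped)
print("users no longer listed:", missing)
```

Users in the first group lost history; users in the second group disappeared from the statistics entirely.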
Is there any way to make Moab parse the history of the events.DATE
files, and recreate those numbers accurately? Has somebody written a
separate application that would be able to parse those files? I can
try to add the values since the downtime to the values before the
downtime, but is it okay to directly modify the values in the .moab.ck
file?
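The "add the values since the downtime to the values before the downtime" idea could be done outside Moab, without touching .moab.ck at all. A minimal Python sketch, assuming the post-downtime counters restarted from zero for each user (which may not be what actually happened) and that both sets of totals are already available as dictionaries; merge_totals and the sample data are hypothetical names and numbers.

```python
def merge_totals(pre_downtime, post_downtime):
    """Sum per-user totals from two periods.

    Assumes post_downtime holds only usage accrued since the downtime.
    Users that appear in only one period keep that period's value.
    """
    merged = dict(pre_downtime)  # copy so the inputs stay untouched
    for user, value in post_downtime.items():
        merged[user] = merged.get(user, 0.0) + value
    return merged


# Hypothetical per-user totals from the last good report and from now.
pre = {"alice": 120.0, "bob": 45.5}
post = {"alice": 8.0, "dave": 3.0}

print(merge_totals(pre, post))
```

Keeping the merged numbers in a separate report sidesteps the question of whether hand-editing the checkpoint file is safe.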
If I were to actually modify the file directly, do I stop Moab first?
What happens to all the running PBS jobs if I stop Moab on a
production cluster? Would the running jobs continue to run, but any
incoming data about jobs finishing would be lost, and new jobs would
just sit in the queue and not be scheduled?
I hope this is an interesting question, because I couldn't find
anything about this in the archives or the manual.
-Dave
On 7/2/07, Douglas Wightman <wightman at clusterresources.com> wrote:
> Moab keeps this information in the .moab.ck file. The DAY and WEEK
> files are used for profiling information.
>
> The .moab.ck file will have information about the various objects in
> Moab (jobs, nodes, users, groups, accounts). You should be able to
> search in that file for a particular user whose data seems wrong.
>
> On another note, anybody using the profiling statistics should
> immediately upgrade to Moab 5.1.0p5.
>
> - Douglas
>
> On Sun, 2007-07-01 at 00:06 -0400, David Backeberg wrote:
> > We are running moab version 4.2.2b1.
> >
> > We use
> > showstats -u
> >
> > to dump information about user utilization of the cluster, like how
> > many jobs they've completed and how much time their jobs are getting.
> > This is also helpful when we help tune job scripts, as we can see when
> > the users request much more time than what the job requires.
> >
> > Recently, we brought the Moab service and the machines down for
> > scheduled maintenance. When the machines came back up, the
> > showstats -u
> >
> > output was corrupt, in that some users had now used less time than
> > before the downtime. Some newer users were no longer listed in the
> > statistics at all. The output itself is still formatted correctly, but
> > it's as if Moab is somehow unaware of some of the data it should be
> > using to calculate these values.
> >
> > Could somebody please explain more about how Moab actually tracks,
> > maintains, and updates the information displayed by
> > showstats -u
> >
> > We keep our logs in a directory that does not seem to have any data
> > corruption, but is there a way to prod Moab into describing whether
> > there are any problems parsing a particular file, or perhaps if some
> > file may be missing? I found files in /var/spool/moab/stats/ with
> > names like
> > DAY.date, events.date, and WEEK.date
> >
> > I assume Moab stores these stats values daily, and then keeps adding
> > to the previous day's total, but maybe something more sophisticated is
> > going on. Please suggest how to troubleshoot my stats problems.
> >
> > In fact, the DAY.date and WEEK.date files just say <Data></Data> and
> > don't have any actual data in them other than the pair of tags.
> >
> > The events.date files seem to have the actual useful information, like
> > how long a job took to complete and who was running it.
> >
> > Ideas?
> > _______________________________________________
> > moabusers mailing list
> > moabusers at supercluster.org
> > http://www.supercluster.org/mailman/listinfo/moabusers
>
>