[Moabusers] problems with showstats -u
Douglas Wightman
wightman at clusterresources.com
Tue Jul 3 09:19:31 MDT 2007
Newer versions of Moab can parse the events files in REPLAY mode to
generate statistics. I would definitely recommend upgrading from a
20-month-old beta :)
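As a rough illustration of what replaying the events files involves, the Python sketch below folds job-completion records into per-user job counts and wallclock totals. The record layout it parses (epoch, object type, job ID, event name, user, wallclock seconds) is a hypothetical simplification, not Moab's documented events-file format, and `replay_user_stats` is a made-up helper. The field positions would need to be adapted to the real files before the totals could be trusted.

```python
from collections import defaultdict
from typing import Iterable

# ASSUMPTION: simplified, hypothetical record layout, one event per line:
#   <epoch> <objtype> <objid> <event> <user> <wallclock_seconds>
# This is NOT Moab's documented events-file format.

def replay_user_stats(lines: Iterable[str]) -> dict[str, dict[str, int]]:
    """Rebuild per-user job counts and wallclock totals from event lines."""
    stats: dict[str, dict[str, int]] = defaultdict(
        lambda: {"jobs": 0, "wallclock": 0}
    )
    for line in lines:
        fields = line.split()
        # Only completed-job records contribute to the totals.
        if len(fields) < 6 or fields[1] != "job" or fields[3] != "JOBEND":
            continue
        user, wallclock = fields[4], int(fields[5])
        stats[user]["jobs"] += 1
        stats[user]["wallclock"] += wallclock
    return dict(stats)


sample = [
    "1183475876 job 101 JOBEND alice 3600",
    "1183475900 job 102 JOBSTART bob 0",  # start events are ignored
    "1183476000 job 103 JOBEND alice 1800",
    "1183476100 job 104 JOBEND bob 600",
]
print(replay_user_stats(sample))
# → {'alice': {'jobs': 2, 'wallclock': 5400}, 'bob': {'jobs': 1, 'wallclock': 600}}
```

Only end-of-job records are counted, so a job that was still running when the files were collected adds nothing until its completion record appears.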
You can directly modify the .moab.ck files, but Moab must be shut down
first or the changes will be lost. Running jobs are not affected by a
Moab restart. If you use "keep_completed" on the pbs_server, then you
can still catch the jobs that finished while Moab was down.
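If the counters are repaired by hand, as Dave proposes further down the thread, the arithmetic is a per-user, per-counter sum of the pre-downtime and post-downtime totals, with users who first appear after the downtime carried over unchanged. A minimal Python sketch, assuming both snapshots have already been transcribed into dictionaries (the `merge_user_totals` helper and the `jobs`/`wallclock` keys are hypothetical, not fields of .moab.ck):

```python
# ASSUMPTION: each snapshot maps user -> {counter_name: value}; the counter
# names are illustrative only and do not come from the .moab.ck format.

def merge_user_totals(
    before: dict[str, dict[str, int]],
    after: dict[str, dict[str, int]],
) -> dict[str, dict[str, int]]:
    """Sum two per-user snapshots without mutating either input."""
    # Copy the pre-downtime totals so the caller's data is left untouched.
    merged = {user: dict(counters) for user, counters in before.items()}
    for user, counters in after.items():
        # Users seen only after the downtime start from an empty record.
        target = merged.setdefault(user, {})
        for key, value in counters.items():
            target[key] = target.get(key, 0) + value
    return merged


before = {"alice": {"jobs": 10, "wallclock": 36000}}
after = {"alice": {"jobs": 2, "wallclock": 5400}, "carol": {"jobs": 1, "wallclock": 60}}
print(merge_user_totals(before, after))
# → {'alice': {'jobs': 12, 'wallclock': 41400}, 'carol': {'jobs': 1, 'wallclock': 60}}
```

Any edit that writes such sums back into .moab.ck should be made only while Moab is shut down; otherwise the changes will be lost.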
- Douglas
On Mon, 2007-07-02 at 14:59 -0400, David Backeberg wrote:
> I looked at the file, and it mostly made sense.
>
> There are fields that describe stats for various users.
>
> We had been running showstats -u weekly and using that output as
> the basis for cluster usage reports by department, user, project,
> etc. We saw that the raw numbers fell after a downtime.
>
> So it seems the data in .moab.ck somehow didn't get written properly.
> Is there any way to make Moab parse the history of the events.DATE
> files, and recreate those numbers accurately? Has somebody written a
> separate application that would be able to parse those files? I can
> try to add the values since the downtime to the values before the
> downtime, but is it okay to directly modify the values in the .moab.ck
> file?
>
> If I were to actually modify the file directly, do I stop Moab first?
> What happens to all the running PBS jobs if I stop Moab on a
> production cluster? Would the running jobs continue to run, but any
> incoming data about jobs finishing would be lost, and new jobs would
> just sit in the queue and not be scheduled?
>
> I hope this is an interesting question, because I couldn't find anything
> about this in the archives or the manual.
>
> -Dave
>
> On 7/2/07, Douglas Wightman <wightman at clusterresources.com> wrote:
> > Moab keeps this information in the .moab.ck file. The DAY and WEEK
> > files are used for profiling information.
> >
> > The .moab.ck file will have information about the various objects in
> > Moab (jobs, nodes, users, groups, accounts). You should be able to
> > search in that file for a particular user whose data seems wrong.
> >
> > On another note, anybody using the profiling statistics should
> > immediately upgrade to Moab 5.1.0p5.
> >
> > - Douglas
> >
> > On Sun, 2007-07-01 at 00:06 -0400, David Backeberg wrote:
> > > We are running moab version 4.2.2b1.
> > >
> > > We use
> > > showstats -u
> > >
> > > to dump information about user utilization of the cluster, like how
> > > many jobs they've completed and how much time their jobs are getting.
> > > This is also helpful when we help tune job scripts, as we can see when
> > > the users request much more time than what the job requires.
> > >
> > > Recently, we brought the Moab service and the machines down for
> > > scheduled maintenance. When the machines came back up, the
> > > showstats -u
> > >
> > > output was corrupt, in that some users had now used less time than
> > > before the downtime. Some newer users were no longer listed in the
> > > statistics at all. The output itself is still formatted correctly, but
> > > it's as if Moab is somehow ignorant of some of the data it should be
> > > using to calculate these values.
> > >
> > > Could somebody please explain more about how Moab actually tracks,
> > > maintains, and updates the information displayed by
> > > showstats -u
> > >
> > > We keep our logs in a directory that does not seem to have any data
> > > corruption, but is there a way to prod Moab into reporting whether
> > > there are any problems parsing a particular file, or whether some
> > > file may be missing? I found files in /var/spool/moab/stats/ with
> > > names like
> > > DAY.date, events.date, and WEEK.date
> > >
> > > I assume Moab stores these stats values daily, and then keeps adding
> > > to the previous day's total, but maybe something more sophisticated is
> > > going on. Please suggest how to troubleshoot my stats problems.
> > >
> > > In fact, the DAY.date and WEEK.date files just say <Data></Data> and
> > > don't have any actual data in them other than the pair of tags.
> > >
> > > The events.date seem to have the actual useful information, like how
> > > long a job took to complete and who was running it.
> > >
> > > Ideas?
> > > _______________________________________________
> > > moabusers mailing list
> > > moabusers at supercluster.org
> > > http://www.supercluster.org/mailman/listinfo/moabusers
> >
> >