[Moabusers] resource manager health check notifications
Lloyd Brown
somewhere_or_other at byu.edu
Fri Sep 22 16:45:59 MDT 2006
Hey all,
I'm trying to figure out how to use Torque and Moab together to run health
checks on the compute nodes in our cluster. The Torque health check
script (see
http://www.clusterresources.com/wiki/doku.php?id=torque:10.2_compute_node_health_check)
works pretty well: Moab automatically marks the node as "down",
and the pbs_mom shows the error. I clear the error with momctl, etc.,
and it comes back. What I was wondering is how I can have Moab send
notifications based on this. I'm having trouble understanding how the
notification system
(http://www.clusterresources.com/products/mwm/docs/14.4eventmgmt.shtml)
really works, and how to apply it to this situation.
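
For context, here is a minimal sketch of the kind of health check script I
mean. It is not our actual script. It assumes the pbs_mom runs it through the
$node_check_script setting in mom_priv/config and treats a line of output
starting with "ERROR" as a failed check. The /tmp path, the 5% threshold, and
the check_disk helper are made up for illustration.

```python
#!/usr/bin/env python3
"""Illustrative node health check: report low free space on one filesystem."""
import shutil

# Assumed threshold: flag the node when less than 5% of the disk is free.
MIN_FREE_FRACTION = 0.05


def check_disk(path="/tmp", min_free_fraction=MIN_FREE_FRACTION, usage=None):
    """Return an error message if free space is too low, else None.

    `usage` only needs `total` and `free` attributes; it defaults to the
    result of shutil.disk_usage(path) and can be overridden for testing.
    """
    if usage is None:
        usage = shutil.disk_usage(path)
    free_fraction = usage.free / usage.total
    if free_fraction < min_free_fraction:
        return "%s has only %.1f%% free space" % (path, free_fraction * 100)
    return None


def main():
    problem = check_disk()
    if problem:
        # Assumption: the mom looks for output beginning with "ERROR".
        print("ERROR " + problem)


if __name__ == "__main__":
    main()
```

When the check passes, the script prints nothing, so the node is left alone.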
Any ideas?
Thanks,
Lloyd Brown