[Moabusers] resource manager health check notifications
wightman
wightman at clusterresources.com
Mon Sep 25 15:02:46 MDT 2006
I believe a better way to set up some event notification would be
through the TRIGGER system.
http://www.clusterresources.com/products/mwm/docs/20.1triggers.shtml
Here's an example that sends an email to the primary Moab admin
every time a node goes down, but at most one email per minute:
NODECFG[DEFAULT] TRIGGER=AType=mail,EType=fail,MultiFire=TRUE,RearmTime=1:00,Action="$(OID) went down at $TIME"
This puts a trigger on every node, so that an email is sent whenever
a node's state changes to down.
To run something else when a node fails, change the trigger to
AType=exec and set Action="<path to script>".
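As a rough idea of what such a script could look like, here is a minimal
sketch in Python. Everything in it is illustrative rather than part of
Moab: the script name, the log path, and the convention that the node
name arrives as the first command-line argument are all assumptions, so
check the trigger documentation for how arguments reach your script.

```python
#!/usr/bin/env python3
"""Hypothetical handler for a node-failure trigger using AType=exec."""
import sys
import time

# Assumed default location for the failure log; change to suit your site.
DEFAULT_LOG = "/tmp/node_failures.log"


def format_message(node, when=None):
    """Build a one-line report similar to the mail trigger's text."""
    if when is None:
        when = time.time()
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(when))
    return f"{node} went down at {stamp}"


def main(args):
    """Append a failure record for the node named in args[0]."""
    # Assumption: the node name is passed as the first argument and an
    # optional log file path as the second.
    node = args[0]
    log_path = args[1] if len(args) > 1 else DEFAULT_LOG
    with open(log_path, "a") as log:
        log.write(format_message(node) + "\n")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: node_down.py NODE [LOGFILE]", file=sys.stderr)
    else:
        main(sys.argv[1:])
```

Running it by hand as "node_down.py node01" appends one line for node01
to the log, which is an easy way to test the script before pointing a
trigger at it.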
- Douglas
On Fri, 2006-09-22 at 16:45 -0600, Lloyd Brown wrote:
> Hey all,
>
> I'm trying to figure out some things using Torque and Moab to do health
> checks on the compute nodes in our cluster. The Torque health check
> script (see
> http://www.clusterresources.com/wiki/doku.php?id=torque:10.2_compute_node_health_check),
> works pretty well, and Moab automatically does mark the node as "down",
> and the pbs_mom shows the error. I clear the error with momctl, etc.,
> and it comes back. What I was wondering was how I can have Moab do
> notifications based on this. I'm having trouble understanding how the
> notification system
> (http://www.clusterresources.com/products/mwm/docs/14.4eventmgmt.shtml)
> really works, and how to apply it to this situation.
>
> Any ideas?
>
> Thanks,
> Lloyd Brown
> _______________________________________________
> moabusers mailing list
> moabusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/moabusers