[torqueusers] downing a node via qmgr
Brett Ellis
ellis at cs.utk.edu
Wed Sep 21 10:27:18 MDT 2005
Stewart,
I have typically used
pbsnodes -o NODENAME
which may be the same as
qmgr -c 's n node-name state=offline'
to take problematic nodes out of service, and I have not
had any of them come back on their own.
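
As far as I understand it, "down" is a state pbs_server sets itself and
clears again as soon as pbs_mom on the node reports in, which would explain
the node flipping back to "free" after each reboot. "offline" is an
administrative mark that stays until someone clears it. A minimal sketch of
the offline/clear cycle follows; the function names, the node01 host name
and the PBSNODES override are illustrative and not from this thread, and
only the pbsnodes -o, -l and -c options are used.

```shell
#!/bin/sh
# Sketch only: the function names and the PBSNODES override are assumptions
# made for illustration. PBSNODES lets you point at a non-default binary.
PBSNODES=${PBSNODES:-pbsnodes}

# Mark a node offline so no new jobs are scheduled on it.
node_offline() { "$PBSNODES" -o "$1"; }

# List the nodes currently marked down, offline, or unknown.
node_list_bad() { "$PBSNODES" -l; }

# Clear the offline mark once the node has been repaired.
node_clear() { "$PBSNODES" -c "$1"; }
```

Typical use would be "node_offline node01", then "node_list_bad" to confirm
the mark is in place, and "node_clear node01" after the hardware is fixed.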
Brett
Stewart.Samuels at sanofi-aventis.com wrote:
> I have just experienced strange behaviour with qmgr. We currently have
> a node which is rebooting itself constantly. To take the system out of
> the cluster to diagnose the problem, I issued the following command:
>
> qmgr -c 's n node-name state=down'
>
> For a few moments after the qmgr command is issued, subsequent
> "pbsnodes -a" commands show node-name as "down". But for some reason,
> the node then shows as "free" again.
>
> Has anyone seen this behaviour? I realize we are running a little
> behind with the patch level, but we are running torque-1.2.0p1 and
> maui-3.2.6p11.
>
> When a failure like this occurs (it has happened a few times in our
> cluster), is there a way, other than qmgr, to remove a node temporarily
> without deleting it from the server_priv/nodes file and restarting
> pbs_server with the "-t create" argument?
>
> Stewart Samuels
> Infrastructure Evolution and Integration
> Scientific and Medical Affairs
> Sanofi-Aventis Pharmaceutical
> 1041 Route 202-206
> Bridgewater, NJ 08807
>
> (908) 231-4762
> email: Stewart.Samuels at Sanofi-Aventis.com