We need to restart pbs_mom to implement the fix found here <a href="http://www.clusterresources.com/pipermail/torqueusers/2007-March/005360.html">http://www.clusterresources.com/pipermail/torqueusers/2007-March/005360.html</a>. We have never restarted the pbs_mom process while jobs were running on a node (at least not jobs that we cared about keeping), so I am wondering what would happen if we restarted it on machines with active jobs. We have restarted the maui process before with no problem, but its part in the process is different.<br>
<br>Our backup plan is to drain all the nodes, restart pbs_mom on any that have no jobs running, and put those nodes back in service. Then, as the jobs on the remaining nodes finish, we would restart pbs_mom on those nodes and put them back in service too. I had hoped to avoid that because it means I have to keep an eye on them, and some of the jobs currently running are multi-day runs.<br>
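<br>To make the backup plan concrete, here is a rough sketch of how I would split the nodes into "restart now" and "restart once the jobs finish". It is only an illustration: the function name, the node names, and the job counts are made up, and it does not run any TORQUE commands itself. The draining and restarting would still be done by hand.<br>

```python
def plan_rolling_restart(jobs_per_node):
    """Split nodes into those that can be restarted now and those that must wait.

    jobs_per_node maps a node name to the number of jobs currently running
    on it (hypothetical input; gather it however your site normally does).
    """
    # Idle nodes: pbs_mom can be restarted right away, then the node
    # can go back in service.
    restart_now = sorted(n for n, jobs in jobs_per_node.items() if jobs == 0)
    # Busy nodes: leave them drained and restart pbs_mom once their jobs finish.
    restart_later = sorted(n for n, jobs in jobs_per_node.items() if jobs > 0)
    return restart_now, restart_later


if __name__ == "__main__":
    # Made-up example cluster state.
    now, later = plan_rolling_restart({"node01": 0, "node02": 3, "node03": 0})
    print("restart now:", now)
    print("restart after jobs finish:", later)
```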
<br>Thanks for the help,<br>Rob<br>