Is there any way to get PBS/Torque to reboot a node periodically? Our compute nodes keep running forever, and we suspect that over time they accumulate zombie processes, memory leaks, etc. Making each node reboot, say, on average once every 10 days or so is not a heavy overhead for us; after all, a reboot takes less than 5 minutes. I could also use these reboots to do some periodic logfile cleanup, etc. (We have shared nodes with 8 cores/node, so I cannot really wipe out my scratch etc. through an epilogue, since another job might be running on the other CPUs; under normal circumstances it is unusual to have a completely free node.)<br>
<br>What's the best way to auto-schedule this? Ideally I do not want the whole cluster to reboot at once. In fact, I don't want to over-specify things at all. Maybe the scheduler can choose which nodes to reboot based on its scheduling strategy, just so long as it reboots each node "on average" once every 10 days.<br>
<br>Any suggestions on implementation?<br><br>-- <br>Rahul<br>