<br><br><div class="gmail_quote">On Thu, Jun 12, 2008 at 5:03 PM, Joshua Bernstein &lt;<a href="mailto:jbernstein@penguincomputing.com">jbernstein@penguincomputing.com</a>&gt; wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div class="Ih2E3d"><br>
<br>
Glen Beane wrote:<br>
<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
I think I can probably try that out tomorrow, but I would really appreciate it if you could give this a test first.<br>
</blockquote>
<br></div>
Alright, I just grabbed the SVN tree from about an hour ago and gave this a go. At first it seems to do the right thing. After a node reboots and comes back up, I see:<br>
<br>
06/12/2008 13:01:23;0004;PBS_Server;Svr;WARNING;ALERT: unable to contact node n0<br>
06/12/2008 13:02:48;0100;PBS_Server;Job;<a href="http://0.goldstar.penguincomputing.com" target="_blank">0.goldstar.penguincomputing.com</a>;dequeuing from batch, state EXITING<br>
06/12/2008 13:02:48;0040;PBS_Server;Svr;<a href="http://goldstar.penguincomputing.com" target="_blank">goldstar.penguincomputing.com</a>;Scheduler sent command term<br>
<br>
The job then disappears from the server's qstat output, but pbsnodes n0 still shows the job as being on that node. Then the node suddenly gets marked as down, and the log reports:<br>
<br>
06/12/2008 13:08:58;0002; pbs_mom;Svr;im_eof;Premature end of message from addr <a href="http://10.101.10.25:15001" target="_blank">10.101.10.25:15001</a><br>
06/12/2008 13:09:14;0002; pbs_mom;Svr;im_eof;Premature end of message from addr <a href="http://10.2.1.1:15001" target="_blank">10.2.1.1:15001</a><br>
<br>
Just let me know how I can help!</blockquote></div><br>Can you try the latest 2.3-fixes? I had forgotten to release the resources used by the unknown job. I just tested this, and pbsnodes no longer shows the job as being on the node. After several minutes, the state of the node is still free.<br>
<br><br><br>