What is the CPU load on those nodes? Are any node health check scripts running, and if so, what is their output?<br><br><div class="gmail_quote">On Wed, Apr 29, 2009 at 12:58 AM, Tony Schreiner <span dir="ltr">&lt;<a href="mailto:schreian@bc.edu">schreian@bc.edu</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div class="im"><br>
On Apr 28, 2009, at 3:17 PM, Tony Schreiner wrote:<br>
<br>
> On a cluster of 62 nodes running TORQUE 2.1.10 and Maui 3.2.6p19:<br>
><br>
> Overnight, two nodes stopped accepting jobs.<br>
><br>
> Partial pestat output:<br>
><br>
> node40 free 0.00 7879 4 16069 231 0/0 0<br>
> node41 free 0.00 8067 4 16257 228 0/0 0<br>
> node42 free 0.00* 56481 8 58465 269 0/0 88<br>
> node43 excl 8.22 64561 8 66545 22975 1/1 8 156354 mikaels<br>
> node44 free 0.11* 64561 8 66545 267 0/0 64<br>
> node45 excl 8.07 64561 8 66545 21408 1/1 8 156060 NONE* 156227<br>
><br>
> There are jobs in the queue, and they are dispatched to other nodes but<br>
> not to node42 or node44.<br>
> node40 and node41 are not eligible for the queue in use, so it is<br>
> expected that they have no jobs.<br>
><br>
> Please note the last column for those two nodes: it is the "tasks"<br>
> field, and it is non-zero.<br>
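A minimal Python sketch of the check described above: flag nodes that pestat reports as "free" with no job IDs listed but a non-zero tasks count. It assumes the whitespace-separated layout of the lines quoted here, with tasks as the ninth field and any job IDs/users after it; that column position is inferred from this output rather than from pestat documentation, and `parse_line`/`suspicious_nodes` are hypothetical names.

```python
from typing import Iterable, List, NamedTuple

# Assumed layout (inferred from the lines quoted in this thread, not from pestat
# docs): node state load ... tasks [jobid user ...]
TASKS_FIELD = 8  # zero-based index of the "tasks" column (assumption)


class PestatRow(NamedTuple):
    node: str
    state: str
    tasks: int
    jobs: List[str]  # job IDs/users after the tasks column, if any


def parse_line(line: str) -> PestatRow:
    """Split one pestat line into the fields this check needs."""
    parts = line.split()
    if len(parts) <= TASKS_FIELD:
        raise ValueError(f"too few columns: {line!r}")
    return PestatRow(parts[0], parts[1], int(parts[TASKS_FIELD]), parts[TASKS_FIELD + 1:])


def suspicious_nodes(lines: Iterable[str]) -> List[str]:
    """Names of nodes marked free that still report tasks but list no jobs."""
    rows = [parse_line(line) for line in lines if line.strip()]
    return [r.node for r in rows if r.state == "free" and r.tasks > 0 and not r.jobs]


# Lines copied from the pestat output quoted above.
SAMPLE = """\
node40 free 0.00 7879 4 16069 231 0/0 0
node41 free 0.00 8067 4 16257 228 0/0 0
node42 free 0.00* 56481 8 58465 269 0/0 88
node43 excl 8.22 64561 8 66545 22975 1/1 8 156354 mikaels
node44 free 0.11* 64561 8 66545 267 0/0 64
"""

if __name__ == "__main__":
    print(suspicious_nodes(SAMPLE.splitlines()))  # ['node42', 'node44']
```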
><br>
> I have restarted pbs_mom on those nodes and also run momctl -C and<br>
> momctl -c all on them.<br>
> There is nothing in the mom_priv directory associated with any job.<br>
><br>
<br>
<br>
</div>If I may add one more thing:<br>
an attempt to force a job onto the node with qrun -H node42 JOBID<br>
<br>
gives the following error:<br>
qrun: Resource temporarily unavailable REJHOST=node42 MSG=cannot<br>
allocate node 'node42' to job - node not currently available (nps<br>
needed/free: 1/0, joblist: l.bc.edu 2.6.27.21-170.2.56.fc10.x86_64<br>
#1 ....<br>
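The key detail in that error is "nps needed/free: 1/0": pbs_server believes node42 has no free virtual processors even though pestat shows no jobs on it. A minimal Python sketch for pulling that pair and the rejected host out of such a message, for example when scanning several qrun attempts; the helper names are hypothetical and the regexes assume the message layout of this single example, so other TORQUE versions may word it differently.

```python
import re
from typing import Optional, Tuple

# Hypothetical helpers: the patterns assume "REJHOST=<host>" and
# "nps needed/free: N/M" exactly as they appear in the error quoted above.
REJHOST_RE = re.compile(r"REJHOST=(\S+)")
NPS_RE = re.compile(r"nps\s+needed/free:\s*(\d+)/(\d+)")


def rejected_host(message: str) -> Optional[str]:
    """Return the host named in REJHOST=..., or None if absent."""
    m = REJHOST_RE.search(message)
    return m.group(1) if m else None


def nps_needed_free(message: str) -> Optional[Tuple[int, int]]:
    """Return (needed, free) from 'nps needed/free: N/M', or None if absent."""
    m = NPS_RE.search(message)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


# The error from this thread, trimmed before the joblist contents.
SAMPLE_MSG = (
    "qrun: Resource temporarily unavailable REJHOST=node42 MSG=cannot\n"
    "allocate node 'node42' to job - node not currently available (nps\n"
    "needed/free: 1/0, joblist: ..."
)

if __name__ == "__main__":
    host = rejected_host(SAMPLE_MSG)
    counts = nps_needed_free(SAMPLE_MSG)
    if host and counts:
        needed, free = counts
        # Prints: node42: job needs 1 slot(s), server reports 0 free
        print(f"{host}: job needs {needed} slot(s), server reports {free} free")
```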
<div><div></div><div class="h5">_______________________________________________<br>
torqueusers mailing list<br>
<a href="mailto:torqueusers@supercluster.org">torqueusers@supercluster.org</a><br>
<a href="http://www.supercluster.org/mailman/listinfo/torqueusers" target="_blank">http://www.supercluster.org/mailman/listinfo/torqueusers</a><br>
</div></div></blockquote></div><br><br clear="all"><br>-- <br>Regards--<br>Rishi Pathak<br>Pune-Maharashtra<br>