<div dir="ltr">Sorry, I misread your first post. How was the user&#39;s job submitted? Could you send the qstat -f output for the job?</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Sep 19, 2013 at 12:57 AM, Andrus, Brian Contractor <span dir="ltr">&lt;<a href="mailto:bdandrus@nps.edu" target="_blank">bdandrus@nps.edu</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">





<div lang="EN-US" link="blue" vlink="purple">
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1f497d">David,<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1f497d">Yes. As I mentioned in my first post:<u></u><u></u></span></p><div class="im">
<p class="MsoNormal">I have &#39;set server max_slot_limit = 512&#39;<br>
<br>
<span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1f497d"><u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1f497d"><u></u> <u></u></span></p>
</div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1f497d">Brian Andrus<u></u><u></u></span></p><div class="im">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1f497d">ITACS/Research Computing<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1f497d">Naval Postgraduate School<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1f497d">Monterey, California<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1f497d">voice: <a href="tel:831-656-6238" value="+18316566238" target="_blank">831-656-6238</a><u></u><u></u></span></p>

</div><p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;">From:</span></b><span style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;"> <a href="mailto:torqueusers-bounces@supercluster.org" target="_blank">torqueusers-bounces@supercluster.org</a> [mailto:<a href="mailto:torqueusers-bounces@supercluster.org" target="_blank">torqueusers-bounces@supercluster.org</a>]
<b>On Behalf Of </b>David Beer<br>
<b>Sent:</b> Wednesday, September 18, 2013 4:05 PM</span></p><div><div class="h5"><br>
<b>To:</b> Torque Users Mailing List<br>
<b>Subject:</b> Re: [torqueusers] Slot limit unmatched<u></u><u></u></div></div><p></p><div><div class="h5">
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal">Brian,<u></u><u></u></p>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">What are your qmgr settings? Do you have a slot limit set there?<u></u><u></u></p>
</div>
</div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><u></u> <u></u></p>
<div>
<p class="MsoNormal">On Wed, Sep 18, 2013 at 3:34 PM, Andrus, Brian Contractor &lt;<a href="mailto:bdandrus@nps.edu" target="_blank">bdandrus@nps.edu</a>&gt; wrote:<u></u><u></u></p>
<div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1f497d">That didn’t clear it up.</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1f497d"> </span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1f497d">What I did find is that one of my nodes showed the job ID as 20139590[]</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1f497d">(note the missing array index).</span><u></u><u></u></p>
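<p class="MsoNormal">For reference: in qstat output an individual subjob carries its array index in brackets (for example 20139590[1561]), while the array as a whole is listed with empty brackets (20139590[]), which is the form the node was showing. As a minimal sketch, assuming only the ID format quoted in this thread (a numeric base, a bracketed index that may be empty, and an optional server suffix), a hypothetical Python helper like the one below can tell the two forms apart when scripting checks across nodes. The names parse_array_id and is_whole_array are illustrative and are not part of TORQUE.</p>
<pre>
```python
import re
from typing import NamedTuple, Optional

# Hypothetical helper, not part of TORQUE. Assumed ID format: a numeric
# base, a bracketed index that may be empty, and an optional ".server"
# suffix, e.g. "20139590[1561].hamming.hamming.cluster" or "20139590[]".
ARRAY_ID_RE = re.compile(r"(\d+)\[(\d*)\](?:\.(.+))?")


class ArrayJobId(NamedTuple):
    base: int
    index: Optional[int]   # None means the whole array ("base[]")
    server: Optional[str]  # None if no server suffix was present


def parse_array_id(job_id: str) -> ArrayJobId:
    """Split an array job ID into base, index and server suffix."""
    m = ARRAY_ID_RE.fullmatch(job_id)
    if m is None:
        raise ValueError(f"not an array job ID: {job_id!r}")
    base, index, server = m.groups()  # unmatched optional group -> None
    return ArrayJobId(int(base), int(index) if index else None, server)


def is_whole_array(job_id: str) -> bool:
    """True for the whole-array form "base[]", False for a subjob."""
    return parse_array_id(job_id).index is None
```
</pre>
<p class="MsoNormal">Note that parse_array_id raises ValueError for an ordinary, non-array ID such as 20139590.hamming, so a script that mixes regular and array jobs would need to catch that.</p>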
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1f497d">There were only 4 jobs from the array on that node, along with some other jobs. I marked the node offline, let the jobs drain (although the node still showed the entire array job), and then ran pbs_mom purge.</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1f497d">After that, I restarted pbs_server and it cleared up.</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1f497d"> </span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1f497d">Of course, now I cannot run any of the jobs that were blocked; qrun fails with “qrun: Execution server rejected request MSG=connection to mom timed out 20139590[1561].hamming.hamming.cluster”.</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1f497d">It seems that those jobs want to run on that particular node and nowhere else, yet the node is up and healthy and runs other jobs just fine.</span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1f497d"> </span><u></u><u></u></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1f497d">I do tend to have difficulties with array jobs in TORQUE; there are a lot of idiosyncrasies there.</span><u></u><u></u></p>

<div style="border:none;border-left:solid blue 1.5pt;padding:0in 0in 0in 4.0pt">
<div>
<div style="border:none;border-top:solid #b5c4df 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;">From:</span></b><span style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;">
<a href="mailto:torqueusers-bounces@supercluster.org" target="_blank">torqueusers-bounces@supercluster.org</a> [mailto:<a href="mailto:torqueusers-bounces@supercluster.org" target="_blank">torqueusers-bounces@supercluster.org</a>]
<b>On Behalf Of </b>Ken Nielson<br>
<b>Sent:</b> Wednesday, September 18, 2013 9:42 AM<br>
<b>To:</b> Torque Users Mailing List<br>
<b>Subject:</b> Re: [torqueusers] Slot limit unmatched</span><u></u><u></u></p>
</div>
</div>
<div>
<div>
<p class="MsoNormal"> <u></u><u></u></p>
<div>
<div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt">Brian,<u></u><u></u></p>
</div>
<p class="MsoNormal" style="margin-bottom:12.0pt">That is a problem. I wonder whether restarting pbs_server clears up the slot limit problem. If it does, it sounds like we have a counting problem in TORQUE.<u></u><u></u></p>
</div>
<p class="MsoNormal">Regards<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt"> <u></u><u></u></p>
<div>
<p class="MsoNormal">On Wed, Sep 18, 2013 at 9:15 AM, Andrus, Brian Contractor &lt;<a href="mailto:bdandrus@nps.edu" target="_blank">bdandrus@nps.edu</a>&gt; wrote:<u></u><u></u></p>
<p class="MsoNormal">All,<br>
<br>
I am running TORQUE 4.2.5<br>
I have a user who submitted an array job of ~2500 jobs<br>
I have &#39;set server max_slot_limit = 512&#39;<br>
<br>
But...<br>
Only 8 of his jobs are running; the others are blocked because they sat in the queue so long.<br>
Yet if I try to qrun one of them, I get:<br>
        qrun: Invalid request MSG=Cannot run job. Array slot limit is 512 and there are already 512 jobs running<br>
<br>
Why does torque think there are 512 slots currently in use when there are only 8?<br>
<br>
_______________________________________________<br>
torqueusers mailing list<br>
<a href="mailto:torqueusers@supercluster.org" target="_blank">torqueusers@supercluster.org</a><br>
<a href="http://www.supercluster.org/mailman/listinfo/torqueusers" target="_blank">http://www.supercluster.org/mailman/listinfo/torqueusers</a><u></u><u></u></p>
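<p class="MsoNormal">One way to picture the counting problem Ken suspects in his reply above: if the server enforced the slot limit with a running counter that is adjusted as subjobs start and finish, rather than by recounting subjobs in state R, then a finish that is never recorded would leave the counter stuck high. The Python model below is a hypothetical illustration of that hypothesis only; ArraySlotTracker and its methods are invented names, and the code is not taken from the TORQUE source.</p>
<pre>
```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ArraySlotTracker:
    """Hypothetical model of an array slot limit; NOT TORQUE's code.

    The admission check trusts an incrementally maintained counter
    (running_count). If a subjob's exit is never accounted for, the
    counter stays high even though the subjob is no longer running.
    """
    slot_limit: int
    running_count: int = 0
    states: Dict[int, str] = field(default_factory=dict)  # index -> state

    def start(self, index: int) -> None:
        # Refuse to start when the counter says the limit is reached.
        if self.running_count >= self.slot_limit:
            raise RuntimeError(
                f"Array slot limit is {self.slot_limit} and there are "
                f"already {self.running_count} jobs running"
            )
        self.states[index] = "R"
        self.running_count += 1

    def finish(self, index: int, counted: bool = True) -> None:
        # counted=False simulates an exit that never reaches the counter.
        self.states[index] = "C"
        if counted:
            self.running_count -= 1

    def recount(self) -> int:
        """Count subjobs actually in state R, ignoring the counter."""
        return sum(1 for s in self.states.values() if s == "R")
```
</pre>
<p class="MsoNormal">In this model, starting 512 subjobs and then losing the exit records of all but 8 leaves recount() at 8 while running_count stays at 512, so the next start is refused with the same kind of message Brian saw. Rebuilding the counter from actual job states, as a server restart might do, would clear the discrepancy, which would be consistent with the pbs_server restart clearing it later in the thread.</p>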
</div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><br>
<br clear="all">
<br>
-- <br>
Ken Nielson<br>
<a href="tel:%2B1%20801.717.3700" target="_blank">+1 801.717.3700</a> office <a href="tel:%2B1%20801.717.3738" target="_blank">
+1 801.717.3738</a> fax<br>
1712 S. East Bay Blvd, Suite 300  Provo, UT  84606<br>
<a href="http://www.adaptivecomputing.com" target="_blank">www.adaptivecomputing.com</a><u></u><u></u></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<p class="MsoNormal"><br>
<br clear="all">
<u></u><u></u></p>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<p class="MsoNormal">-- <u></u><u></u></p>
<div>
<p class="MsoNormal">David Beer | Senior Software Engineer<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Adaptive Computing<u></u><u></u></p>
</div>
</div>
</div></div></div>
</div>

<br></blockquote></div><br><br clear="all"><div><br></div>-- <br><div>David Beer | Senior Software Engineer</div><div>Adaptive Computing</div>
</div>