<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:Helvetica;
        panose-1:2 11 6 4 2 2 2 2 2 4;}
@font-face
        {font-family:Helvetica;
        panose-1:2 11 6 4 2 2 2 2 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
p.msochpdefault, li.msochpdefault, div.msochpdefault
        {mso-style-name:msochpdefault;
        mso-margin-top-alt:auto;
        margin-right:0in;
        mso-margin-bottom-alt:auto;
        margin-left:0in;
        font-size:12.0pt;
        font-family:"Calibri","sans-serif";}
span.emailstyle17
        {mso-style-name:emailstyle17;
        font-family:"Calibri","sans-serif";
        color:windowtext;}
span.EmailStyle19
        {mso-style-type:personal-reply;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;
        font-family:"Calibri","sans-serif";}
@page WordSection1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="color:#1F497D">If you are using maui as well, what does &#8216;showq&#8217; give you?<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">In particular, you should see some jobs in either the &#8216;eligible&#8217; state or &#8216;blocked&#8217; state.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">Run &#8216;checkjob&#8217; on those and you may see why they aren&#8217;t starting.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">If they are in the blocked state, it could be they were unable to start for too long, maui will block them.
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">Check your DEFERTIME and DEFERCOUNT settings to have it try longer before blocking jobs.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">Brian Andrus<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">ITACS/Research Computing<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">Naval Postgraduate School<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">Monterey, California<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D">voice: 831-656-6238<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<div style="border:none;border-left:solid blue 1.5pt;padding:0in 0in 0in 4.0pt">
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;">From:</span></b><span style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;"> torqueusers-bounces@supercluster.org [mailto:torqueusers-bounces@supercluster.org]
<b>On Behalf Of </b>Jack Wilkinson<br>
<b>Sent:</b> Tuesday, September 03, 2013 1:34 PM<br>
<b>To:</b> mauiusers@supercluster.org; torqueusers@supercluster.org<br>
<b>Subject:</b> [torqueusers] cluster hangs<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<div>
<p class="MsoNormal">I&#8217;m not sure if this is a maui or a torque issue, so I&#8217;m being slightly rude and sending this to both lists.<o:p></o:p></p>
<p class="MsoNormal">&nbsp;<o:p></o:p></p>
<p class="MsoNormal">We&#8217;re running maui 3.3-4 and torque 2.5.7-9 on CentOS 6.3.<o:p></o:p></p>
<p class="MsoNormal">&nbsp;<o:p></o:p></p>
<p class="MsoNormal">Most of the time there&#8217;s no problem.&nbsp; qsub a set of submit files, they run and we get our output, but every now and then, they submit and get held so that qstat shows something like this&#8230;<o:p></o:p></p>
<p class="MsoNormal">&nbsp;<o:p></o:p></p>
<p class="MsoNormal">[eob_merge@srvBatchHead01 ~]$ qstat<o:p></o:p></p>
<p class="MsoNormal">Job id&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Name&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; User&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Time Use S Queue<o:p></o:p></p>
<p class="MsoNormal">------------------------- ---------------- --------------- -------- - -----<o:p></o:p></p>
<p class="MsoNormal">48267.srvbatchhead01&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; FS93130003_000&nbsp;&nbsp; eob_merge&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0 Q batch&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<o:p></o:p></p>
<p class="MsoNormal">48268.srvbatchhead01&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; FS93130003_001&nbsp;&nbsp; eob_merge&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0 Q batch&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<o:p></o:p></p>
<p class="MsoNormal">48269.srvbatchhead01&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; FS93130003_002&nbsp;&nbsp; eob_merge&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0 Q batch&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<o:p></o:p></p>
<p class="MsoNormal">48270.srvbatchhead01&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; FS93130003_003&nbsp;&nbsp; eob_merge&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0 Q batch&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<o:p></o:p></p>
<p class="MsoNormal">48271.srvbatchhead01&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; FS93130003_004&nbsp;&nbsp; eob_merge&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0 Q batch&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<o:p></o:p></p>
<p class="MsoNormal">48272.srvbatchhead01&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; FS93130006_000&nbsp;&nbsp; eob_merge&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0 Q batch&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<o:p></o:p></p>
<p class="MsoNormal">48273.srvbatchhead01&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; FS93130006_001&nbsp;&nbsp; eob_merge&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0 Q batch&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<o:p></o:p></p>
<p class="MsoNormal">48274.srvbatchhead01&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; FS93130006_002&nbsp;&nbsp; eob_merge&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0 Q batch&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<o:p></o:p></p>
<p class="MsoNormal">48275.srvbatchhead01&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; FS93130006_003&nbsp;&nbsp; eob_merge&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0 Q batch&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<o:p></o:p></p>
<p class="MsoNormal">48276.srvbatchhead01&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; FS93130006_004&nbsp;&nbsp; eob_merge&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 0 Q batch<o:p></o:p></p>
<p class="MsoNormal">&nbsp;<o:p></o:p></p>
<p class="MsoNormal">and they&#8217;ll sit there forever like this.&nbsp; We will restart all of the associated services: maui, pbs_server, pbs_mom and munge, yet, it doesn&#8217;t help.&nbsp; Finally we just reboot all of the boxes in the cluster (fortunately, it&#8217;s a small number)
 and everything comes back up and runs.<o:p></o:p></p>
<p class="MsoNormal">&nbsp;<o:p></o:p></p>
<p class="MsoNormal">I&#8217;ve proposed a weekly reboot of everything, but have been told that this can only be a stop gap measure and cannot be the final solution.<o:p></o:p></p>
<p class="MsoNormal">&nbsp;<o:p></o:p></p>
<p class="MsoNormal">Does anyone have any clues?<o:p></o:p></p>
<p class="MsoNormal">&nbsp;<o:p></o:p></p>
<p class="MsoNormal">Kind regards,<o:p></o:p></p>
<p class="MsoNormal">&nbsp;<o:p></o:p></p>
<p class="MsoNormal"><b><span style="font-size:10.5pt;font-family:&quot;Helvetica&quot;,&quot;sans-serif&quot;;color:black">Jack Wilkinson</span></b><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:&quot;Helvetica&quot;,&quot;sans-serif&quot;;color:black">Services | VPay</span><span style="font-size:8.0pt;font-family:&quot;Helvetica&quot;,&quot;sans-serif&quot;;color:black">&reg;</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:&quot;Helvetica&quot;,&quot;sans-serif&quot;;color:#FF230E">P: 972.367-6622</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:&quot;Helvetica&quot;,&quot;sans-serif&quot;;color:#FF230E"><a href="mailto:jwilkinson@stoneeagle.com">jwilkinson@stoneeagle.com</a></span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:&quot;Helvetica&quot;,&quot;sans-serif&quot;;color:black"><a href="http://www.stoneeagle.com/">www.stoneeagle.com</a></span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:&quot;Helvetica&quot;,&quot;sans-serif&quot;;color:black"><a href="http://www.vpayusa.com/">www.vpayusa.com</a></span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:&quot;Helvetica&quot;,&quot;sans-serif&quot;;color:#797979">&nbsp;</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:&quot;Helvetica&quot;,&quot;sans-serif&quot;;color:#797979">111 W. Spring Valley Rd., #100</span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size:9.0pt;font-family:&quot;Helvetica&quot;,&quot;sans-serif&quot;;color:#797979">Richardson, TX 75081</span><o:p></o:p></p>
<p class="MsoNormal">&nbsp;<o:p></o:p></p>
</div>
<p class="MsoNormal"><span style="font-size:12.0pt;font-family:&quot;Times New Roman&quot;,&quot;serif&quot;">CONFIDENTIALITY NOTICE: This email, including any attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information.
 Any unauthorized review, use, disclosure, or distribution is prohibited. If you received this email and are not the intended recipient, please inform the sender by email reply and destroy all copies of the original message.
<o:p></o:p></span></p>
</div>
</div>
</body>
</html>