<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <div class="moz-cite-prefix"><span id="result_box" class=""
        lang="en">Hello everyone,<br>
        <br>
        How do I set the priorities of my queues in Torque?<br>
        I have the following queues:<br>
        batch<br>
        sapda<br>
        user1<br>
        user2<br>
        user3<br>
        <br>
        I want the queues user1, user2 and user3 to have the same
        priority, and the queues sapda and batch to have a higher
        priority than the user queues.<br>
        <br>
        All of the queues may compete with each other according to
        priority. That is, when users of the queues user1, user2 and
        user3 submit jobs, they also compete with each other ... so that
        each user runs one job.<br>
        <br>
        <br>
        Likewise for the batch and sapda queues.<br>
      </span>
      <div class="moz-signature">
        <font color="#00008B">
          <b>====================</b><br>
          Best regards,</font>
        <p><font color="#00008B">
            <b>Juno Costa Kim</b><br>
            <b>Departamento de Redes</b><br>
            <b>AGM Telecom</b><br>
            <b>====================</b><br>
          </font>
          <font color="#0000CD">
            IP Phone: +55 (48) 3221-0100<br>
            Fax : +55 (48) 3222-7747<br>
            Email : <a class="moz-txt-link-abbreviated" href="mailto:redes03@agm.com.br">redes03@agm.com.br</a><br>
            Website: <a class="moz-txt-link-abbreviated" href="http://www.agm.com.br">www.agm.com.br</a><br>
            Rua Joe Colla&ccedil;o, 163<br>
            88037-010 - Santa M&ocirc;nica - Florian&oacute;polis - SC<br>
          </font></p>
      </div>
      On 15-11-2013 18:11, Jagga Soorma wrote:<br>
    </div>
    <blockquote
cite="mid:CAKyjK53+UkGu2q=1emUU+wymx4Cjcqs0Ya61nFFSXU0c3y=HgA@mail.gmail.com"
      type="cite">
      <div dir="ltr">So, this is a brand new install of torque without
        anything running on the server/client except the torque
        processes. &nbsp;I checked and I don't think the server is running
        into any process limits. &nbsp;
        <div><br>
        </div>
        <div>I set up the server &amp; sched processes on the client
          itself and am now running everything on the client host to
          rule out external components. &nbsp;I still see the same problem
          connecting to port 15002. &nbsp;I also had a 1Gig copper
          connection on this server and moved my network to
          a completely different NIC, and that did not help either.</div>
        <div><br>
        </div>
        <div>This is really a bizarre one that I can't seem to find the
          cause for. &nbsp;Any other things you guys think might help me
          troubleshoot this problem? &nbsp;</div>
        <div><br>
        </div>
        <div>Thanks,</div>
        <div>-J</div>
      </div>
      <div class="gmail_extra"><br>
        <br>
        <div class="gmail_quote">On Fri, Nov 15, 2013 at 4:05 AM,
          Jonathan Barber <span dir="ltr">&lt;<a moz-do-not-send="true"
              href="mailto:jonathan.barber@gmail.com" target="_blank">jonathan.barber@gmail.com</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div dir="ltr">
              <div class="im">On 15 November 2013 03:18, Jagga Soorma <span
                  dir="ltr">&lt;<a moz-do-not-send="true"
                    href="mailto:jagga13@gmail.com" target="_blank">jagga13@gmail.com</a>&gt;</span>
                wrote:<br>
              </div>
              <div class="gmail_extra">
                <div class="gmail_quote">
                  <div class="im">
                    <blockquote class="gmail_quote" style="margin:0px
                      0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                      <div dir="ltr">
                        <div>
                          <div>
                            <div>
                              <div>
                                <div>
                                  <div>I changed the log level and here
                                    is what I see on the server:<br>
                                    <br>
                                  </div>
                                  <div>It looks like the server is
                                    intermittently having issues
                                    connecting to port 15002 on the
                                    client.&nbsp; This client was just
                                    fine under the 2.5.9 torque
                                    production environment that we have,
                                    but it seems to be intermittently
                                    having issues in the 2.5.13 test
                                    environment that is set up with GPU
                                    support.<br>
                                  </div>
                                  <div><br>
                                  </div>
                                </div>
                              </div>
                            </div>
                          </div>
                        </div>
                      </div>
                    </blockquote>
                  </div>
                  <div>[snip]&nbsp;</div>
                  <div class="im">
                    <blockquote class="gmail_quote" style="margin:0px
                      0px 0px
0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
                      <div dir="ltr">
                        <div>
                          <div>
                            <div><br>
                              11/14/2013
                              19:15:20;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate:
                              setting job <a moz-do-not-send="true"
                                href="http://7352.server1.xxx.com"
                                target="_blank">7352.server1.xxx.com</a>
                              state from QUEUED-QUEUED to RUNNING-PRERUN
                              (4-40)<br>
                              11/14/2013 19:15:20;0008;PBS_Server;Job;<a
                                moz-do-not-send="true"
                                href="http://7352.server1.xxx.com"
                                target="_blank">7352.server1.xxx.com</a>;forking
                              in send_job<br>
                              <b>11/14/2013
                                19:15:20;0004;PBS_Server;Svr;svr_connect;attempting
                                connect to host 72.34.135.64 port 15002<br>
                                11/14/2013
                                19:15:20;0004;PBS_Server;Svr;svr_connect;cannot
                                connect to host port 15002 - cannot
                                establish connection () - time=0 seconds</b><br>
                              <b>11/14/2013
                                19:15:22;0004;PBS_Server;Svr;svr_connect;attempting
                                connect to host 72.34.135.64 port 15002<br>
                                11/14/2013
                                19:15:22;0004;PBS_Server;Svr;svr_connect;cannot
                                connect to host port 15002 - cannot
                                establish connection () - time=0 seconds</b><br>
                              11/14/2013 19:15:22;0008;PBS_Server;Job;<a
                                moz-do-not-send="true"
                                href="http://7352.server1.xxx.com"
                                target="_blank">7352.server1.xxx.com</a>;entering
                              post_sendmom<br>
                            </div>
                          </div>
                        </div>
                      </div>
                    </blockquote>
                    <div><br>
                    </div>
                  </div>
                  <div>You might be running up against limits on the
                    number of file descriptors the pbs_server process or
                    the OS is allowed to have open. You can use tools
                    such as lsof to see how many files the pbs_server
                    has open:</div>
                  <div>$ sudo lsof -c pbs_server</div>
                  <div><br>
                  </div>
                  <div>It's also possible that you're running out of
                    ports to bind to. Running lsof/netstat and looking
                    to see if there are massive numbers of
                    connections/files open will reveal this.</div>
                  <div><br>
                  </div>
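                  <div>A minimal sketch of the same descriptor check
                    without lsof, assuming a Linux host where /proc is
                    available and the caller is allowed to read the
                    target process's entries (typically root for
                    pbs_server). The function names are illustrative and
                    not part of Torque. It counts the entries under
                    /proc/PID/fd and reads the soft "Max open files"
                    limit from /proc/PID/limits:</div>
<pre>
```python
import os


def count_open_fds(pid):
    """Count open file descriptors by listing /proc/PID/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def soft_nofile_limit(pid):
    """Return the soft 'Max open files' limit, or None if unlimited or not found."""
    with open(f"/proc/{pid}/limits") as fh:
        for line in fh:
            if line.startswith("Max open files"):
                # Fields: 'Max', 'open', 'files', soft, hard, units
                soft = line.split()[3]
                return None if soft == "unlimited" else int(soft)
    return None


def describe_fd_usage(pid):
    """Return a one-line summary of descriptor usage for the given PID."""
    used = count_open_fds(pid)
    limit = soft_nofile_limit(pid)
    shown = "unlimited" if limit is None else str(limit)
    return f"pid {pid}: {used} descriptors open, soft limit {shown}"
```
</pre>
                  <div>Calling describe_fd_usage() with the PID of
                    pbs_server (for instance, as reported by pgrep
                    pbs_server) shows how close the daemon is to its
                    limit; a count near the limit would support the
                    descriptor-exhaustion theory.</div>
                  <div><br>
                  </div>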
                </div>
                Although you say there is no firewall configured on the
                servers, do you know if there is a firewall between the
                pbs_server and the nodes?</div>
              <div class="gmail_extra"><br>
              </div>
              <div class="gmail_extra">You can do a simple TCP connect
                to the mom to see if it's listening:</div>
              <div class="gmail_extra">$ nmap -p 15002 <a
                  moz-do-not-send="true"
                  href="http://ava01.grid.fe.up.pt" target="_blank">ava01.grid.fe.up.pt</a>
                -oG -
                <div class="gmail_extra">
                  # Nmap 6.40 scan initiated Fri Nov 15 11:52:17 2013
                  as: nmap -p 15002 -oG - <a moz-do-not-send="true"
                    href="http://ava01.grid.fe.up.pt" target="_blank">ava01.grid.fe.up.pt</a></div>
                <div class="gmail_extra">Host: 192.168.147.1 (<a
                    moz-do-not-send="true"
                    href="http://ava01.grid.fe.up.pt" target="_blank">ava01.grid.fe.up.pt</a>)<span
                    style="white-space:pre-wrap"> </span>Status: Up</div>
                <div class="gmail_extra">Host: 192.168.147.1 (<a
                    moz-do-not-send="true"
                    href="http://ava01.grid.fe.up.pt" target="_blank">ava01.grid.fe.up.pt</a>)<span
                    style="white-space:pre-wrap"> </span>Ports:
                  15002/open/tcp//unknown///</div>
                <div class="gmail_extra">
                  # Nmap done at Fri Nov 15 11:52:17 2013 -- 1 IP
                  address (1 host up) scanned in 0.04 seconds</div>
                <div>$&nbsp;<br>
                </div>
                <div class="gmail_extra"><br>
                </div>
                <div class="gmail_extra">Or continuously with hping3
                  (I'm sure there are other tools that will do this as
                  well):</div>
                <div class="gmail_extra">
                  <div class="gmail_extra">
                    $ sudo hping3 -S -p 15002 <a moz-do-not-send="true"
                      href="http://ava01.grid.fe.up.pt" target="_blank">ava01.grid.fe.up.pt</a></div>
                  <div class="gmail_extra">HPING <a
                      moz-do-not-send="true"
                      href="http://ava01.grid.fe.up.pt" target="_blank">ava01.grid.fe.up.pt</a>
                    (em1 192.168.147.1): S set, 40 headers + 0 data
                    bytes</div>
                  <div class="gmail_extra">len=46 ip=192.168.147.1
                    ttl=61 DF id=0 sport=15002 flags=SA seq=0 win=14600
                    rtt=1.5 ms</div>
                  <div class="gmail_extra">len=46 ip=192.168.147.1
                    ttl=61 DF id=0 sport=15002 flags=SA seq=1 win=14600
                    rtt=0.8 ms</div>
                  <div class="gmail_extra">len=46 ip=192.168.147.1
                    ttl=61 DF id=0 sport=15002 flags=SA seq=2 win=14600
                    rtt=0.6 ms</div>
                  <div class="gmail_extra">len=46 ip=192.168.147.1
                    ttl=61 DF id=0 sport=15002 flags=SA seq=3 win=14600
                    rtt=1.0 ms</div>
                  <div class="gmail_extra">len=46 ip=192.168.147.1
                    ttl=61 DF id=0 sport=15002 flags=SA seq=4 win=14600
                    rtt=1.2 ms</div>
                  <div><br>
                  </div>
                  <div>(SA means it's open)</div>
                  <div><br>
                  </div>
                </div>
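                <div>If neither nmap nor hping3 is installed, a rough
                  equivalent can be sketched with the Python standard
                  library. This is only an illustration with made-up
                  function names: unlike hping3's SYN-only probe it
                  completes a full TCP connect, so it needs no root
                  privileges, and repeating it helps expose an
                  intermittent failure like the one in the server
                  log:</div>
<pre>
```python
import socket
import time


def probe(host, port, timeout=2.0):
    """Attempt one full TCP connect; return (ok, seconds_elapsed, error_text)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:
        # Covers refused connections, timeouts and name-resolution failures.
        return False, time.monotonic() - start, str(exc)


def probe_repeatedly(host, port, attempts=5, interval=1.0):
    """Probe several times, pausing between attempts, and return all results."""
    results = []
    for i in range(attempts):
        results.append(probe(host, port))
        if i != attempts - 1:
            time.sleep(interval)
    return results


def summarize(results):
    """Return a short text summary such as '4/5 connects succeeded'."""
    ok_count = sum(1 for ok, _, _ in results if ok)
    return f"{ok_count}/{len(results)} connects succeeded"
```
</pre>
                <div>For example, summarize(probe_repeatedly("ava01.grid.fe.up.pt",
                  15002)) run from the pbs_server host reports how many
                  of five attempts reached the mom; anything less than
                  all of them points at the network path or the mom
                  itself rather than at pbs_server.</div>
                <div><br>
                </div>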
                <div>HTH</div>
                <span class="HOEnZb"><font color="#888888">-- <br>
                    Jonathan Barber &lt;<a moz-do-not-send="true"
                      href="mailto:jonathan.barber@gmail.com"
                      target="_blank">jonathan.barber@gmail.com</a>&gt;
                  </font></span></div>
            </div>
            <br>
            _______________________________________________<br>
            torqueusers mailing list<br>
            <a moz-do-not-send="true"
              href="mailto:torqueusers@supercluster.org">torqueusers@supercluster.org</a><br>
            <a moz-do-not-send="true"
              href="http://www.supercluster.org/mailman/listinfo/torqueusers"
              target="_blank">http://www.supercluster.org/mailman/listinfo/torqueusers</a><br>
            <br>
          </blockquote>
        </div>
        <br>
      </div>
      <br>
    </blockquote>
    <br>
  </body>
</html>