Hi Gus,<br>By default, I can submit jobs from the nodes.<br>However, I still get the same error as below when my PBS script tries to resubmit the job.<br>/var/spool/torque/mom_priv/jobs/127.master.SC: line 13: qsub: command not found<br>
<br>It seems that &quot;qsub&quot; is not recognized inside the PBS script. However, if I use /usr/local/bin/qsub, my script works successfully.<br><br>So how can I let the PBS script know the path to qsub?<br><br><br>Cheers,<br>Shibo Kuang<br>
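<br>A minimal sketch of one way to make the resubmission step independent of the job's PATH. It assumes qsub lives in /usr/local/bin, as reported by &quot;which qsub&quot; in this thread; the names QSUB, QSUB_FALLBACK and find_qsub are illustrative and not part of Torque:<br>

```shell
#!/bin/bash
# Sketch: locate qsub even when the job environment has a minimal PATH
# (for example PATH=/bin:/usr/bin). QSUB_FALLBACK is an assumption taken
# from the "which qsub" output in this thread; adjust it for your install.
QSUB_FALLBACK=${QSUB_FALLBACK:-/usr/local/bin/qsub}

find_qsub() {
    # Prefer whatever the current PATH resolves; otherwise use the fallback.
    if command -v qsub >/dev/null 2>&1; then
        command -v qsub
    else
        printf '%s\n' "$QSUB_FALLBACK"
    fi
}

QSUB=$(find_qsub)
echo "resubmitting with: $QSUB"
# In the job script, the final line would then become:
#   "$QSUB" case
```

With this in place the script no longer depends on which directories the batch system puts in PATH; it only depends on the fallback location being correct.<br>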
<br><div class="gmail_quote">On Thu, Mar 11, 2010 at 10:12 AM, Gus Correa <span dir="ltr">&lt;<a href="mailto:gus@ldeo.columbia.edu">gus@ldeo.columbia.edu</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
Hi Shibo<br>
<br>
Sorry, I forgot this important step.<br>
On your master node do this (you may need to do this as root,<br>
or using &quot;su&quot; or &quot;sudo&quot;, unless the user shibo is also a Torque<br>
administrator):<br>
<br>
qmgr -c &quot;set server allow_node_submit = True&quot;<br>
<br>
to allow jobs to be submitted from all nodes,<br>
not only from the master.<br>
<br>
To confirm that the server configuration changed,<br>
do:<br>
<br>
qmgr -c &quot;print server&quot;<br>
<br>
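A small sketch of how the check could be scripted rather than read by eye. It assumes the &quot;print server&quot; output contains a line of the form &quot;set server allow_node_submit = True&quot;; the function name node_submit_enabled and the sample text are illustrative only:<br>

```shell
#!/bin/bash
# Sketch: report whether allow_node_submit is enabled, given the text
# printed by: qmgr -c "print server"
# The function name node_submit_enabled is illustrative, not a Torque tool.
node_submit_enabled() {
    # Reads qmgr output on stdin; succeeds if the attribute is set to True.
    grep -Eiq '^set server allow_node_submit = true[[:space:]]*$'
}

# Example with a captured sample instead of a live server (assumed format):
sample='set server scheduling = True
set server allow_node_submit = True'

if printf '%s\n' "$sample" | node_submit_enabled; then
    echo "node submission enabled"
else
    echo "node submission disabled"
fi
```

On a live master node the same function could be fed directly, for example: qmgr -c &quot;print server&quot; | node_submit_enabled<br>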
<br>
Also:<br>
<br>
1) From what you say, it looks like your qsub is in /usr/local/bin/qsub,<br>
not in /var/spool/torque/bin (my wrong guess).<br>
2) There are no torque.sh and torque.csh files in /etc/profile.d.<br>
You would need to *create* them.<br>
However, this may not be necessary, as your Torque qsub command is<br>
installed in /usr/local/bin, which is likely to be in your PATH already.<div class="im"><br>
<br>
I hope this helps.<br>
Gus Correa<br>
---------------------------------------------------------------------<br>
Gustavo Correa<br>
Lamont-Doherty Earth Observatory - Columbia University<br>
Palisades, NY, 10964-8000 - USA<br>
---------------------------------------------------------------------<br>
<br>
<br>
shibo kuang wrote:<br>
</div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="im">
Dear Gus,<br>
Thanks for your reply.<br>
I am moving from Windows to Linux to run simulations, so I am not familiar with Linux yet.<br>
Resubmission does not work on either the master or the node, although the first submission works on both.<br>
When I run &quot;which qsub&quot; on the master and on the node, both return &quot;/usr/local/bin/qsub&quot;.<br>
Setting the path with export (export PATH=/usr/local/bin:${PATH}) does not work. I cannot find torque.sh, so I cannot test the second method you suggested. There is no folder &quot;/var/spool/torque/bin&quot;. Interestingly, /var/spool/torque/pbs_environment contains &quot;PATH=/bin:/usr/bin&quot;.<br>

Thanks again; your further suggestions would be greatly appreciated.<br>
Cheers,<br>
Shibo kuang<br>
  <br></div><div class="im">
On Thu, Mar 11, 2010 at 4:18 AM, Gus Correa &lt;<a href="mailto:gus@ldeo.columbia.edu" target="_blank">gus@ldeo.columbia.edu</a>&gt; wrote:<br>

<br>
    Hi Shibo<br>
<br></div><div><div></div><div class="h5">
    Glad that your Torque/PBS is now working.<br>
<br>
    I would guess the problem you have now with job resubmission<br>
    is related to your PATH environment variable.<br>
    Somehow Linux cannot find qsub, and I suppose this happens on the<br>
    slave node.<br>
<br>
    Does it also happen on the master node?<br>
    What do you get if you log in to the slave node and run &quot;which qsub&quot;,<br>
    or just &quot;qsub&quot;?<br>
<br>
    Again, this is not a Torque problem, more of a Sys Admin issue.<br>
    A possible fix may depend a bit on where you installed Torque.<br>
    Assuming it is installed in /var/spool/torque/,<br>
    add /var/spool/torque/bin to your PATH<br>
    in your shell initialization script:<br>
<br>
    For csh/tcsh, in your .cshrc/.tcshrc<br>
<br>
    setenv PATH /var/spool/torque/bin:${PATH}<br>
<br>
    For sh/bash in .profile or maybe .bashrc<br>
<br>
    export PATH=/var/spool/torque/bin:${PATH}<br>
<br>
    An alternative is to add a torque.sh and a torque.csh file<br>
    to the /etc/profile.d directory *on every node* with the<br>
    contents above.<br>
    (This may depend a bit on which Linux distribution you use.<br>
    It works for Fedora, RedHat, and CentOS, may work for others too.)<br>
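<br>
    As a rough sketch, the sh/bash variant of such a file might look like the following. TORQUE_BIN is an assumed location (the guess made above); it should be replaced by whichever directory actually holds qsub on your nodes:<br>

```shell
# Sketch of an /etc/profile.d/torque.sh for sh/bash logins.
# TORQUE_BIN is an assumed location; use the directory that holds qsub.
TORQUE_BIN=${TORQUE_BIN:-/var/spool/torque/bin}

case ":${PATH}:" in
    *":${TORQUE_BIN}:"*) ;;                 # already present, do nothing
    *) PATH="${TORQUE_BIN}:${PATH}" ;;      # prepend once
esac
export PATH
```

    The case guard keeps the directory from being added again each time the file is sourced.<br>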
<br>
<br>
    I hope this helps.<br>
<br>
    Gus Correa<br>
<br>
    shibo kuang wrote:<br>
<br>
        Hi All,<br>
        Now my PBS server works, with the help of Gus Correa. My<br>
        problem was that I had not mounted my master folder on the<br>
        nodes. Now I have another problem with automatically restarting<br>
        a job.<br>
        Below is my script<br>
         #!/bin/bash<br>
        #PBS -N inc90<br>
        #PBS -q short<br>
        #PBS -l walltime=00:08:00<br>
        cd $PBS_O_WORKDIR<br>
        ./nspff &gt;out<br>
        if [ -f jobfinished ]; then<br>
           rm -f jobfinished<br>
           exit 0<br>
        fi<br>
        sleep 10<br>
        qsub case<br>
         My code stops at 7 min and is supposed to restart<br>
        automatically after 10 s, but it fails with the following error:<br>
         /var/spool/torque/mom_priv/jobs/120.master.SC: line 13: qsub: command not found<br></div></div>
<div><div></div><div class="h5"><br>
<br>
         Your help would be greatly appreciated.<br>
         Regards,<br>
        Shibo Kuang<br>
                  <br>
           On Wed, Mar 10, 2010 at 2:57 AM, Gus Correa &lt;<a href="mailto:gus@ldeo.columbia.edu" target="_blank">gus@ldeo.columbia.edu</a>&gt; wrote:<br>
<br>
               Hi Shibo<br>
<br>
               Somehow your &quot;slave&quot; computer<br>
               doesn&#39;t see /home/kuang/sharpbend/s1/r8,<br>
               although it can be seen by the &quot;master&quot; computer.<br>
               It may be one of several things,<br>
               it is hard to tell exactly with the information you gave,<br>
               but here are some guesses.<br>
<br>
               Do you really have a separate /home/kuang/sharpbend/s1/r8<br>
               on your &quot;slave&quot; computer, or is it only present on the &quot;master&quot;?<br>
               You can log in to the &quot;slave&quot; and check this directly<br>
               (&quot;ls /home/kuang/sharpbend/s1/r8&quot;).<br>
               If the directory is not there,<br>
               this is not really a Torque or MPI problem,<br>
               but a Sys Admin problem with exporting and mounting directories.<br>
<br>
               If that directory exists only on the master side,<br>
               you can either create an identical copy on the &quot;slave&quot; side<br>
               (painful),<br>
               or use NFS to export it from the &quot;master&quot; computer to the<br>
               &quot;slave&quot; (easier).<br>
<br>
               For the second approach, you need to export /home or /home/kuang<br>
               on the &quot;master&quot; computer, and automount it on the &quot;slave&quot; computer.<br>
               The files you need to edit are /etc/exports (master side),<br>
               and /etc/auto.master plus perhaps /etc/auto.home (slave side).<br>
<br>
               A slightly different approach (not using the automounter)<br>
               is just to hard mount /home or /home/kuang<br>
               on the &quot;slave&quot; side by adding it to the /etc/fstab list.<br>
<br>
               You also need to turn on the NFS daemon on the &quot;master&quot; node with<br>
               &quot;chkconfig&quot;, if it is not yet turned on.<br>
<br>
               Read the man pages!<br>
               At least read &quot;man exportfs&quot;, &quot;man mountd&quot;, &quot;man fstab&quot;,<br>
               and &quot;man chkconfig&quot;.<br>
<br>
               You may need to reboot the computers for this to take effect.<br>
               Then log in to the &quot;slave&quot; and try again:<br>
               &quot;ls /home/kuang/sharpbend/s1/r8&quot;.<br>
<br>
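               The directory check above could be wrapped in a small script to run on each node, roughly as sketched here. The function name check_workdir is illustrative, and the default path is simply the one from this thread:<br>

```shell
#!/bin/bash
# Sketch: confirm that a job's working directory is visible on this host.
# check_workdir is an illustrative helper, not part of Torque or NFS.
check_workdir() {
    dir=$1
    if [ -d "$dir" ]; then
        echo "visible: $dir"
    else
        echo "missing: $dir"
        return 1
    fi
}

# Default to the directory discussed in this thread; pass another as $1.
check_workdir "${1:-/home/kuang/sharpbend/s1/r8}" || true
```

               If the script prints &quot;missing&quot; on the slave but &quot;visible&quot; on the master, the export or mount is the likely place to look.<br>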
               I hope this helps.<br>
               Gus Correa<br>
<br>
               shibo kuang wrote:<br>
<br>
                   &quot;/home/kuang/sharpbend/s1/r8: No such file or directory.&quot;<br>
                   My node does not have the directory, but my master has it.<br>
<br>
                   On Sun, Mar 7, 2010 at 1:09 AM, shibo kuang &lt;<a href="mailto:s.b.kuang@gmail.com" target="_blank">s.b.kuang@gmail.com</a>&gt; wrote:<br>
<br>
                      Hi,<br>
                      I just fixed the problem by setting up password-free access between<br>
                      the computing node and the master.<br>
                      But now I have another problem:<br>
                      in r8.e19, it says<br>
                      /home/kuang/sharpbend/s1/r8: No such file or directory.<br>
                      If only one computer is used, the server works normally.<br>
                      What did I miss when I installed Torque?<br>
                      Your help would be greatly appreciated.<br>
                      Cheers,<br>
                      Shibo Kuang<br>
<br>
<br>
                           On Sun, Mar 7, 2010 at 12:46 AM, shibo kuang &lt;<a href="mailto:s.b.kuang@gmail.com" target="_blank">s.b.kuang@gmail.com</a>&gt; wrote:<br></div></div><div><div></div><div class="h5">
<br>
                          Hi all,<br>
                          I tried to install a PBS server on my two CentOS Linux<br>
                          computers (each has 8 cores), but failed.<br>
                          Here is my problem:<br>
                          If I use one computer both as the master running pbs_server<br>
                          and as a computing node, I can submit jobs with a script without<br>
                          any problem, and all jobs give the expected results.<br>
                          However, when one computer is the master and the other is a<br>
                          computing node, jobs are never submitted successfully.<br>
                          I would appreciate your hints and suggestions based on the<br>
                          following messages I got.<br>
                          Regards,<br>
                          Shibo Kuang<br>
                          Return-Path: &lt;adm@master&gt;<br>
                          Received: from master (localhost [127.0.0.1])<br>
                                  by master (8.13.1/8.13.1) with ESMTP id o26DwKF9006310<br>
                                  for &lt;kuang@master&gt;; Sun, 7 Mar 2010 00:28:20 +1030<br>
                          Received: (from root@localhost)<br>
                                  by master (8.13.1/8.13.1/Submit) id o26DwKpZ006293<br>
                                  for kuang@master; Sun, 7 Mar 2010 00:28:20 +1030<br>
                          Date: Sun, 7 Mar 2010 00:28:20 +1030<br>
                          From: adm &lt;adm@master&gt;<br>
                          Message-Id: &lt;201003061358.o26DwKpZ006293@master&gt;<br>
                          To: kuang@master<br>
                          Subject: PBS JOB 18.master<br>
                          Precedence: bulk<br>
                          PBS Job Id: 18.master<br>
                          Job Name:   r8<br>
                          Exec host:  par1/0<br>
                          An error has occurred processing your job, see below.<br>
                          Post job file processing error; job 18.master on host par1/0<br>
                          Unable to copy file /var/spool/torque/spool/18.master.OU to<br>
                          kuang@master:/home/kuang/sharpbend/s1/r8/r8.o18<br>
                          *** error from copy<br>
                          Permission denied (publickey,gssapi-with-mic,password).<br>
                          lost connection<br>
                          *** end error output<br>
                          Output retained on that host in:<br>
                          /var/spool/torque/undelivered/18.master.OU<br>
                          Unable to copy file /var/spool/torque/spool/18.master.ER to<br>
                          kuang@master:/home/kuang/sharpbend/s1/r8/r8.e18<br>
                          *** error from copy<br>
                          Permission denied (publickey,gssapi-with-mic,password).<br>
                          lost connection<br>
                          *** end error output<br>
                          Output retained on that host in:<br>
                          /var/spool/torque/undelivered/18.master.ER<br>
<br>
<br>
<br>
<br>
                          ------------------------------------------------------------------------<br>
<br>
<br>
<br>
                   _______________________________________________<br>
                   torqueusers mailing list<br>
                   <a href="mailto:torqueusers@supercluster.org" target="_blank">torqueusers@supercluster.org</a><br></div></div>
<div class="im"><br>
<br>
                   <a href="http://www.supercluster.org/mailman/listinfo/torqueusers" target="_blank">http://www.supercluster.org/mailman/listinfo/torqueusers</a><br>
<br>
<br>
<br>
<br>
<br>
<br>
</div></blockquote>
<br>
</blockquote></div><br>