<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <meta content="text/html;charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
<tt>127.0.0.1 is a special address that references localhost.<br>
<a class="moz-txt-link-freetext" href="http://en.wikipedia.org/wiki/Localhost">http://en.wikipedia.org/wiki/Localhost</a><br>
<br>
<br>
<br>
127.0.0.1 is not what you want for your hostname (pbs_moms trying to
connect to 127.0.0.1 will end up talking to themselves).<br>
<br>
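As a rough illustration of the point above, here is a minimal Python sketch that scans /etc/hosts-style text and lists any names, other than the usual localhost aliases, that are mapped to a loopback address. The function names are made up for this example, and the sample text is taken from the hosts file quoted further down in this message.<br>

```python
import ipaddress

# Names that are expected to live on a loopback address.
LOCAL_ALIASES = ("localhost", "localhost.localdomain",
                 "localhost6", "localhost6.localdomain6")

def hosts_entries(text):
    """Parse /etc/hosts-style text into (address, [names]) pairs, skipping comments."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        entries.append((fields[0], fields[1:]))
    return entries

def loopback_names(text, ignore=LOCAL_ALIASES):
    """Return names mapped to a loopback address, other than the localhost aliases."""
    suspicious = []
    for addr, names in hosts_entries(text):
        try:
            is_loopback = ipaddress.ip_address(addr).is_loopback
        except ValueError:
            continue  # not a valid IP address, ignore the line
        if is_loopback:
            suspicious.extend(n for n in names if n not in ignore)
    return suspicious

SAMPLE = """\
#127.0.0.1              localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
127.0.0.1 rufian.perrera.local rufian
192.168.2.6 auyin.perrera.local auyin
"""

if __name__ == "__main__":
    print(loopback_names(SAMPLE))
```

Any name this reports is one that pbs_moms would resolve to themselves; to check a real machine, pass in the contents of /etc/hosts instead of SAMPLE.<br>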
You will want to set up an IP address on your pbs_server/scheduler node
that corresponds to the network that your pbs_moms are on.<br>
Then make sure that the hostname you give it matches the one in the
file in $PBS_HOME/server<br>
<br>
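As a hedged sketch of the hostname check described above: the Python below compares the name stored in a file under $PBS_HOME with the machine's hostname. It assumes the file in question is the server_name file mentioned earlier in this thread, and the fallback directory /var/spool/torque is only a guess for when PBS_HOME is not set. The function names are hypothetical.<br>

```python
import os
import socket
from pathlib import Path

def names_match(server_name_text, hostname):
    """Compare the stored server name with a hostname, ignoring case and whitespace."""
    return server_name_text.strip().lower() == hostname.strip().lower()

def check_server_name(pbs_home, hostname=None):
    """Return (stored_name, hostname, match) for the server_name file under pbs_home."""
    if hostname is None:
        hostname = socket.gethostname()
    stored = (Path(pbs_home) / "server_name").read_text().strip()
    return stored, hostname, names_match(stored, hostname)

if __name__ == "__main__":
    # Assumption: PBS_HOME may be unset; /var/spool/torque is only a guessed default.
    pbs_home = os.environ.get("PBS_HOME", "/var/spool/torque")
    if (Path(pbs_home) / "server_name").is_file():
        print(check_server_name(pbs_home))
    else:
        print("no server_name file under", pbs_home)
```

A mismatch (for example, a stored name of localhost.localdomain on a host called rufian.perrera.local) would be consistent with the symptoms reported in this thread, though it is not the only possible cause.<br>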
Copying the init script (the contrib/init.d/pbs_sched one, not the
/opt/pbs/sbin/pbs_sched binary) to /etc/init.d is a start; you will then
probably need to turn it on.<br>
<br>
To set it up to start on reboot:<br>
<br>
chkconfig --add pbs_sched<br>
and then<br>
chkconfig pbs_sched on<br>
<br>
To start it, use /etc/init.d/pbs_sched start<br>
<br>
<br>
</tt><tt>--Jerry<br>
</tt><br>
<br>
Samir Gartner wrote:
<blockquote
 cite="mid:e73901d60905211226w3da1ae06i629c1ccb04f7922a@mail.gmail.com"
 type="cite">Ok Gus and everyone. Thanks again for your answers.<br>
  <br>
  <div class="gmail_quote">There is no pbs_sched in /etc/init.d, but it
is here:<br>
  <br>
/usr/local/src/torque-2.3.6/contrib/init.d/pbs_sched<br>
/usr/local/src/torque-2.3.6/tpackages/server/opt/pbs/sbin/pbs_sched<br>
/usr/local/src/torque-2.3.6/src/scheduler.cc/.libs/pbs_sched<br>
/usr/local/src/torque-2.3.6/src/scheduler.cc/pbs_sched<br>
/opt/pbs/sbin/pbs_sched<br>
  <br>
I was thinking of copying /opt/pbs/sbin/pbs_sched to /etc/init.d. Is
that the right thing to do?<br>
  <br>
Sorry about the "manually" word. It is local slang I guess. What I mean
is that I went to the /opt/pbs/sbin/ folder and executed ./pbs_sched<br>
  <br>
hostname output is:<br>
  <br>
rufian.perrera.local<br>
  <br>
The hosts file contains:<br>
  <br>
# Do not remove the following line, or various programs<br>
# that require network functionality will fail.<br>
#127.0.0.1              localhost.localdomain localhost   
&lt;--------------------------Is this wrong?<br>
::1             localhost6.localdomain6 localhost6<br>
127.0.0.1 rufian.perrera.local rufian<br>
192.168.2.6 auyin.perrera.local auyin<br>
192.168.2.4 pelusa.perrera.local pelusa<br>
192.168.2.2 lamparita.perrera.local lamparita<br>
  <br>
  <br>
network content is:<br>
  <br>
NETWORKING=yes<br>
HOSTNAME=rufian.perrera.local<br>
DOMAINNAME=perrera.local<br>
  <br>
I don't have /etc/sysconfig/pbs_server or /etc/sysconfig/pbs_sched
either.<br>
  <br>
  <br>
2009/5/21 Gus Correa <span dir="ltr">&lt;<a moz-do-not-send="true"
 href="mailto:gus@ldeo.columbia.edu">gus@ldeo.columbia.edu</a>&gt;</span><br>
  <blockquote class="gmail_quote"
 style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
    <div class="im">Samir Gartner wrote:<br>
&gt; Ok, scheduling wasn't enabled,now it is,<br>
    <br>
    </div>
It happens very often.<br>
Fixing it is a good first step.<br>
    <div class="im"><br>
&gt; but pbs_sched service was not<br>
&gt; found.<br>
    <br>
    </div>
Starting up daemons in YDog may be different from RHEL, CentOS, Fedora,<br>
so I am just guessing based on the latter. I am not familiar with YDog.<br>
Anyway ...<br>
    <br>
I don't know whether you got Torque from ClusterResources or elsewhere.<br>
In any case, there should be a pbs_sched script in /etc/init.d.<br>
If it is there, do "chkconfig --add pbs_sched" (or YDog equivalent),<br>
then do "chkconfig --list pbs_sched" to see which runlevels it will be<br>
on, then "service pbs_sched start" to start it, or if YDog doesn't have<br>
"service", run it with "/etc/init.d/pbs_sched start".<br>
    <br>
If you don't have the pbs_sched script in /etc/init.d, you may find one<br>
in the contrib subdirectory of the Torque source tree.<br>
Copy it over to /etc/init.d, and do the above.<br>
(The location may be other than /etc/init.d in YDog.)<br>
    <div class="im"><br>
    <br>
&gt; I didn't install Maui; it is a default installation. As for the hosts<br>
&gt; file, it is properly configured, as are the nodes and mom config files.<br>
&gt;<br>
    <br>
    </div>
You only need Maui if you want a complex scheduling policy.<br>
pbs_sched is FIFO, very simple, but works fine.<br>
I've used it for a long time without problems.<br>
    <div class="im"><br>
&gt; when I manually start pbs_sched it says<br>
&gt;<br>
&gt; pbs_sched: addclient, host localhost not found<br>
&gt;<br>
    <br>
    </div>
Hmm ... I never got this one, not that I remember.<br>
Not sure what you mean by "manually start pbs_sched".<br>
Anyway, it sounds like another, different problem.<br>
    <br>
    <br>
Is it possible that your "hostname" command<br>
is not resolving your server name to rufian.perrera.local but to<br>
localhost?<br>
What is the output of "hostname"?<br>
What do you have in /etc/hosts?<br>
What do you have in /etc/sysconfig/network?<br>
    <br>
Just in case you have /etc/sysconfig/pbs_server and<br>
/etc/sysconfig/pbs_sched, what are their contents?<br>
(I don't have them.)<br>
    <br>
(Again just guessing, YDog may have different files to startup things.)<br>
    <div class="im"><br>
I hope this helps,<br>
Gus Correa<br>
---------------------------------------------------------------------<br>
Gustavo Correa<br>
Lamont-Doherty Earth Observatory - Columbia University<br>
Palisades, NY, 10964-8000 - USA<br>
---------------------------------------------------------------------<br>
    <br>
&gt;<br>
    </div>
&gt; 2009/5/21 Samir Gartner &lt;<a moz-do-not-send="true"
 href="mailto:jigzat@gmail.com">jigzat@gmail.com</a>&gt;<br>
    <div class="im">&gt;<br>
&gt;     I think I'm gonna cry.... I love you guys!! No, seriously, it worked,<br>
&gt;     but only when executed as the root user. Now the question is: what did I<br>
&gt;     do wrong? Jobs should start automatically, right?<br>
&gt;<br>
&gt;     I first followed the Globus Toolkit tutorial, but it is kind of<br>
&gt;     outdated, so I guess I issued some wrong instructions.<br>
&gt;<br>
&gt;     One of the weird things was that the tutorial suggested using the<br>
&gt;     /opt/pbs prefix when running configure, and now I have under<br>
&gt;     /opt/pbs another /opt/pbs folder with duplicate bin and sbin folders<br>
&gt;     and executables. Is this wrong, or is this how it is supposed to be?<br>
&gt;<br>
    </div>
&gt;     2009/5/21 Ling C. Ho &lt;<a moz-do-not-send="true"
 href="mailto:ling@fnal.gov">ling@fnal.gov</a>&gt;<br>
    <div class="im">&gt;<br>
&gt;         Have you configured a scheduler?<br>
&gt;<br>
&gt;         What if you use qrun? Would any job start?<br>
&gt;<br>
&gt;         ...<br>
&gt;         ling<br>
&gt;<br>
&gt;         Samir Gartner wrote:<br>
&gt;<br>
&gt;             Ok, I don't see any file named default_server, but<br>
&gt;             server_name has the right server name, rufian.perrera.local,<br>
&gt;             and there is another file with the same content named<br>
&gt;             server_name.new.<br>
&gt;<br>
&gt;             Right now the PBS server name appears to be correct (after<br>
&gt;             stopping the server and manually deleting the zombie jobs),<br>
&gt;             but the jobs still won't start.<br>
&gt;<br>
&gt;<br>
&gt;             [samir@rufian ~]$ echo "sleep 30;date" | /opt/pbs/bin/qsub<br>
&gt;             [samir@rufian ~]$ /opt/pbs/bin/qstat -a<br>
&gt;<br>
&gt;             rufian.perrera.local:<br>
&gt;<br>
&gt;                                                                                      Req'd  Req'd   Elap<br>
&gt;             Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time<br>
&gt;             -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----<br>
&gt;             13.rufian.perrer     samir    batch    STDIN                --     1  --     -- 01:00 Q    --<br>
&gt;             [samir@rufian ~]$<br>
&gt;<br>
&gt;<br>
&gt;             By the way, is top posting allowed?<br>
&gt;<br>
&gt;             2009/5/21 Jerry Smith &lt;<a moz-do-not-send="true"
 href="mailto:jdsmit@sandia.gov">jdsmit@sandia.gov</a><br>
    </div>
&gt;             &lt;mailto:<a moz-do-not-send="true"
 href="mailto:jdsmit@sandia.gov">jdsmit@sandia.gov</a>&gt; &lt;mailto:<a
 moz-do-not-send="true" href="mailto:jdsmit@sandia.gov">jdsmit@sandia.gov</a><br>
    <div>
    <div class="h5">&gt;             &lt;mailto:<a
 moz-do-not-send="true" href="mailto:jdsmit@sandia.gov">jdsmit@sandia.gov</a>&gt;&gt;&gt;<br>
&gt;<br>
&gt;<br>
&gt;                Samir,<br>
&gt;<br>
&gt;                What do you have in $PBS_HOME/{server_name,default_server}?<br>
&gt;<br>
&gt;                It should be whatever resolves to the Ethernet address that<br>
&gt;                pbs should be listening on.<br>
&gt;<br>
&gt;                --Jerry<br>
&gt;<br>
&gt;<br>
&gt;<br>
&gt;<br>
&gt;                Samir Gartner wrote:<br>
&gt;<br>
&gt;                    Ok, I finally installed Torque under yellowdog/ppc, but now I have<br>
&gt;                    another problem. I set up my pbs server as rufian.perrera.local,<br>
&gt;                    but when I submit a job it shows up under localhost.localdomain<br>
&gt;                    and stays in the queued state forever. And if I try to qdel the<br>
&gt;                    job, it can't reach the server and the connection times out. Any<br>
&gt;                    ideas about what could be wrong?<br>
&gt;                    I'm not trying to set up anything complicated; it is just one<br>
&gt;                    machine that works as both server and client.<br>
&gt;<br>
&gt;                    this is the shell output<br>
&gt;<br>
&gt;                    [root@rufian bin]# /opt/pbs/bin/qstat -a<br>
&gt;<br>
&gt;                    rufian.perrera.local:<br>
&gt;<br>
&gt;                                                                                             Req'd  Req'd   Elap<br>
&gt;                    Job ID               Username Queue    Jobname          SessID NDS   TSK Memory Time  S Time<br>
&gt;                    -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----<br>
&gt;                    7.localhost.loca     samir    batch    STDIN                --     1  --     -- 01:00 Q    --<br>
&gt;                    8.localhost.loca     samir    batch    STDIN                --     1  --     -- 01:00 Q    --<br>
&gt;                    9.localhost.loca     samir    batch    STDIN                --     1  --     -- 01:00 Q    --<br>
&gt;                    10.localhost.loc     samir    batch    STDIN                --     1  --     -- 01:00 Q    --<br>
&gt;                    [root@rufian bin]# /opt/pbs/bin/qdel 7.localhost.localdomain<br>
&gt;                    Connection timed out<br>
&gt;                    qdel: cannot connect to server localhost.localdomain (errno=110)<br>
&gt;                    Connection timed out<br>
&gt;                    You have new mail in /var/spool/mail/root<br>
&gt;                    [root@rufian bin]# /opt/pbs/bin/qdel 7.rufian.perrera.local<br>
&gt;                    qdel: Unknown Job Id 7.rufian.perrera.local<br>
&gt;                    [root@rufian bin]# su - samir<br>
&gt;                    [samir@rufian ~]$ /opt/pbs/bin/qdel 7.localhost.localdomain<br>
&gt;                    Connection timed out<br>
&gt;                    qdel: cannot connect to server localhost.localdomain (errno=110)<br>
&gt;                    Connection timed out<br>
&gt;                    [samir@rufian ~]$<br>
&gt;<br>
&gt;<br>
&gt;<br>
&gt;<br>
&gt;            
------------------------------------------------------------------------<br>
&gt;<br>
&gt;             _______________________________________________<br>
&gt;             torqueusers mailing list<br>
&gt;             <a moz-do-not-send="true"
 href="mailto:torqueusers@supercluster.org">torqueusers@supercluster.org</a><br>
    </div>
    </div>
&gt;             &lt;mailto:<a moz-do-not-send="true"
 href="mailto:torqueusers@supercluster.org">torqueusers@supercluster.org</a>&gt;<br>
    <div>
    <div class="h5">&gt;             <a moz-do-not-send="true"
 href="http://www.supercluster.org/mailman/listinfo/torqueusers"
 target="_blank">http://www.supercluster.org/mailman/listinfo/torqueusers</a><br>
&gt;<br>
    </div>
    </div>
  </blockquote>
  </div>
  <br>
</blockquote>
</body>
</html>