<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
<tt>127.0.0.1 is a special address that references localhost.<br>
<a class="moz-txt-link-freetext" href="http://en.wikipedia.org/wiki/Localhost">http://en.wikipedia.org/wiki/Localhost</a><br>
<br>
<br>
<br>
127.0.0.1 is not what you want your hostname to resolve to (pbs_moms trying to
connect to 127.0.0.1 will end up talking to themselves).<br>
<br>
You will want to set up an IP address on your pbs_server/scheduler node
that is on the same network as your pbs_moms.<br>
Then make sure that the hostname you give it matches the contents of the
file $PBS_HOME/server_name.<br>
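<br>
A rough sketch of how you could check this from the shell, assuming getent
and awk are available on the node. The function names are made up for
illustration and are not part of Torque. In the hosts file quoted below, the
fix would likely be to restore the commented-out "127.0.0.1
localhost.localdomain localhost" line and move rufian.perrera.local to the
machine's LAN address.<br>
</tt>
<pre>
```shell
# Hypothetical helpers (not part of Torque) for checking name resolution.

# Return success if the address given as $1 is a loopback address.
is_loopback() {
    case "$1" in
        127.*|::1) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the first address the system resolver returns for the name in $1.
resolve_first() {
    getent hosts "$1" | awk '{ print $1; exit }'
}

# Report whether the hostname in $1 looks usable as a PBS server name.
# Returns 0 if it resolves to a non-loopback address, 1 if it resolves
# to loopback, and 2 if it does not resolve at all.
check_server_name() {
    addr=$(resolve_first "$1")
    if [ -z "$addr" ]; then
        echo "$1 does not resolve"
        return 2
    elif is_loopback "$addr"; then
        echo "$1 resolves to loopback ($addr)"
        return 1
    fi
    echo "$1 resolves to $addr"
}
```
</pre>
<tt>Running check_server_name "$(hostname)" and then the same check on the
name stored in $PBS_HOME/server_name should report the same non-loopback
address in both cases.<br>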
<br>
Copying the init script to /etc/init.d is a start (use the script from
contrib/init.d in the Torque source tree, not the /opt/pbs/sbin/pbs_sched
binary). You will then probably need to turn it on.<br>
<br>
To set it up to start on reboot:<br>
<br>
chkconfig --add pbs_sched<br>
and then<br>
chkconfig pbs_sched on<br>
<br>
To start it use /etc/init.d/pbs_sched start<br>
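<br>
Put together, the steps could look roughly like the sketch below. It assumes
a Red Hat style system with chkconfig, as in the commands above. The run and
install_pbs_sched helpers and the DRY_RUN variable are invented for this
example, so treat it as a starting point and not a tested installer.<br>
</tt>
<pre>
```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Install and enable the pbs_sched init script (run as root).
# $1: path to the init script, assumed to be the contrib/init.d copy
#     from the Torque source tree, not the pbs_sched binary.
install_pbs_sched() {
    # Copy the script into place and make sure it is executable.
    run cp "$1" /etc/init.d/pbs_sched || return 1
    run chmod 755 /etc/init.d/pbs_sched || return 1
    # Register the service and enable it for its default runlevels.
    run chkconfig --add pbs_sched || return 1
    run chkconfig pbs_sched on || return 1
    # Start the scheduler now, without waiting for a reboot.
    run /etc/init.d/pbs_sched start
}
```
</pre>
<tt>With DRY_RUN=1 set, calling install_pbs_sched
/usr/local/src/torque-2.3.6/contrib/init.d/pbs_sched only prints the five
commands, so you can review them before running them for real.<br>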
<br>
<br>
</tt><tt>--Jerry<br>
</tt><br>
<br>
Samir Gartner wrote:
<blockquote
cite="mid:e73901d60905211226w3da1ae06i629c1ccb04f7922a@mail.gmail.com"
type="cite">Ok Gus and everyone. Thanks again for your answers.<br>
<br>
<div class="gmail_quote">There is no pbs_sched on /etc/init.d but it
is here:<br>
<br>
/usr/local/src/torque-2.3.6/contrib/init.d/pbs_sched<br>
/usr/local/src/torque-2.3.6/tpackages/server/opt/pbs/sbin/pbs_sched<br>
/usr/local/src/torque-2.3.6/src/scheduler.cc/.libs/pbs_sched<br>
/usr/local/src/torque-2.3.6/src/scheduler.cc/pbs_sched<br>
/opt/pbs/sbin/pbs_sched<br>
<br>
I was thinking copying /opt/pbs/sbin/pbs_sched to /etc/init.d. Is it
right to do that?<br>
<br>
Sorry about the "manually" word. It is local slang I guess. What I mean
is that I went to the /opt/pbs/sbin/ folder and executed ./pbs_sched<br>
<br>
hostname output is:<br>
<br>
rufian.perrera.local<br>
<br>
hosts file contains:<br>
<br>
# Do not remove the following line, or various programs<br>
# that require network functionality will fail.<br>
#127.0.0.1 localhost.localdomain localhost
<--------------------------Is this wrong?<br>
::1 localhost6.localdomain6 localhost6<br>
127.0.0.1 rufian.perrera.local rufian<br>
192.168.2.6 auyin.perrera.local auyin<br>
192.168.2.4 pelusa.perrera.local pelusa<br>
192.168.2.2 lamparita.perrera.local lamparita<br>
<br>
<br>
network content is:<br>
<br>
NETWORKING=yes<br>
HOSTNAME=rufian.perrera.local<br>
DOMAINNAME=perrera.local<br>
<br>
I don't have /etc/sysconfig/pbs_server or /etc/sysconfig/pbs_sched
either.<br>
<br>
<br>
2009/5/21 Gus Correa <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:gus@ldeo.columbia.edu">gus@ldeo.columbia.edu</a>></span><br>
<blockquote class="gmail_quote"
style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div class="im">Samir Gartner wrote:<br>
> Ok, scheduling wasn't enabled,now it is,<br>
<br>
</div>
It happens very often.<br>
Fixing it is a good first step.<br>
<div class="im"><br>
> but pbs_sched service was not<br>
> found.<br>
<br>
</div>
Starting up daemons in YDog may be different from RHEL, CentOS, Fedora,<br>
so I am just guessing based on the latter. I'm not familiar with YDog.<br>
Anyway ...<br>
<br>
Don't know if you got Torque from ClusterResources or other.<br>
In any case, there should be a pbs_sched script on /etc/init.d<br>
If it is there, do "chkconfig --add pbs_sched" (or YDog equivalent),<br>
then do "chkconfig --list pbs_sched" to see which runlevels it will be<br>
on, then "service pbs_sched start" to start it, or if YDog doesn't have<br>
"service", run it with "/etc/init.d/pbs_sched start".<br>
<br>
If you don't have the pbs_sched script in /etc/init.d, you may find one<br>
in the contrib subdirectory of the Torque source tree.<br>
Copy it over to /etc/init.d, and do the above.<br>
(The location may be other than /etc/init.d in YDog.)<br>
<div class="im"><br>
<br>
> I didn't install maui, it is a default installation. About hosts<br>
> file, it is properly configured as well as nodes and mom's config
files.<br>
><br>
<br>
</div>
You only need Maui if you want a complex scheduling policy.<br>
pbs_sched is FIFO, very simple, but works fine.<br>
I've used it for a long time without problems.<br>
<div class="im"><br>
> when I manually start pbs_sched it says<br>
><br>
> pbs_sched: addclient, host localhost not found<br>
><br>
<br>
</div>
Hmm ... never got this one, not that I remember.<br>
Not sure what you mean by "manually start pbs_sched".<br>
Anyway, it sounds like another, different problem.<br>
<br>
<br>
Is it possible that your "hostname" command<br>
is not resolving your server name to rufian.perrera.local but to<br>
localhost?<br>
What is the output of "hostname"?<br>
What do you have in /etc/hosts?<br>
What do you have in /etc/sysconfig/network?<br>
<br>
Just in case you have /etc/sysconfig/pbs_server and<br>
/etc/sysconfig/pbs_sched, what are their contents?<br>
(I don't have them.)<br>
<br>
(Again just guessing, YDog may have different files to startup things.)<br>
<div class="im"><br>
I hope this helps,<br>
Gus Correa<br>
---------------------------------------------------------------------<br>
Gustavo Correa<br>
Lamont-Doherty Earth Observatory - Columbia University<br>
Palisades, NY, 10964-8000 - USA<br>
---------------------------------------------------------------------<br>
<br>
><br>
</div>
> 2009/5/21 Samir Gartner <<a moz-do-not-send="true"
href="mailto:jigzat@gmail.com">jigzat@gmail.com</a> <mailto:<a
moz-do-not-send="true" href="mailto:jigzat@gmail.com">jigzat@gmail.com</a>>><br>
<div class="im">><br>
> I think I'm gonna cry.... I love you guys!! No, seriously, it
worked<br>
> but only if executed under root user, now the question is what
did I<br>
> do wrong? Jobs should start automatically, right?<br>
><br>
> I was first following the Globus Toolkit tutorial but it is
kinda<br>
> outdated so I guess I issued some wrong instructions.<br>
><br>
> One of the weird things was that the tutorial suggested using
the<br>
> /opt/pbs prefix when executing configure and now I have under<br>
> /opt/pbs again a /opt/pbs folder with repeated bin and sbin
folders<br>
> and executables. Is this wrong or is how it is supposed to be?<br>
><br>
</div>
> 2009/5/21 Ling C. Ho <<a moz-do-not-send="true"
href="mailto:ling@fnal.gov">ling@fnal.gov</a> <mailto:<a
moz-do-not-send="true" href="mailto:ling@fnal.gov">ling@fnal.gov</a>>><br>
<div class="im">><br>
> Have you configured a scheduler?<br>
><br>
> What if you use qrun? Would any job start?<br>
><br>
> ...<br>
> ling<br>
><br>
> Samir Gartner wrote:<br>
><br>
> Ok, I don't see any file named default_server but<br>
> server_name has the right server name
rufian.perrera.local<br>
> and there is another file with the same content named<br>
> server_name.new.<br>
><br>
> Right now the PBS server name appears to be correct
(after<br>
> stopping the server and manually deleting the zombie
jobs)<br>
> but still the jobs won't start.<br>
><br>
><br>
> [samir@rufian ~]$ echo "sleep 30;date" |
/opt/pbs/bin/qsub<br>
> [samir@rufian ~]$ /opt/pbs/bin/qstat -a<br>
><br>
> rufian.perrera.local:<br>
><br>
> Req'd Req'd Elap<br>
> Job ID Username Queue Jobname<br>
> SessID NDS TSK Memory Time S Time<br>
> -------------------- -------- -------- ----------------<br>
> ------ ----- --- ------ ----- - -----<br>
> 13.rufian.perrer samir batch STDIN<br>
> -- 1 -- -- 01:00 Q --<br>
> [samir@rufian ~]$<br>
><br>
><br>
> By the way, is top posting allowed?<br>
><br>
> 2009/5/21 Jerry Smith <<a moz-do-not-send="true"
href="mailto:jdsmit@sandia.gov">jdsmit@sandia.gov</a><br>
</div>
> <mailto:<a moz-do-not-send="true"
href="mailto:jdsmit@sandia.gov">jdsmit@sandia.gov</a>> <mailto:<a
moz-do-not-send="true" href="mailto:jdsmit@sandia.gov">jdsmit@sandia.gov</a><br>
<div>
<div class="h5">> <mailto:<a
moz-do-not-send="true" href="mailto:jdsmit@sandia.gov">jdsmit@sandia.gov</a>>>><br>
><br>
><br>
> Samir,<br>
><br>
> What do you have in
$PBS_HOME/{server_name,default_server}?<br>
><br>
> It should be what resolves as the ethernet address
that<br>
> pbs should<br>
> be listening on.<br>
><br>
> --Jerry<br>
><br>
><br>
><br>
><br>
> Samir Gartner wrote:<br>
><br>
> Ok I finally installed torque under
yellowdog/ppc but<br>
> now I have<br>
> another problem. I set up my pbs server as<br>
> rufian.perrera.local<br>
> but when I issue a job it shows itself in<br>
> localhost.localdomain<br>
> and it stays in the queued state forever. And if I
try to<br>
> qdel the<br>
> job it can't reach the server and the connection
times<br>
> out. Any<br>
> ideas of what could be wrong?<br>
> I'm not trying to set up anything complicated,
is<br>
> just one<br>
> machine that works as server and client.<br>
><br>
> this is the shell output<br>
><br>
> [root@rufian bin]# /opt/pbs/bin/qstat -a<br>
><br>
> rufian.perrera.local:<br>
><br>
> Req'd Req'd Elap<br>
> Job ID Username Queue Jobname<br>
> SessID<br>
> NDS TSK Memory Time S Time<br>
> -------------------- -------- --------<br>
> ---------------- ------<br>
> ----- --- ------ ----- - -----<br>
> 7.localhost.loca samir batch STDIN<br>
> -- 1 -- -- 01:00 Q --<br>
> 8.localhost.loca samir batch STDIN<br>
> -- 1 -- -- 01:00 Q --<br>
> 9.localhost.loca samir batch STDIN<br>
> -- 1 -- -- 01:00 Q --<br>
> 10.localhost.loc samir batch STDIN<br>
> -- 1 -- -- 01:00 Q --<br>
> [root@rufian bin]# /opt/pbs/bin/qdel<br>
> 7.localhost.localdomain<br>
> Connection timed out<br>
> qdel: cannot connect to server
localhost.localdomain<br>
> (errno=110)<br>
> Connection timed out<br>
> You have new mail in /var/spool/mail/root<br>
> [root@rufian bin]# /opt/pbs/bin/qdel<br>
> 7.rufian.perrera.local<br>
> qdel: Unknown Job Id 7.rufian.perrera.local<br>
> [root@rufian bin]# su - samir<br>
> [samir@rufian ~]$ /opt/pbs/bin/qdel<br>
> 7.localhost.localdomain<br>
> Connection timed out<br>
> qdel: cannot connect to server
localhost.localdomain<br>
> (errno=110)<br>
> Connection timed out<br>
> [samir@rufian ~]$<br>
><br>
><br>
><br>
><br>
>
------------------------------------------------------------------------<br>
><br>
> _______________________________________________<br>
> torqueusers mailing list<br>
> <a moz-do-not-send="true"
href="mailto:torqueusers@supercluster.org">torqueusers@supercluster.org</a><br>
</div>
</div>
> <mailto:<a moz-do-not-send="true"
href="mailto:torqueusers@supercluster.org">torqueusers@supercluster.org</a>><br>
<div>
<div class="h5">> <a moz-do-not-send="true"
href="http://www.supercluster.org/mailman/listinfo/torqueusers"
target="_blank">http://www.supercluster.org/mailman/listinfo/torqueusers</a><br>
><br>
><br>
><br>
><br>
><br>
><br>
><br>
>
------------------------------------------------------------------------<br>
><br>
> _______________________________________________<br>
> torqueusers mailing list<br>
> <a moz-do-not-send="true"
href="mailto:torqueusers@supercluster.org">torqueusers@supercluster.org</a><br>
> <a moz-do-not-send="true"
href="http://www.supercluster.org/mailman/listinfo/torqueusers"
target="_blank">http://www.supercluster.org/mailman/listinfo/torqueusers</a><br>
<br>
_______________________________________________<br>
torqueusers mailing list<br>
<a moz-do-not-send="true" href="mailto:torqueusers@supercluster.org">torqueusers@supercluster.org</a><br>
<a moz-do-not-send="true"
href="http://www.supercluster.org/mailman/listinfo/torqueusers"
target="_blank">http://www.supercluster.org/mailman/listinfo/torqueusers</a><br>
</div>
</div>
</blockquote>
</div>
<br>
</blockquote>
</body>
</html>