Sorry for being so insistent, but I have about 8 days left to finish my graduation project.<br><br>It seems that the jobs now try to start, but there is no output except an email that says something like this: <br><br>From adm@rufian.perrera.local Tue Apr 21 15:28:59 2009<br>
Return-Path: <adm@rufian.perrera.local><br>Received: from rufian.perrera.local (rufian.perrera.local [127.0.0.1])<br> by rufian.perrera.local (8.13.8/8.13.8) with ESMTP id n3LKSwDi006753<br> for <samir@rufian.perrera.local>; Tue, 21 Apr 2009 15:28:59 -0500<br>
Received: (from root@localhost)<br> by rufian.perrera.local (8.13.8/8.13.8/Submit) id n3LKSwro006752<br> for samir@rufian.perrera.local; Tue, 21 Apr 2009 15:28:58 -0500<br>Date: Tue, 21 Apr 2009 15:28:58 -0500<br>
From: adm <adm@rufian.perrera.local><br>Message-Id: <200904212028.n3LKSwro006752@rufian.perrera.local><br>To: samir@rufian.perrera.local<br>Subject: PBS JOB 21.rufian.perrera.local<br><br>PBS Job Id: 21.rufian.perrera.local<br>
Job Name: STDIN<br>Exec host: rufian.perrera.local/0<br>Aborted by PBS Server<br>Job does not exist on node<br><br><div class="gmail_quote">2009/5/21 Samir Gartner <span dir="ltr"><<a href="mailto:jigzat@gmail.com">jigzat@gmail.com</a>></span><br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">OK, Gus and everyone, thanks again for your answers.<br><br><div class="gmail_quote">There is no pbs_sched in /etc/init.d, but it is here:<br>
<br>/usr/local/src/torque-2.3.6/contrib/init.d/pbs_sched<br>/usr/local/src/torque-2.3.6/tpackages/server/opt/pbs/sbin/pbs_sched<br>
/usr/local/src/torque-2.3.6/src/scheduler.cc/.libs/pbs_sched<br>/usr/local/src/torque-2.3.6/src/scheduler.cc/pbs_sched<br>/opt/pbs/sbin/pbs_sched<br><br>I was thinking of copying /opt/pbs/sbin/pbs_sched to /etc/init.d. Is that the right thing to do?<br>
<br>Sorry about the word "manually"; it is local slang, I guess. What I mean is that I went to the /opt/pbs/sbin/ folder and executed ./pbs_sched<br><br>The output of hostname is:<br><br>rufian.perrera.local<br><br>The hosts file contains:<br>
<br># Do not remove the following line, or various programs<br># that require network functionality will fail.<br>#127.0.0.1 localhost.localdomain localhost <--------------------------Is this wrong?<br>
::1 localhost6.localdomain6 localhost6<br>127.0.0.1 rufian.perrera.local rufian<br>192.168.2.6 auyin.perrera.local auyin<br>192.168.2.4 pelusa.perrera.local pelusa<br>192.168.2.2 lamparita.perrera.local lamparita<br>
<br><br>/etc/sysconfig/network contains:<br><br>NETWORKING=yes<br>HOSTNAME=rufian.perrera.local<br>DOMAINNAME=perrera.local<br><br>I don't have /etc/sysconfig/pbs_server or /etc/sysconfig/pbs_sched either.<br><br><br>2009/5/21 Gus Correa <span dir="ltr"><<a href="mailto:gus@ldeo.columbia.edu" target="_blank">gus@ldeo.columbia.edu</a>></span><div>
<div></div><div class="h5"><br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div>Samir Gartner wrote:<br>
> OK, scheduling wasn't enabled; now it is,<br>
<br>
</div>It happens very often.<br>
Fixing it is a good first step.<br>
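A minimal sketch of checking and enabling the server's scheduling attribute from a script, assuming a POSIX shell, Torque's "qmgr" on the PATH, and Torque manager privilege (normally root). The names "scheduling_enabled" and "ensure_scheduling" are hypothetical, and the pattern assumes "qmgr -c 'print server'" prints a line such as "set server scheduling = True".

```shell
#!/bin/sh
# Hypothetical helper: succeed if the qmgr output on stdin shows that
# scheduling is on. Assumes lines like "set server scheduling = True"
# (print server) or "scheduling = True" (list server).
scheduling_enabled() {
    grep -Eiq '(^|[[:space:]])scheduling[[:space:]]*=[[:space:]]*true'
}

# Hypothetical helper: turn scheduling on only if it is currently off.
# Must run with Torque manager privilege (normally root).
ensure_scheduling() {
    if ! qmgr -c 'print server' | scheduling_enabled; then
        qmgr -c 'set server scheduling = true'
    fi
}
```

Running "ensure_scheduling" more than once is harmless, since it only changes the attribute when it is off. Jobs still need a running scheduler (pbs_sched or Maui) to start on their own.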
<div><br>
> but the pbs_sched service was not<br>
> found.<br>
<br>
</div>Starting daemons in YDog may be different from RHEL, CentOS, or Fedora,<br>
so I am just guessing based on the latter; I'm not familiar with YDog.<br>
Anyway ...<br>
<br>
I don't know whether you got Torque from ClusterResources or elsewhere.<br>
In any case, there should be a pbs_sched script in /etc/init.d.<br>
If it is there, run "chkconfig --add pbs_sched" (or the YDog equivalent),<br>
then "chkconfig --list pbs_sched" to see which runlevels it will be<br>
on, then "service pbs_sched start" to start it, or, if YDog doesn't have<br>
"service", run it with "/etc/init.d/pbs_sched start".<br>
<br>
If you don't have the pbs_sched script in /etc/init.d, you may find one<br>
in the contrib subdirectory of the Torque source tree.<br>
Copy it over to /etc/init.d, and do the above.<br>
(The location may be other than /etc/init.d in YDog.)<br>
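The steps above can be collected into a small script. This is a sketch for a RHEL-style system with "chkconfig" and "service", which YDog may or may not match. The source path is taken from the locations Samir listed; "install_pbs_sched_init", "run", and the DRY_RUN switch are hypothetical names. It copies the init script from contrib/init.d, not the pbs_sched binary from /opt/pbs/sbin. The copied script may hard-code a daemon path that does not match a /opt/pbs prefix, so it is worth checking before starting the service.

```shell
#!/bin/sh
# Sketch: install and enable Torque's contrib pbs_sched init script on a
# RHEL-like system. Paths are assumptions; override via the environment.
set -eu

TORQUE_SRC=${TORQUE_SRC:-/usr/local/src/torque-2.3.6}
INIT_DIR=${INIT_DIR:-/etc/init.d}

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

install_pbs_sched_init() {
    # Copy the init script (not the pbs_sched binary) into place.
    run cp "$TORQUE_SRC/contrib/init.d/pbs_sched" "$INIT_DIR/pbs_sched"
    run chmod 755 "$INIT_DIR/pbs_sched"
    # Register the service and show the runlevels it is enabled in.
    run chkconfig --add pbs_sched
    run chkconfig --list pbs_sched
    # Start it, falling back to the script itself if "service" is missing.
    if command -v service >/dev/null 2>&1; then
        run service pbs_sched start
    else
        run "$INIT_DIR/pbs_sched" start
    fi
}
```

After sourcing the script, "DRY_RUN=1 install_pbs_sched_init" prints the commands for review; running it as root without DRY_RUN executes them.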
<div><br>
<br>
> I didn't install Maui; it is a default installation. As for the hosts<br>
> file, it is properly configured, as are the nodes and MOM config files.<br>
><br>
<br>
</div>You only need Maui if you want a complex scheduling policy.<br>
pbs_sched is FIFO, very simple, but works fine.<br>
I've used it for a long time without problems.<br>
<div><br>
> When I start pbs_sched manually, it says<br>
><br>
> pbs_sched: addclient, host localhost not found<br>
><br>
<br>
</div>Hmm ... I have never seen this one, as far as I remember.<br>
Not sure what you mean by "manually start pbs_sched".<br>
Anyway, it sounds like a different problem.<br>
<br>
<br>
Is it possible that your "hostname" command<br>
is not resolving your server name to rufian.perrera.local but to<br>
localhost?<br>
What is the output of "hostname"?<br>
What do you have in /etc/hosts?<br>
What do you have in /etc/sysconfig/network?<br>
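A sketch of answering the hosts question from a script, assuming a POSIX shell with awk. "hosts_lookup" and "check_pbs_names" are hypothetical helpers. They only read a hosts-format file, skipping commented-out entries, so they show what /etc/hosts says rather than what the full resolver (DNS, nsswitch) returns; "getent hosts NAME" gives the latter on glibc systems. Since pbs_sched complained about "localhost", the sketch checks that name as well as the output of "hostname".

```shell
#!/bin/sh
# Hypothetical helper: print the address of the first line in a
# hosts-format FILE that lists NAME, ignoring comments (so a
# commented-out entry does not count).
# Usage: hosts_lookup FILE NAME
hosts_lookup() {
    awk -v name="$2" '
        { sub(/#.*/, "") }        # strip comments; fields are re-split
        NF >= 2 {
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; exit }
        }
    ' "$1"
}

# Hypothetical helper: report how "localhost" and this machine's
# hostname are mapped in FILE (default /etc/hosts).
check_pbs_names() {
    hosts_file=${1:-/etc/hosts}
    for n in localhost "$(hostname)"; do
        addr=$(hosts_lookup "$hosts_file" "$n")
        printf '%s -> %s\n' "$n" "${addr:-NOT FOUND}"
    done
}
```

With the hosts file Samir posted, "hosts_lookup /etc/hosts localhost" prints nothing, because the only line naming localhost is commented out, while rufian.perrera.local maps to 127.0.0.1.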
<br>
Just in case you have /etc/sysconfig/pbs_server and<br>
/etc/sysconfig/pbs_sched, what are their contents?<br>
(I don't have them.)<br>
<br>
(Again just guessing, YDog may have different files to startup things.)<br>
<div><br>
I hope this helps,<br>
Gus Correa<br>
---------------------------------------------------------------------<br>
Gustavo Correa<br>
Lamont-Doherty Earth Observatory - Columbia University<br>
Palisades, NY, 10964-8000 - USA<br>
---------------------------------------------------------------------<br>
<br>
><br>
</div>> 2009/5/21 Samir Gartner <<a href="mailto:jigzat@gmail.com" target="_blank">jigzat@gmail.com</a>><br>
<div>><br>
> I think I'm gonna cry... I love you guys!! No, seriously, it worked,<br>
> but only when executed as root. Now the question is: what did I<br>
> do wrong? Jobs should start automatically, right?<br>
><br>
> At first I was following the Globus Toolkit tutorial, but it is kinda<br>
> outdated, so I guess I ran some wrong commands.<br>
><br>
> One of the weird things was that the tutorial suggested using the<br>
> /opt/pbs prefix when running configure, and now I have another<br>
> /opt/pbs folder inside /opt/pbs, with duplicate bin and sbin folders<br>
> and executables. Is this wrong, or is it how it is supposed to be?<br>
><br>
</div>> 2009/5/21 Ling C. Ho <<a href="mailto:ling@fnal.gov" target="_blank">ling@fnal.gov</a>><br>
<div>><br>
> Have you configured a scheduler?<br>
><br>
> What if you use qrun? Would any jobs start?<br>
><br>
> ...<br>
> ling<br>
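Ling's qrun test can be scripted. This is a sketch assuming a POSIX shell with awk and Torque's "qselect" and "qrun" on the PATH; "queued_jobs" and "run_queued_jobs" are hypothetical names. "queued_jobs" parses "qstat -a" output like the listings in this thread and assumes the state is the second-to-last column. Because "qstat -a" truncates long job IDs (as in 13.rufian.perrer), "run_queued_jobs" takes full IDs from "qselect -s Q" instead. qrun bypasses the scheduler and normally requires operator or manager privilege, which may be why it only worked as root.

```shell
#!/bin/sh
# Hypothetical helper: print the (possibly truncated) IDs of jobs in
# state Q from "qstat -a" output on stdin. Assumes the state is the
# second-to-last column, as in the qstat -a listings in this thread.
queued_jobs() {
    awk 'NF >= 2 && $(NF - 1) == "Q" { print $1 }'
}

# Hypothetical helper: force every queued job to start with qrun.
# qselect -s Q prints full job IDs; qrun needs operator/manager privilege.
run_queued_jobs() {
    for job in $(qselect -s Q); do
        qrun "$job"
    done
}
```

If "run_queued_jobs" starts jobs but they never start on their own, that points back at the scheduler rather than at the server or MOM.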
><br>
> Samir Gartner wrote:<br>
><br>
> OK, I don't see any file named default_server, but<br>
> server_name has the right server name, rufian.perrera.local,<br>
> and there is another file with the same content named<br>
> server_name.new.<br>
><br>
> Right now the PBS server name appears to be correct (after<br>
> stopping the server and manually deleting the zombie jobs),<br>
> but the jobs still won't start.<br>
><br>
><br>
> [samir@rufian ~]$ echo "sleep 30;date" | /opt/pbs/bin/qsub<br>
> [samir@rufian ~]$ /opt/pbs/bin/qstat -a<br>
><br>
> rufian.perrera.local:<br>
><br>
> Req'd Req'd Elap<br>
> Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time<br>
> -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----<br>
> 13.rufian.perrer samir batch STDIN -- 1 -- -- 01:00 Q --<br>
> [samir@rufian ~]$<br>
><br>
><br>
> By the way, is top posting allowed?<br>
><br>
> 2009/5/21 Jerry Smith <<a href="mailto:jdsmit@sandia.gov" target="_blank">jdsmit@sandia.gov</a>><br>
</div><div><div></div><div>
><br>
><br>
> Samir,<br>
><br>
> What do you have in $PBS_HOME/{server_name,default_server}?<br>
><br>
> It should be the name that resolves to the Ethernet address that<br>
> PBS should be listening on.<br>
><br>
> --Jerry<br>
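Jerry's check can be scripted as below. This is a sketch: "check_server_name" is a hypothetical helper, and the default PBS_HOME of /var/spool/torque is an assumption (Torque's usual default, but a custom build may use another directory). It compares the first line of each file with the output of "hostname", which in this thread should be rufian.perrera.local.

```shell
#!/bin/sh
# Hypothetical helper: compare $PBS_HOME/server_name and default_server
# with the expected server name.
# Usage: check_server_name [PBS_HOME] [EXPECTED_NAME]
# Defaults: $PBS_HOME or /var/spool/torque (assumed), and the hostname.
check_server_name() {
    pbs_home=${1:-${PBS_HOME:-/var/spool/torque}}
    expected=${2:-$(hostname)}
    for f in server_name default_server; do
        if [ -r "$pbs_home/$f" ]; then
            value=$(head -n 1 "$pbs_home/$f")
            if [ "$value" = "$expected" ]; then
                printf '%s: %s (ok)\n' "$f" "$value"
            else
                printf '%s: %s (expected %s)\n' "$f" "$value" "$expected"
            fi
        else
            printf '%s: missing\n' "$f"
        fi
    done
}
```

Samir's reply reports the same situation found by hand: server_name is correct and there is no default_server file.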
><br>
><br>
><br>
><br>
> Samir Gartner wrote:<br>
><br>
> OK, I finally installed Torque under yellowdog/ppc, but now I have<br>
> another problem. I set up my PBS server as rufian.perrera.local,<br>
> but when I submit a job it shows up under localhost.localdomain<br>
> and stays in the queued state forever. And if I try to qdel the<br>
> job, it can't reach the server and the connection times out. Any<br>
> ideas of what could be wrong?<br>
> I'm not trying to set up anything complicated; it is just one<br>
> machine that works as both server and client.<br>
><br>
> This is the shell output:<br>
><br>
> [root@rufian bin]# /opt/pbs/bin/qstat -a<br>
><br>
> rufian.perrera.local:<br>
><br>
> Req'd Req'd Elap<br>
> Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time<br>
> -------------------- -------- -------- ---------------- ------ ----- --- ------ ----- - -----<br>
> 7.localhost.loca samir batch STDIN -- 1 -- -- 01:00 Q --<br>
> 8.localhost.loca samir batch STDIN -- 1 -- -- 01:00 Q --<br>
> 9.localhost.loca samir batch STDIN -- 1 -- -- 01:00 Q --<br>
> 10.localhost.loc samir batch STDIN -- 1 -- -- 01:00 Q --<br>
> [root@rufian bin]# /opt/pbs/bin/qdel 7.localhost.localdomain<br>
> Connection timed out<br>
> qdel: cannot connect to server localhost.localdomain (errno=110)<br>
> Connection timed out<br>
> You have new mail in /var/spool/mail/root<br>
> [root@rufian bin]# /opt/pbs/bin/qdel 7.rufian.perrera.local<br>
> qdel: Unknown Job Id 7.rufian.perrera.local<br>
> [root@rufian bin]# su - samir<br>
> [samir@rufian ~]$ /opt/pbs/bin/qdel 7.localhost.localdomain<br>
> Connection timed out<br>
> qdel: cannot connect to server localhost.localdomain (errno=110)<br>
> Connection timed out<br>
> [samir@rufian ~]$<br>
><br>
><br>
><br>
><br>
> ------------------------------------------------------------------------<br>
><br>
> _______________________________________________<br>
> torqueusers mailing list<br>
> <a href="mailto:torqueusers@supercluster.org" target="_blank">torqueusers@supercluster.org</a><br>
</div></div>
<div><div></div><div>> <a href="http://www.supercluster.org/mailman/listinfo/torqueusers" target="_blank">http://www.supercluster.org/mailman/listinfo/torqueusers</a><br>
</div></div></blockquote></div></div></div><br>
</blockquote></div><br>