Hi Garrick,<br><br>Thanks for your help, and sorry about this. I collected the logs you asked for and am including them in this e-mail. I looked at the logs on my nodes and they are all the same, with the same error message. So, here they go...
<br><br>Nodes - bangu07:<br>[root@bangu08 ~]# cat /var/spool/torque/mom_logs/20071123 <br>11/23/2007 07:57:58;0002; pbs_mom;Svr;Log;Log opened<br>11/23/2007 07:57:58;0002; pbs_mom;n/a;initialize;independent<br>11/23/2007 07:57:58;0002; pbs_mom;Svr;pbs_mom;Is up
<br>11/23/2007 07:57:58;0002; pbs_mom;Svr;mom_main;MOM executable path and mtime at launch: /usr/local/sbin/pbs_mom 1193074779<br>11/23/2007 07:57:58;0002; pbs_mom;n/a;mom_main;hello sent to server bangu00<br>11/23/2007 07:59:28;0002; pbs_mom;n/a;mom_main;connection to server bangu00 timeout
<br>11/23/2007 07:59:28;0002; pbs_mom;n/a;mom_main;hello sent to server bangu00<br>11/23/2007 08:00:58;0002; pbs_mom;n/a;mom_main;connection to server bangu00 timeout<br>11/23/2007 08:00:58;0002; pbs_mom;n/a;mom_main;hello sent to server bangu00
<br>11/23/2007 08:02:28;0002; pbs_mom;n/a;mom_main;connection to server bangu00 timeout<br>11/23/2007 08:02:28;0002; pbs_mom;n/a;mom_main;hello sent to server bangu00<br>11/23/2007 08:03:58;0002; pbs_mom;n/a;mom_main;connection to server bangu00 timeout
<br>11/23/2007 08:03:58;0002; pbs_mom;n/a;mom_main;hello sent to server bangu00<br>11/23/2007 08:05:28;0002; pbs_mom;n/a;mom_main;connection to server bangu00 timeout<br>(...) <- The same message many times...<br>11/23/2007 11:49:08;0002; pbs_mom;n/a;mom_main;connection to server bangu00 timeout
<br>11/23/2007 11:49:08;0002; pbs_mom;n/a;mom_main;hello sent to server bangu00<br>11/23/2007 11:50:38;0002; pbs_mom;n/a;mom_main;connection to server bangu00 timeout<br>11/23/2007 11:50:38;0002; pbs_mom;n/a;mom_main;hello sent to server bangu00
<br>11/23/2007 11:52:08;0002; pbs_mom;n/a;mom_main;connection to server bangu00 timeout<br>11/23/2007 11:52:08;0002; pbs_mom;n/a;mom_main;hello sent to server bangu00<br>11/23/2007 11:53:38;0002; pbs_mom;n/a;mom_main;connection to server bangu00 timeout
<br>11/23/2007 11:53:38;0002; pbs_mom;n/a;mom_main;hello sent to server bangu00<br>11/23/2007 11:55:07;0002; pbs_mom;Svr;pbs_mom;caught signal 15: leaving jobs running, just exiting<br>11/23/2007 15:40:21;0002; pbs_mom;Svr;Log;Log opened
<br>11/23/2007 15:40:21;0002; pbs_mom;n/a;initialize;independent<br>11/23/2007 15:40:21;0002; pbs_mom;Svr;pbs_mom;Is up<br>11/23/2007 15:40:21;0002; pbs_mom;Svr;mom_main;MOM executable path and mtime at launch: /usr/local/sbin/pbs_mom 1193074779
<br>11/23/2007 15:40:21;0002; pbs_mom;n/a;mom_main;hello sent to server bangu00<br>11/23/2007 15:58:21;0002; pbs_mom;Svr;im_eof;End of File from addr <a href="http://146.164.41.100:15001">146.164.41.100:15001</a><br>11/23/2007 15:58:21;0002; pbs_mom;n/a;mom_main;hello sent to server bangu00
<br>11/23/2007 15:59:55;0002; pbs_mom;n/a;mom_main;connection to server bangu00 timeout<br>11/23/2007 15:59:55;0002; pbs_mom;n/a;mom_main;hello sent to server bangu00<br>11/23/2007 16:48:36;0002; pbs_mom;Svr;pbs_mom;caught signal 15: leaving jobs running, just exiting
<br><br>I couldn't find the syslog here. Where should I look for it?<br><br>Another question: do I need to run pbs_mom on the server too, so that it runs both pbs_server and pbs_mom?<br><br>Thanks.<br><br><div><span class="gmail_quote">2007/11/26, Garrick Staples <
<a href="mailto:garrick@usc.edu">garrick@usc.edu</a>>:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">On Sat, Nov 24, 2007 at 04:01:44PM -0200, Davi Vercillo alleged:
<br>> Hi all,<br>><br>><br>> 2007/11/24, Garrick Staples <<a href="mailto:garrick@usc.edu">garrick@usc.edu</a>>:<br>> ><br>> > On Fri, Nov 23, 2007 at 04:42:51PM -0200, Davi Vercillo alleged:
<br>> > > 11/23/2007 16:14:37 S Job Modified at request of<br>> > > <a href="mailto:Scheduler@bangu00.dcc.ufrj.br">Scheduler@bangu00.dcc.ufrj.br</a><br>> > > 11/23/2007 16:14:37 S Job Run at request of
<br>> > > <a href="mailto:Scheduler@bangu00.dcc.ufrj.br">Scheduler@bangu00.dcc.ufrj.br</a><br>> > > 11/23/2007 16:14:39 S unable to run job, MOM rejected/rc=2<br>> ><br>> > Your server config is fine. The problem is on the node. The error message
<br>> > will<br>> > be in the mom log, syslog on the node, or sent to the job owner by email.<br>><br>><br>> What do I need to configure on the nodes for this to work correctly? I did what the Wiki<br>> page said to do. Do I need to add other parameters? What do you think
<br>> the problem is?<br>><br>> PS: Sorry about my English. =S<br><br>I don't know. You need to check for error messages in the mom log, syslog on<br>the node, or sent to the job owner by email.<br><br><br>
_______________________________________________<br>torqueusers mailing list<br><a href="mailto:torqueusers@supercluster.org">torqueusers@supercluster.org</a><br><a href="http://www.supercluster.org/mailman/listinfo/torqueusers">
http://www.supercluster.org/mailman/listinfo/torqueusers</a><br><br><br></blockquote></div><br>