<br><br><div class="gmail_quote">---------- Forwarded message ----------<br>From: <b class="gmail_sendername">Michael Arndt</b> <span dir="ltr"><<a href="mailto:m.arndt@science-computing.de">m.arndt@science-computing.de</a>></span><br>
Date: Wed, Feb 15, 2012 at 8:44 PM<br>Subject: Re: [torqueusers] Job execution problem<br>To: Vahe nr <<a href="mailto:vner75@gmail.com">vner75@gmail.com</a>><br><br><br>Hi Vahe<br>
<br>
You need to resend your last mail separately to the list.<br>
By pressing reply you answered via PM to me,<br>
since I answered you intentionally off list.<br>
<br>
As far as your messages suggest, this is not a problem related<br>
to your job only.<br>
<br>
The messages suggest that pbs_mom, the daemon that runs your<br>
job on the exec node, does not talk "well" with the PBS server's<br>
master processes.<br>
<br>
The best way to resolve the problems is:<br>
<br>
read mom logs on the exec node<br>
read sched and server logs on the master<br>
check the connection with pbs_iff<br>
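<br>
For the connection check, a cruder test than pbs_iff is to see whether the<br>
daemons' TCP ports can be reached at all. The sketch below is only that, a sketch:<br>
the ports 15001 (pbs_server) and 15002 (pbs_mom) are the usual TORQUE defaults and<br>
are an assumption here, the host names are placeholders, and it does not exercise<br>
the authentication that pbs_iff performs.<br>
<pre>
```python
import socket

# Assumed TORQUE default ports; adjust if your site overrides them.
PBS_SERVER_PORT = 15001  # pbs_server
PBS_MOM_PORT = 15002     # pbs_mom


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and failed name lookups.
        return False


# Example (hypothetical host names), run on the master:
#   can_connect("node01.example.org", PBS_MOM_PORT)
# and on the exec node:
#   can_connect("master.example.org", PBS_SERVER_PORT)
```
</pre>
If either direction returns False, a firewall, a stopped daemon or a name<br>
resolution problem is the likely cause, and the logs above should say which.<br>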
<br>
It is a PBS config issue, not a job problem.<br>
Micha<br>
<div class="im"><br>
On Wed, Feb 15, 2012 at 08:31:34PM +0400, Vahe nr wrote:<br>
> Dear all<br>
> The job always remains in the Q state. When I try to run it with the qrun<br>
> command, I get the following error:<br>
> qrun: Execution server rejected request MSG=cannot send job to mom,<br>
> state=PRERUN <a href="http://220.ce.seua-cluster.grid.am" target="_blank">220.ce.seua-cluster.grid.am</a><br>
><br>
> Cheers<br>
><br>
> On Wed, Feb 15, 2012 at 8:03 PM, Vahe nr <<a href="mailto:vner75@gmail.com">vner75@gmail.com</a>> wrote:<br>
><br>
> > Hi Michael<br>
> > PBS has the same version on the node and the master, and the host name is<br>
> > right. I will try pbs_iff and see what I find!<br>
> ><br>
> > Cheers<br>
> ><br>
> > On Wed, Feb 15, 2012 at 7:54 PM, Vahe nr <<a href="mailto:vner75@gmail.com">vner75@gmail.com</a>> wrote:<br>
> ><br>
> >> Hi Michael<br>
</div>> >> *Thanks for your reply. I will check what you have suggested and let<br>
> >> you know; I hope it will help.*<br>
> >><br>
> >> *Cheers*<br>
<div class="HOEnZb"><div class="h5">> >><br>
> >> On Wed, Feb 15, 2012 at 7:04 PM, Michael Arndt <<br>
> >> <a href="mailto:m.arndt@science-computing.de">m.arndt@science-computing.de</a>> wrote:<br>
> >><br>
> >>> Hello Vahe,<br>
> >>><br>
> >>> offlist:<br>
> >>><br>
> >>> -checks:<br>
> >>><br>
> >>> -is the PBS version really the same on the nodes / exec hosts as on<br>
> >>> the PBS master?<br>
> >>><br>
> >>> -is the name shown for a node with "pbsnodes node" the same as the one<br>
> >>> shown by an "ssh node" from your PBS master?<br>
> >>> (in other words: does name resolution via DNS / NIS / hosts / whatever give<br>
> >>> the same answer when the PBS master asks as what the node believes about<br>
> >>> the hostnames of itself and the<br>
> >>> master?)<br>
> >>><br>
> >>><br>
> >>> -google for pbs_iff<br>
> >>> The hits will show you how to use pbs_iff to test the connectivity from<br>
> >>> node to master<br>
> >>><br>
> >>> Last but not least, the PBS sched logs on the server and the mom logs on the exec host<br>
> >>> will show info about the problem<br>
> >>><br>
> >>><br>
> >>> Micha<br>
> >>><br>
> >>> --<br>
> >>> Vorstand/Board of Management:<br>
> >>> Dr. Bernd Finkbeiner, Michael Heinrichs,<br>
> >>> Dr. Roland Niemeier, Dr. Arno Steitz, Dr. Ingrid Zech<br>
> >>> Vorsitzender des Aufsichtsrats/<br>
> >>> Chairman of the Supervisory Board:<br>
> >>> Philippe Miltin<br>
> >>> Sitz/Registered Office: Tuebingen<br>
> >>> Registergericht/Registration Court: Stuttgart<br>
> >>> Registernummer/Commercial Register No.: HRB 382196<br>
> >>><br>
> >>><br>
> >><br>
> ><br>
<br>
</div></div></div><br>