Hi<div><br></div><div>The various /etc/hosts, nodes, server_name and config files all seem to be consistent. The nodes are indeed connected to the Internet; could that be problematic?</div><div><br></div><div>As for 5), won't that require $PBS_NODEFILE to be correctly generated?</div>
<div><br></div><div>Regards</div><div>Gordon<br clear="all"><br>-- max(∫(εὐδαιμονία)dt)<br><br>Dr Gordon Wells<br>Bioinformatics and Computational Biology Unit<br>Department of Biochemistry<br>University of Pretoria<br>
<br><br><div class="gmail_quote">On 8 October 2010 01:09, Gus Correa <span dir="ltr"><<a href="mailto:gus@ldeo.columbia.edu">gus@ldeo.columbia.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
Hi Gordon<br>
<br>
Some guesses:<br>
<br>
1) Do you have mom daemons running on the nodes?<br>
I.e. on the nodes, what is the output of "service pbs status" or<br>
"service pbs_mom status"?<br>
<br>
2) Do your mom daemons on the nodes point to the server?<br>
I.e. what is the content of $TORQUE/mom_priv/config?<br>
Is it consistent with the server name in $TORQUE/server_name ?<br>
<br>
3) What is the content of your /etc/hosts file on the head node<br>
and on each node?<br>
Are they the same?<br>
Are they consistent with your nodes file,<br>
i.e. head_node:$TORQUE/server_priv/nodes (i.e. same host names<br>
that have IP addresses listed in /etc/hosts)?<br>
<br>
4) Are you really using the Internet to connect the nodes,<br>
as the fully qualified names in your nodes file (sent in an old email) suggest?<br>
(I can't find it, maybe you can post it again.)<br>
Or are you using a private subnet?<br>
<br>
5) Did you try to run hostname via mpirun on all nodes?<br>
I.e., something like this:<br>
<br>
...<br>
#PBS -l nodes=8:ppn=2<br>
...<br>
mpirun -np 16 hostname<br>
<br>
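A minimal sketch for check 3), assuming the nodes file has one host per line with the host name in the first column, and that /etc/hosts uses the usual "address name [alias ...]" layout.<br>
The function name missing_from_hosts is only illustrative and not part of TORQUE; it prints every node name that has no entry in the hosts file:<br>

```shell
#!/bin/sh
# Print each host name from a TORQUE nodes file that is absent from a hosts file.
# Usage: missing_from_hosts NODES_FILE [HOSTS_FILE]   (HOSTS_FILE defaults to /etc/hosts)
missing_from_hosts() {
    nodes_file=$1
    hosts_file=${2:-/etc/hosts}
    awk '
        { sub(/#.*/, "") }                 # drop comments in either file
        FILENAME == ARGV[1] {              # hosts file: remember names and aliases
            for (i = NF; i > 1; i--) known[$i] = 1
            next
        }
        NF && !($1 in known) { print $1 }  # nodes file: report unknown names
    ' "$hosts_file" "$nodes_file"
}
```

Run on the head node and on each compute node, e.g. missing_from_hosts $TORQUE/server_priv/nodes; no output means every node name is listed in /etc/hosts on that machine.<br>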
<br>
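For check 5), it may also help to look at what TORQUE actually wrote to $PBS_NODEFILE before calling mpirun.<br>
A rough sketch, where summarize_nodefile is a made-up helper name, that prints each host together with the number of slots it was given:<br>

```shell
#!/bin/sh
# Summarize a node file: one output line per host, followed by its slot count.
summarize_nodefile() {
    sort "$1" | uniq -c | awk '{ printf "%s %d\n", $2, $1 }'
}

# Inside a job TORQUE sets PBS_NODEFILE; outside a job this block is skipped.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    summarize_nodefile "$PBS_NODEFILE"
fi
```

With nodes=8:ppn=2 one would expect eight host names with 2 slots each; a single host name would match the symptom you describe.<br>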
I hope this helps,<br>
Gus Correa<br>
<div class="im"><br>
Gordon Wells wrote:<br>
> I've tried that, unfortunately I never get a $PBS_NODEFILE that spans<br>
> more than one node.<br>
><br>
> -- max(∫(εὐδαιμονία)dt)<br>
><br>
> Dr Gordon Wells<br>
> Bioinformatics and Computational Biology Unit<br>
> Department of Biochemistry<br>
> University of Pretoria<br>
><br>
><br>
> On 7 October 2010 10:02, Vaibhav Pol <<a href="mailto:vaibhavp@cdac.in">vaibhavp@cdac.in</a><br>
</div><div><div></div><div class="h5">> <mailto:<a href="mailto:vaibhavp@cdac.in">vaibhavp@cdac.in</a>>> wrote:<br>
><br>
> Hi ,<br>
> you must set server as well as queue attribute.<br>
><br>
> set server resources_available.nodect = (number of nodes *<br>
> cpus per node)<br>
> set <queue name> resources_available.nodect = (number of<br>
> nodes * cpus per node)<br>
><br>
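As a sketch of the arithmetic above, assuming the attributes are set through qmgr and using a hypothetical queue named "batch", the helper below (nodect_commands is an illustrative name) only prints the commands so they can be reviewed before running them as the TORQUE administrator:<br>

```shell
#!/bin/sh
# Print (do not run) the qmgr commands for a given cluster shape.
# Arguments: number of nodes, CPUs per node, queue name.
nodect_commands() {
    nodect=$(( $1 * $2 ))
    printf 'qmgr -c "set server resources_available.nodect = %d"\n' "$nodect"
    printf 'qmgr -c "set queue %s resources_available.nodect = %d"\n' "$3" "$nodect"
}

# Example: 8 nodes with 2 CPUs each and a hypothetical queue named "batch".
nodect_commands 8 2 batch
```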
><br>
> Thanks and regards,<br>
> Vaibhav Pol<br>
> National PARAM Supercomputing Facility<br>
> Centre for Development of Advanced Computing<br>
> Ganeshkhind Road<br>
> Pune University Campus<br>
> PUNE-Maharastra<br>
> Phone +91-20-25704176 ext: 176<br>
> Cell Phone : +919850466409<br>
><br>
><br>
><br>
> On Thu, 7 Oct 2010, Gordon Wells wrote:<br>
><br>
> Hi<br>
><br>
> I've now tried TORQUE 2.5.2 as well, with the same problems.<br>
> Setting resources_available.nodect has no effect except allowing<br>
> me to use "-l nodes=x" with x > 14.<br>
><br>
> regards<br>
><br>
> -- max(∫(εὐδαιμονία)dt)<br>
><br>
> Dr Gordon Wells<br>
> Bioinformatics and Computational Biology Unit<br>
> Department of Biochemistry<br>
> University of Pretoria<br>
><br>
><br>
> On 6 October 2010 20:04, Glen Beane <<a href="mailto:glen.beane@gmail.com">glen.beane@gmail.com</a><br>
</div></div><div class="im">> <mailto:<a href="mailto:glen.beane@gmail.com">glen.beane@gmail.com</a>>> wrote:<br>
><br>
> On Wed, Oct 6, 2010 at 1:12 PM, Gordon Wells<br>
</div>> <<a href="mailto:gordon.wells@gmail.com">gordon.wells@gmail.com</a> <mailto:<a href="mailto:gordon.wells@gmail.com">gordon.wells@gmail.com</a>>><br>
<div class="im">> wrote:<br>
><br>
> Can I confirm that this will definitely fix the problem?<br>
> Unfortunately this cluster also needs to be gLite compatible;<br>
> 2.3.6 seems to be the latest that will work.<br>
><br>
><br>
><br>
> I'm not certain... do you happen to have the server attribute<br>
> resources_available.nodect set? I have seen bugs with<br>
> PBS_NODEFILE contents when this server attribute is set. This may be a<br>
> manifestation of this bug, and I'm not sure if it has been<br>
> corrected.<br>
><br>
> try unsetting this and submitting a job with -l nodes=X:ppn=Y<br>
> _______________________________________________<br>
> torqueusers mailing list<br>
> <a href="mailto:torqueusers@supercluster.org">torqueusers@supercluster.org</a><br>
</div>> <mailto:<a href="mailto:torqueusers@supercluster.org">torqueusers@supercluster.org</a>><br>
<div class="im">> <a href="http://www.supercluster.org/mailman/listinfo/torqueusers" target="_blank">http://www.supercluster.org/mailman/listinfo/torqueusers</a><br>
><br>
><br>
> --<br>
> This message has been scanned for viruses and<br>
> dangerous content by MailScanner, and is<br>
> believed to be clean.<br>
><br>
><br>
><br>
</div><div class="im">
><br>
><br>
><br>
</div>> ------------------------------------------------------------------------<br>
<div><div></div><div class="h5">><br>
<br>
</div></div></blockquote></div><br></div>