<br><a href="http://debianclusters.cs.uni.edu/index.php/MPICH_with_Torque_Functionality">debianclusters.cs.uni.edu/index.php/MPICH_with_Torque_Functionality</a><div>
<br></div><div><br></div><div><br><div class="gmail_quote">On Tue, Apr 20, 2010 at 1:04 PM, Si Hammond <span dir="ltr">&lt;<a href="mailto:simon.hammond@gmail.com">simon.hammond@gmail.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div style="word-wrap:break-word">Hi,<div><br></div><div>We're running 2.4.7 and I can cat the $PBS_NODEFILE in both the -l nodes=2:ppn=2 and -l nodes=1:ppn=2 configurations (i.e. it works fine for me).</div>
<div><br></div><div>If you have built Open MPI with --with-tm then you shouldn't need to specify the node file, right? The runtime picks this up from the PBS engine during execution.</div><div><br></div><div>Have you tried just a basic mpirun ./pingpong or something like that?</div>
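<div><br></div><div>A minimal sketch of such a check, assuming a POSIX shell on the node that runs the job script. The helper name summarize_nodefile is hypothetical and is not part of Torque or Open MPI. It only reports what the node file contains, so that a missing or empty $PBS_NODEFILE shows up before mpirun is started.</div>

```shell
#!/bin/sh
# Hypothetical helper (not part of Torque or Open MPI):
# print "<host> <slots>" for each distinct host listed in a node file.
# Returns non-zero if the file path is unset, missing, or empty.
summarize_nodefile() {
    nodefile="$1"
    if [ -z "$nodefile" ] || [ ! -s "$nodefile" ]; then
        echo "node file missing or empty: ${nodefile:-<unset>}" >&2
        return 1
    fi
    # Torque lists each host once per allocated slot, so counting
    # repeated lines gives the number of slots per host.
    sort "$nodefile" | uniq -c | awk '{ print $2, $1 }'
}
```

<div>In a job script this could be called as summarize_nodefile "$PBS_NODEFILE" || exit 1, followed by a plain mpirun ./job1_100 without --hostfile. Whether the Open MPI build really has Torque support can be checked with ompi_info | grep tm, which should list the tm components if it was configured with --with-tm.</div>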
<div><br></div><div><br></div><div><br></div><div><br></div><div>S.</div><div><br></div><div><br><div><div><div></div><div class="h5"><div>On 20 Apr 2010, at 13:57, alap pandya wrote:</div><br></div></div><blockquote type="cite">
<div><div></div><div class="h5">Hi,<br><br>I am facing an issue running a job on multiple nodes with Torque. Please give me your suggestions.<br><br><br>Issue:<br>When I change <b>#PBS -l nodes=1:ppn=2 ----> </b> <b>#PBS -l nodes=2:ppn=2</b> in the script, PBS_NODEFILE is not created and the job cannot run.<br>
<br>Note: similar issues are mentioned at <br> <b><a href="http://www.clusterresources.com/pipermail/torqueusers/2006-October/004434.html" target="_blank">http://www.clusterresources.com/pipermail/torqueusers/2006-October/004434.html</a><br>
<span dir="ltr"></span><a href="http://www.clusterresources.com/pipermail/torqueusers/2010-January/009890.html" target="_blank">http://www.clusterresources.com/pipermail/torqueusers/2010-January/009890.html</a><br>
</b><br>
<br><b>Torque: 2.4.6 </b><br><br>1> Runs fine on a single node.<br><br>#!/bin/sh<br><b>#PBS -l nodes=1:ppn=2</b><br>echo "HOSTNAME : $HOSTNAME"<br>echo "PBS_NODEFILE = $PBS_NODEFILE"<br>cd /disk<br>
#echo $PBS_NODEFILE > shreenivas<br>
cat $PBS_NODEFILE > pbsnodes<br>mpirun --hostfile $PBS_NODEFILE ./job1_100<br><br><br><b>[root@cluster disk]# cat pbsnodes <br><a href="http://cluster.hpc.org/" target="_blank">cluster.hpc.org</a><br><a href="http://cluster.hpc.org/" target="_blank">cluster.hpc.org</a><br>
<br></b>The job runs fine with 2 processes on a single node.<br><br>2> Changed <b>#PBS -l <span style="background-color:rgb(255, 0, 0)">nodes=1</span>:ppn=2 ----> </b> <b>#PBS -l <span style="background-color:rgb(204, 0, 0)">nodes=2</span>:ppn=2</b>:<br>
<br>#!/bin/sh<br>
<b>#PBS -l nodes=2:ppn=2</b><br>
echo "HOSTNAME : $HOSTNAME"<br>
echo "PBS_NODEFILE = $PBS_NODEFILE"<br>
cd /disk<br>
cat $PBS_NODEFILE > pbsnodes<br>
mpirun --hostfile $PBS_NODEFILE ./job1_100<br><br><b>[root@cluster disk]# cat pbsnodes <br></b>No file is created this time, which is strange. No MPI job is running on either node (compute-0-5, cluster), as shown in the <b>tracejob</b> output below.<br>
<br><b>tracejob output :</b><br><br>04/20/2010 18:04:14 S enqueuing into test, state 1 hop 1<br>04/20/2010 18:04:14 S Job Queued at request of root@cluster, owner = root@cluster, job name<br> = a.sh, queue = test<br>
04/20/2010 18:04:14 S Job Run at request of root@cluster<br>04/20/2010 18:04:14 A queue=test<br>04/20/2010 18:04:14 A user=root group=root jobname=a.sh queue=test ctime=1271766854<br> qtime=1271766854 etime=1271766854 start=1271766854 owner=root@cluster<br>
exec_host=compute-0-5/2+compute-0-5/1+<a href="http://cluster.hpc.org/2+cluster.hpc.org/1" target="_blank">cluster.hpc.org/2+cluster.hpc.org/1</a><br> Resource_List.neednodes=2:ppn=2 Resource_List.nodect=2<br>
Resource_List.nodes=2:ppn=2 Resource_List.walltime=01:00:00 <b><br><br>...............................This sequence repeats many times, as no </b>PBS_NODEFILE is created. MPI is not able to get the node list.<br>
<br><br>With regards,<br>Alap<br><br><br></div></div>
_______________________________________________<br>torqueusers mailing list<br><a href="mailto:torqueusers@supercluster.org" target="_blank">torqueusers@supercluster.org</a><br><a href="http://www.supercluster.org/mailman/listinfo/torqueusers" target="_blank">http://www.supercluster.org/mailman/listinfo/torqueusers</a><br>
</blockquote></div><br><div>
<span style="border-collapse:separate;color:rgb(0, 0, 0);font-family:Helvetica;font-size:medium;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:auto;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><span style="border-collapse:separate;color:rgb(0, 0, 0);font-family:Helvetica;font-size:medium;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><div style="word-wrap:break-word">
<span style="border-collapse:separate;color:rgb(0, 0, 0);font-family:Helvetica;font-size:medium;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><div style="word-wrap:break-word">
<span style="border-collapse:separate;color:rgb(0, 0, 0);font-family:Helvetica;font-size:medium;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><div style="word-wrap:break-word">
<span style="border-collapse:separate;color:rgb(0, 0, 0);font-family:Helvetica;font-size:medium;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><span style="font-size:12px"><div>
<br>---------------------------------------------------------------------------------------</div><div>Si Hammond</div><div><br></div><div>Research & Knowledge Transfer Associate</div><div>Performance Modelling, Analysis and Optimisation Team</div>
<div>High Performance Systems Group</div><div>Department of Computer Science</div><div>University of Warwick, CV4 7AL, UK</div><div><a href="http://go.warwick.ac.uk/hpsg" target="_blank">http://go.warwick.ac.uk/hpsg</a></div>
<div>----------------------------------------------------------------------------------------</div><div><br></div></span></span></div></span></div></span></div></span></span><br>
</div>
<br></div></div><br>
<br></blockquote></div><br><br clear="all"><br>-- <br>Abraham Zamudio Ch.<br><br>
</div>
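<div><br></div><div>The exec_host value in the quoted tracejob output already lists slots on both compute-0-5 and cluster.hpc.org, which suggests the allocation itself succeeded. A small sketch for reading that value follows; hosts_from_exec_host is a hypothetical helper name, and the host/slot+host/slot layout is taken from the output quoted in this thread.</div>

```shell
#!/bin/sh
# Hypothetical helper for illustration only.
# Turn an exec_host value ("host/slot+host/slot+...") into
# "<host> <slots>" lines, one per distinct host.
hosts_from_exec_host() {
    printf '%s\n' "$1" \
        | tr '+' '\n' \
        | sed 's|/.*$||' \
        | sort | uniq -c \
        | awk '{ print $2, $1 }'
}

# Example with the value from the tracejob output in this thread:
# hosts_from_exec_host 'compute-0-5/2+compute-0-5/1+cluster.hpc.org/2+cluster.hpc.org/1'
```

<div>One assumption worth checking: if the job script runs on the first host named in exec_host (compute-0-5 here) and /disk is not shared between the nodes, the pbsnodes file would be written on that host rather than on cluster, where the cat was run.</div>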