<span style="font-family: verdana;">New data point: I ran the job with a 2-minute sleep and found it running only on n04, as <span style="font-family: courier new,monospace;">qstat -f </span>said it would be.<br><br>
Why wouldn't qsub honor my local node list?<br><br>dave<br style="font-family: verdana;"></span><br style="font-family: verdana;"><div><span class="gmail_quote">On 12/6/06, <b class="gmail_sendername">dave first</b> <<a href="mailto:linux4dave@gmail.com">
linux4dave@gmail.com</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><span style="font-family: verdana;">I am such a newbie that I squeak. I hope this is the correct forum in which to ask this question.
</span><br style="font-family: verdana;"><br style="font-family: verdana;"><span style="font-family: verdana;">
I want to use a node list other than the one PBS would provide in $PBS_NODEFILE: specifically n10, n11, n12 and n13, each with 4 processors. The node list looks like this:</span><br style="font-family: verdana;">
<br><span style="font-family: courier new,monospace;">n10:4</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">n11:4</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">n12:4</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">n13:4</span><br><br><span style="font-family: verdana;">The file is named local_nodelist and sits in the working directory.
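<br><br>For reference, the list above uses the host:count form that MPICH's -machinefile option reads. In Torque, the hosts a job runs on are normally fixed by the resource request made at submission time (a -l nodes=... option to qsub or a #PBS -l line in the script). The following is a minimal sketch, assuming Torque's host:ppn=N+host:ppn=N node-spec syntax (worth checking against man qsub and pbs_resources on this cluster), of turning such a list into that request; parse_nodelist, torque_nodes_spec and total_slots are hypothetical helper names.

```python
from typing import List, Tuple


def parse_nodelist(text: str) -> List[Tuple[str, int]]:
    """Parse MPICH-style 'host:count' lines; a bare 'host' counts as 1 slot."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        host, _, count = line.partition(":")
        entries.append((host, int(count) if count else 1))
    return entries


def torque_nodes_spec(entries: List[Tuple[str, int]]) -> str:
    """Build an (assumed) Torque node spec such as 'n10:ppn=4+n11:ppn=4'."""
    return "+".join(f"{host}:ppn={count}" for host, count in entries)


def total_slots(entries: List[Tuple[str, int]]) -> int:
    """Total processor slots requested across all hosts."""
    return sum(count for _, count in entries)


if __name__ == "__main__":
    # Same contents as local_nodelist above, inlined so the sketch runs anywhere.
    local_nodelist = "n10:4\nn11:4\nn12:4\nn13:4\n"
    entries = parse_nodelist(local_nodelist)
    # Hypothetical usage: qsub -l nodes=<this spec> <job script>
    print(torque_nodes_spec(entries))
    print(total_slots(entries))
```

For the four lines above this prints n10:ppn=4+n11:ppn=4+n12:ppn=4+n13:ppn=4 and 16. Because Torque's own node file normally repeats each host once per slot, a 4 x 4 allocation would show up there as 16 lines rather than 4.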
</span><br><br>The script sets <span style="font-family: courier new,monospace;">PBS_NODEFILE=`pwd`/local_nodelist</span><br><br><span style="font-family: verdana;">Running qstat -f while the script is running shows what appears to be an incorrect node list:
</span><span style="font-family: courier new,monospace;"><br><br><span style="font-family: courier new,monospace;">Job Id: 76.excalibur<br> Job_Name = pbs_mpich.<br> Job_Owner = joeb@excalibur.example.com<br> resources_used.cput = 00:00:00<br> resources_used.mem = 4296kb<br> resources_used.vmem = 175988kb<br> resources_used.walltime = 00:00:12<br> job_state = R<br> queue = default
<br> server = </span></span><span style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">excalibur.example.com</span></span><br><span style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> Checkpoint = u<br> ctime = Wed Dec 6 08:54:16 2006<br> Error_Path = </span></span><span style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">
excalibur.example.com</span></span><span style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;">
:/home/joeb/pbs_mpich..e76<br> <span style="color: rgb(204, 0, 0); font-weight: bold;">
exec_host = n04/0</span><br> Hold_Types = n<br> Join_Path = n<br> Keep_Files = n<br> Mail_Points = a<br> mtime = Wed Dec 6 08:54:17 2006<br> Output_Path = </span></span><span style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">excalibur.example.com</span></span><span style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">
:/home/joeb/pbs_mpich..o76<br> Priority = 0<br> qtime = Wed Dec 6 08:54:16 2006<br> Rerunable = True<br> Resource_List.nodect = 1<br> Resource_List.nodes = 1<br> session_id = 31725<br> Variable_List = PBS_O_HOME=/home/joeb,PBS_O_LANG=en_US.UTF-8,
<br> PBS_O_LOGNAME=joeb,<br> PBS_O_PATH=/opt/torque/bin:/opt/bin:/opt/hdfview/bin:/opt/hdf/bin:/opt<br> /ncarg/bin:/opt/mpich/p4-gnu/bin:/opt/mpiexec//bin:/usr/kerberos/bin:/o<br> pt/java/jdk1.5.0/bin:/usr/lib64/ccache/bin:/usr/local/bin:/bin:/usr/bin
<br> :/usr/X11R6/bin:/opt/java/jdk1.5.0/jre/bin:/opt/visit/bin:/home/joeb/bi<br> n:/opt/mpich/p4-gnu/sbin,PBS_O_MAIL=/var/spool/mail/joeb<br> PBS_O_SHELL=/bin/bash,PBS_O_HOST=</span></span><span style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">excalibur.example.com</span></span><span style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">
,<br> PBS_O_WORKDIR=/home/joeb,PBS_O_QUEUE=default<br> comment = Job started on Wed Dec 06 at 08:54<br> etime = Wed Dec 6 08:54:16 2006<br></span>---------------------------------------------------------------------------------
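<br><br>In this output, the fields that record placement are exec_host = n04/0 and Resource_List.nodes = 1, which suggests the job ran under a single-node, single-processor request. Torque writes exec_host as host/slot entries joined by '+'. Below is a minimal sketch of extracting that field from qstat -f text and counting slots per host; it assumes the value fits on one line (qstat -f wraps long values, as the Variable_List entry above shows), and exec_host_from_qstat and slots_per_host are hypothetical names.

```python
from collections import Counter


def exec_host_from_qstat(text: str) -> str:
    """Return the exec_host value from `qstat -f` output.

    Assumes the value sits on a single line; wrapped continuation
    lines are not reassembled by this sketch.
    """
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "exec_host":
            return value.strip()
    raise ValueError("no exec_host field found")


def slots_per_host(exec_host: str) -> Counter:
    """Count slots per host in an exec_host value such as 'n04/1+n04/0'."""
    return Counter(item.split("/", 1)[0] for item in exec_host.split("+") if item)


if __name__ == "__main__":
    # A trimmed excerpt of the qstat -f output above.
    sample = "Job Id: 76.excalibur\n    exec_host = n04/0\n    Hold_Types = n\n"
    print(slots_per_host(exec_host_from_qstat(sample)))
```

The text could come from saving the output of qstat -f 76 to a file; for the excerpt shown, the result is a single slot on n04.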
<br><br><span style="font-family: verdana;">However, the script output looks like this:<br><br><span style="font-family: courier new,monospace;">Job ID: 76.excalibur.example.com
<br>Working directory is /home/joeb<br>Running on host n04.example.com<br>Time is Wed Dec 6 08:54:17 PST 2006<br>
Directory is /home/joeb<br>The node file is /net/fs/home/joeb/local_nodefile
<br><span style="font-weight: bold; color: rgb(204, 0, 0);">This job runs on the following processors:</span><br style="font-weight: bold; color: rgb(204, 0, 0);"><span style="font-weight: bold; color: rgb(204, 0, 0);">n09.example.com:4 n10.example.com:4 n11.example.com:4 n12.example.com:4</span><br style="font-weight: bold; color: rgb(204, 0, 0);">
<span style="font-weight: bold; color: rgb(204, 0, 0);">This job has allocated 4 nodes/processors.</span><br><br>/usr/local/bin/mpich/x86_64/p4/gnu/bin/mpirun -nolocal -np 4 -machinefile /net/fs/home/joeb/local_nodefile /usr/local/bin/mpich/p4-gnu/examples/cpi<br><br>pi is approximately 3.1416009869231249, Error is 0.0000083333333318<br>wall clock time = 0.003906<br></span></span></span><span style="font-family: courier new,monospace;">---------------------------------------------------------------------------------
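<br><br>The two listings can be compared mechanically. A minimal sketch, assuming the echoed processor list is just the contents of whatever file the script read, while exec_host is the server's record of what it actually allocated; domain suffixes are stripped so that n09.example.com and n09 compare equal. The function names are hypothetical.

```python
def short_host(name: str) -> str:
    """Drop the domain part: 'n09.example.com' -> 'n09'."""
    return name.split(".", 1)[0]


def hosts_from_listing(line: str) -> set:
    """Hosts from a space-separated 'host:count' listing, as echoed by the script."""
    return {short_host(token.split(":", 1)[0]) for token in line.split()}


def hosts_from_exec_host(value: str) -> set:
    """Hosts from a Torque exec_host value such as 'n04/0+n04/1'."""
    return {short_host(item.split("/", 1)[0]) for item in value.split("+") if item}


def compare_hosts(listed: set, allocated: set) -> dict:
    """Report hosts that appear in only one of the two sources."""
    return {
        "listed_but_not_allocated": sorted(listed - allocated),
        "allocated_but_not_listed": sorted(allocated - listed),
    }


if __name__ == "__main__":
    echoed = "n09.example.com:4 n10.example.com:4 n11.example.com:4 n12.example.com:4"
    report = compare_hosts(hosts_from_listing(echoed), hosts_from_exec_host("n04/0"))
    print(report)
```

With the values above, all four echoed hosts (n09 through n12) are missing from exec_host, and n04, the only allocated host, is missing from the echoed list. The n04 side agrees with the sleep test described at the top of this message and with the "Running on host n04.example.com" line in the script output.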
</span><br><span style="font-family: courier new,monospace;"><span style="font-family: verdana;"><span style="font-family: courier new,monospace;"><br><span style="font-family: verdana;">Can anyone explain why the output of qstat -f differs from what the script's echo statements report, and how I can tell which one is correct (short of adding a sleep and hunting for the processes on each node)?
<br><br>Thanks,<br>dave<br></span><br></span></span></span>
</blockquote></div><br>