I often find myself in situations in which jobs should have enough resources and <br>should be running. I submit jobs using a PBS script. Nevertheless, if a job gets<br>stuck in the queue for a long time, I try to force it to run using "runjob" or "qrun". It<br>
usually works provided that there are enough free resources available. <br><br>Jozef<br><br><div class="gmail_quote">2008/4/16 <<a href="mailto:pat.o%27bryant@exxonmobil.com">pat.o'bryant@exxonmobil.com</a>>:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br>
Zhyang,<br>
Here is something you might try. Code up a Torque "job_script" with the<br>
following "#PBS" control cards. Note that "#PBS" control cards can take the<br>
place of command line arguments and they follow the same format. Submit<br>
the job using "qsub job_script". If you specify ppn > (number of<br>
cpus/node), Maui (for some parameter settings) will look for a matching<br>
node with that number of cpus minimum. So for example, if you use "#PBS -l<br>
nodes=8:ppn=4", Maui will look for nodes with 4 cpus. If it can't find a<br>
node like that, the job will remain queued. The thing to keep in mind is<br>
that Torque queues your job and Maui (in your case) actually decides where<br>
and when your job will execute. Most execution problems will be due to<br>
Maui/Moab parameter settings. Here are some links to check as well:<br>
<br>
<a href="http://www.clusterresources.com/wiki/doku.php?id=torque:2.1_job_submission" target="_blank">http://www.clusterresources.com/wiki/doku.php?id=torque:2.1_job_submission</a><br>
<a href="http://www.clusterresources.com/products/mwm/docs/a.fparameters.shtml" target="_blank">http://www.clusterresources.com/products/mwm/docs/a.fparameters.shtml</a><br>
<br>
Contents of "job_script"<br>
----------------------------------<br>
#!/bin/bash<br>
#PBS -N Short<br>
#PBS -l nodes=8:ppn=2,walltime=00:02:00<br>
pwd<br>
hostname<br>
<br>
End of "job_script"<br>
---------------------------<br>
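The node-matching rule described above can be sketched in a few lines of shell.<br>
This is only an illustration under simplifying assumptions: the function names<br>
parse_request, count_matching_nodes, and check_request are hypothetical and are<br>
not part of Torque or Maui, the per-node cpu counts are passed in by hand, and a<br>
missing "ppn" is assumed to mean 1. The real scheduler also weighs policies,<br>
reservations, and node state.<br>
<pre>
```shell
#!/bin/bash
# Hypothetical helpers (NOT part of Torque or Maui) illustrating how a
# "nodes=N:ppn=P" request is matched against the cpus available per node.

# Print "NODES PPN" parsed from a request such as "nodes=8:ppn=2".
parse_request() {
    local req="$1"
    local nodes ppn
    nodes="${req#nodes=}"        # strip the leading "nodes="
    nodes="${nodes%%:*}"         # keep the text before the first ":"
    case "$req" in
        *ppn=*)
            ppn="${req##*ppn=}"  # keep the text after "ppn="
            ppn="${ppn%%[:,]*}"  # drop anything following the ppn value
            ;;
        *)
            ppn=1                # assumption: no ppn given means 1
            ;;
    esac
    echo "$nodes $ppn"
}

# Count how many nodes offer at least PPN cpus.
# Usage: count_matching_nodes PPN CPUS_NODE1 CPUS_NODE2 ...
count_matching_nodes() {
    local ppn="$1"
    shift
    local count=0
    local cpus
    for cpus in "$@"; do
        if [ "$cpus" -ge "$ppn" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Print "can run" if enough nodes match the request, else "stays queued".
# Usage: check_request REQUEST CPUS_NODE1 CPUS_NODE2 ...
check_request() {
    local req="$1"
    shift
    local parsed nodes ppn matching
    parsed=$(parse_request "$req")
    nodes="${parsed% *}"
    ppn="${parsed#* }"
    matching=$(count_matching_nodes "$ppn" "$@")
    if [ "$matching" -ge "$nodes" ]; then
        echo "can run"
    else
        echo "stays queued"
    fi
}

# Toy cluster: eight free nodes with 2 cpus each.
check_request "nodes=8:ppn=2" 2 2 2 2 2 2 2 2   # prints "can run"
check_request "nodes=8:ppn=4" 2 2 2 2 2 2 2 2   # prints "stays queued"
```
</pre>
On a cluster with 2 cpus per node, the second request never finds a node with 4<br>
cpus, which matches the "job will remain queued" case described above.<br>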
<div class="Ih2E3d"><br>
Thanks,<br>
Pat<br>
<br>
J.W. (Pat) O'Bryant,Jr.<br>
Business Line Infrastructure<br>
Technical Systems, HPC<br>
Office: 713-431-7022<br>
<br>
</div>From: <a href="mailto:zhyang@lzu.edu.cn">zhyang@lzu.edu.cn</a><br>
<div class="Ih2E3d">To: <a href="mailto:pat.o%27bryant@exxonmobil.com">pat.o'bryant@exxonmobil.com</a><br>
</div>cc: <a href="mailto:torqueusers@supercluster.org">torqueusers@supercluster.org</a><br>
Date: 04/15/08 07:19 AM<br>
Subject: Re: Re: [torqueusers] have enough nodes,but job is not running<br>
<div><div></div><div class="Wj3C7c"><br>
<br>
Hi Pat,<br>
<br>
I am not using PBS control cards. I have 56 nodes with 2 CPUs per node.<br>
<br>
>-----Original Message-----<br>
> From: <a href="mailto:pat.o%27bryant@exxonmobil.com">pat.o'bryant@exxonmobil.com</a><br>
> Sent: 2008-04-15 20:09:27<br>
> To: <a href="mailto:zhyang@lzu.edu.cn">zhyang@lzu.edu.cn</a><br>
> Cc:<br>
> Subject: Re: [torqueusers] have enough nodes,but job is not running<br>
> Zhyang,<br>
><br>
> What do your #PBS control cards look like? Also, how many cpus/node do<br>
> you have?<br>
><br>
> Thanks,<br>
> Pat<br>
><br>
> J.W. (Pat) O'Bryant,Jr.<br>
> Business Line Infrastructure<br>
> Technical Systems, HPC<br>
> Office: 713-431-7022<br>
><br>
> Hi,<br>
><br>
> I have a cluster of 56 nodes with Torque and Maui installed. Recently I<br>
> found that while showq shows 34 nodes active, a user submitted a 5-node<br>
> job and it stays in status Q and does not run. According to the showq<br>
> output there should be enough free nodes (at least 5), so why is the job<br>
> not running? When I submit a 2-node job, it runs fine. Can anyone help?<br>
> Thanks!<br>
><br>
> --<br>
> _______________________________________________<br>
> torqueusers mailing list<br>
> <a href="mailto:torqueusers@supercluster.org">torqueusers@supercluster.org</a><br>
> <a href="http://www.supercluster.org/mailman/listinfo/torqueusers" target="_blank">http://www.supercluster.org/mailman/listinfo/torqueusers</a><br>
<br>
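-- <br>
With best regards,<br>
Zhang Yang<br>
Communication Network Center, Lanzhou University<br>
Address: 222 Tianshui Road, Lanzhou, Gansu, China<br>
Tel: (0931)8912011 Fax: (0931)8912022<br>
Postal code: 730000 Email: <a href="mailto:zhyang@lzu.edu.cn">zhyang@lzu.edu.cn</a><br>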
</div></div><br>
<br></blockquote></div><br>