[Mauiusers] Jobs in Queue Forever
Lennart Karlsson
Lennart.Karlsson at nsc.liu.se
Fri Nov 12 04:18:52 MST 2004
On the 4th of November, Gabe Turner wrote:
> Also, if you don't want to bother looking at what the job was asking for,
> you can remove neednodes entirely by passing it no value:
>
> qalter -l neednodes= 503
>
> Unfortunately, I have this problem in PBSPro 5.4.1 and have always had this
> problem. It does make sense to leave neednodes set when a node goes down,
> however, since it will ensure that those jobs get run on that node as soon
> as it comes back. However, this assumes that the node will come back soon,
> i.e. that it wasn't a hardware failure that brought it down. Unfortunately
> for me, I'm almost never in the situation that I can bring the node back up
> promptly so I have to manually go through all the jobs that were on the
> node and unset neednodes :\
I have the same problem on our PBSPro cluster, so I wrote a Perl script, run
by cron every 20 minutes, that does the 'qalter'. (I also configured Maui to
defer jobs for half an hour.) My script also does a few other checks and
actions that match our local policies and environment, which makes it unfit
for use on other clusters, but the central check is to compare
"Resource_List.neednodes" with "Resource_List.nodes" in the "qstat -f" output
for all jobs whose "job_state" is "Q".
Stupid solutions to stupid problems... ;-)
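
For anyone who wants to try the same central check, here is a minimal sketch
in Python (not the Perl script mentioned above, which is site-specific). It
rests on a few assumptions: that "qstat -f" prints a "Job Id:" line followed
by indented "name = value" lines, that wrapped values continue on lines
starting with a tab, and that a queued job whose neednodes differs from its
nodes is one that is pinned to a node. The function names and the sample
text are made up for illustration. The sketch only builds the qalter
commands; it does not run them.

```python
# Made-up sample in the assumed "qstat -f" layout (hypothetical jobs and nodes).
SAMPLE = """Job Id: 503.server
    Job_Name = pinned
    job_state = Q
    Resource_List.neednodes = node12
    Resource_List.nodes = 1
Job Id: 504.server
    Job_Name = unpinned
    job_state = Q
    Resource_List.neednodes = 1
    Resource_List.nodes = 1
Job Id: 505.server
    Job_Name = running
    job_state = R
    Resource_List.neednodes = node07
    Resource_List.nodes = 1
"""


def parse_qstat_full(text):
    """Parse "qstat -f" style text into {job_id: {attribute: value}}."""
    jobs = {}
    attrs = None      # attribute dict of the job currently being read
    last_key = None   # last attribute seen, for wrapped continuation lines
    for line in text.splitlines():
        if line.startswith("Job Id:"):
            job_id = line.split(":", 1)[1].strip()
            attrs = jobs.setdefault(job_id, {})
            last_key = None
        elif attrs is None or not line.strip():
            continue
        elif line.startswith("\t") and last_key is not None:
            # Assumption: long values wrap onto lines that start with a tab.
            attrs[last_key] += line.strip()
        elif " = " in line:
            key, value = line.split(" = ", 1)
            last_key = key.strip()
            attrs[last_key] = value.strip()
    return jobs


def stale_queued_jobs(jobs):
    """Return ids of queued jobs whose neednodes differs from nodes."""
    stale = []
    for job_id, attrs in jobs.items():
        if attrs.get("job_state") != "Q":
            continue
        neednodes = attrs.get("Resource_List.neednodes")
        nodes = attrs.get("Resource_List.nodes")
        if neednodes is not None and neednodes != nodes:
            stale.append(job_id)
    return stale


def qalter_command(job_id):
    """Build the command that clears neednodes by passing it no value."""
    return ["qalter", "-l", "neednodes=", job_id]


for job_id in stale_queued_jobs(parse_qstat_full(SAMPLE)):
    print(" ".join(qalter_command(job_id)))
```

On a real cluster the text would come from running "qstat -f" instead of
SAMPLE, and each command would be reviewed (or passed to subprocess.run)
once the output format and the comparison have been checked against the
local PBS version.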
-- Lennart Karlsson <Lennart.Karlsson at nsc.liu.se>
System Expert at National Supercomputer Centre in Linkoping, Sweden
http://www.nsc.liu.se
+46 706 49 55 35
+46 13 28 26 24