[Mauiusers] Exiting job hung?
Garrick Staples
garrick at clusterresources.com
Wed Jul 5 10:20:01 MDT 2006
On Wed, Jul 05, 2006 at 11:43:45AM -0400, Paul Van Allsburg alleged:
> I have a job that was started from node4, and node4 has gone off line
> with disk errors. The job ran on node12 and wants to write the final
> output file via node4, but that node is unavailable and the job sits in
> exiting state. The cluster is running torque-1.2.0p2 and maui-3.2.6p11.
>
> Job id           Name             User             Time Use S Queue
> ---------------- ---------------- ---------------- -------- - -----
> 4351.curie       amberDNA_md11    hinkle           54:03:16 E long
>
> qdel fails, and the -p option is not available in this release.
>
> [root@curie ~]# qdel 4351
> qdel: Request invalid for state of job 4351.curie.chem.hope.edu
> [root@curie ~]# qdel -p 4351
> qdel: invalid option -- p
> usage: qdel [-W delay] job_identifier...
>
> I tried canceljob ...
>
> [root@curie ~]# canceljob 4351
> ERROR: cannot cancel job '4351'
>
>
> How can I force this job out of the queues?
Upgrade TORQUE and use qdel -p :)
With your old TORQUE, the only solution is rather blunt: stop
pbs_server, delete that job's files in $PBS_SERVER_HOME/server_priv/jobs,
and start pbs_server again.
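
The blunt procedure above could be sketched roughly as follows. This is
only an illustration, not a tested recipe for torque-1.2.0p2: the helper
name purge_job_files is made up for this example, and it assumes that the
files belonging to job 4351 all start with "4351." inside the jobs
directory. Check the directory listing by hand before deleting anything.

```shell
#!/bin/sh
# Sketch only: remove the on-disk files for one stuck job.
# Run this only while pbs_server is stopped.
# Assumption: the files for job 4351 start with "4351." in the jobs directory.
# purge_job_files is a hypothetical helper, not part of TORQUE.

purge_job_files() {
    jobs_dir=$1   # e.g. "$PBS_SERVER_HOME/server_priv/jobs"
    job_num=$2    # numeric part of the job id, e.g. 4351

    # Refuse to run with missing arguments so the glob can never widen.
    if [ -z "$jobs_dir" ] || [ -z "$job_num" ]; then
        echo "usage: purge_job_files JOBS_DIR JOB_NUMBER" >&2
        return 1
    fi
    if [ ! -d "$jobs_dir" ]; then
        echo "no such directory: $jobs_dir" >&2
        return 1
    fi

    for f in "$jobs_dir/$job_num".*; do
        [ -e "$f" ] || continue   # the glob matched nothing
        echo "removing $f"
        rm -f -- "$f"
    done
}

# Usage, as root on the server host (left commented out on purpose):
#   1. stop pbs_server (for example with "qterm -t quick", if your
#      installation provides it)
#   2. purge_job_files "$PBS_SERVER_HOME/server_priv/jobs" 4351
#   3. start pbs_server again
```

Matching on the job number plus a dot keeps the deletion limited to the
one stuck job, so other queued jobs should survive the server restart.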