[Mauiusers] Exiting job hung?

Garrick Staples garrick at clusterresources.com
Wed Jul 5 10:20:01 MDT 2006


On Wed, Jul 05, 2006 at 11:43:45AM -0400, Paul Van Allsburg alleged:
> I have a job that was started from  node4, and node4 has gone off line 
> with disk errors.  The job ran on node12 and wants to write the final 
> output file via node4, but that node is unavailable and the job sits in 
> exiting state.  The cluster is running  torque-1.2.0p2 and maui-3.2.6p11.
> 
> Job id           Name             User             Time Use S Queue
> ---------------- ---------------- ---------------- -------- - -----
> 4351.curie       amberDNA_md11    hinkle           54:03:16 E long
> 
> qdel fails, and the -p option is not available in this release.
> 
> [root@curie ~]# qdel 4351
> qdel: Request invalid for state of job 4351.curie.chem.hope.edu
> [root@curie ~]# qdel -p 4351
> qdel: invalid option -- p
> usage: qdel [-W delay] job_identifier...
> 
> I tried canceljob ...
> 
> [root@curie ~]# canceljob 4351
> ERROR:  cannot cancel job '4351'
> 
> 
> How can I force this job out of the queues?

Upgrade TORQUE and use qdel -p :)

With your old TORQUE, the only solution is rather blunt...  Stop
pbs_server, delete that job's files in $PBS_SERVER_HOME/server_priv/jobs,
and start pbs_server again.
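
As a rough sketch of that sequence (not tested against torque-1.2.0p2):
the helper name purge_job_files is made up for illustration, and it
assumes the job's files in server_priv/jobs all start with the numeric
job id followed by a dot (e.g. 4351.curie.JB), so list the directory
before deleting anything. The stop and start commands are left as
comments; "qterm -t quick" is one way to shut pbs_server down without
touching running jobs.

```shell
#!/bin/sh
# Sketch only: remove one stuck job's files from the pbs_server spool.
# Run as root on the server host, and only while pbs_server is stopped.

# purge_job_files SERVER_HOME JOBID   (hypothetical helper)
# Deletes every file in SERVER_HOME/server_priv/jobs whose name starts
# with "JOBID." (assumed naming; verify with ls first).
purge_job_files() {
    if [ -z "$1" ] || [ -z "$2" ]; then
        echo "usage: purge_job_files SERVER_HOME JOBID" >&2
        return 2
    fi
    jobs_dir="$1/server_priv/jobs"
    job_id="$2"
    if [ ! -d "$jobs_dir" ]; then
        echo "no such directory: $jobs_dir" >&2
        return 1
    fi
    for f in "$jobs_dir/$job_id".*; do
        [ -e "$f" ] || continue    # glob matched nothing
        echo "removing $f"
        rm -f -- "$f"
    done
}

# Intended sequence (commented out so nothing runs by accident):
#   qterm -t quick                            # stop pbs_server
#   purge_job_files "$PBS_SERVER_HOME" 4351   # drop the stuck job
#   pbs_server                                # start the server again
```

The trailing dot in the pattern keeps a job such as 43510 from being
caught when you only meant 4351.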


