<div dir="ltr">Hi,<br>Three days ago, the controller node (node1) of our cluster went down for an unknown reason, and I had to restart it.<br>After the restart, the queued jobs were still held in the queue, and the running jobs kept running.<br>
However, a job that has completed (I can confirm this because its output files exist) still remains in the queue.<br>This job's state is marked "E", and it has stayed in that state since yesterday.<br>When I tried to delete it with the command "qdel jobid", I got the error message "<b>qdel: Request invalid for state of job MSG=invalid state for job - EXITING 3583.node1</b>".<br>
The other problem is in the output of the command "pbsnodes -a": half of the cluster nodes are shown in the state "down,job-exclusive", but these nodes are not actually down.<br>Because jobs are still running on these nodes, I tried to change their state with qmgr ("set node nodeid state = job-exclusive"), but it had no effect.<br>
I think these two problems are related.<br>What should I do?<br>Thanks<br clear="all"><br>-- <br>Best Wishes<br>ChenWeiguang<br>
</div>