<br><br><div class="gmail_quote">On Thu, Jun 12, 2008 at 12:49 PM, Al Taufer <<a href="mailto:ataufer@clusterresources.com">ataufer@clusterresources.com</a>> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
It seems that the code is returning an error message when it should not be returning one.<br>
<br>
The documentation says that, for a running job, if checkpoint/restart is not supported, qhold only sets the requested hold attribute. The hold then has no effect unless the job is rerun with the qrerun command.</blockquote>
<div><br>I know this is the case in the 2.4 snapshots: the hold does get set, and qhold displays no error message. Pre-2.4 TORQUE versions complain that the job can't be checkpointed and do not set the hold. Which version of the documentation says the hold will be set even if the job can't be checkpointed? <br>
<br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br>
<br>
You should be able to verify that the hold is still being placed on the job by using 'qstat -f' and checking the Hold_Types value.<br>
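<br>
A rough sketch of that check, assuming 'qstat -f' prints the attribute as "Hold_Types = value" and that a value of "n" means no hold is set. The helper name hold_types is hypothetical, not part of TORQUE:<br>

```shell
# Hypothetical helper: read `qstat -f` output on stdin and print
# the value of the Hold_Types attribute (assumed "name = value" layout).
hold_types() {
    awk -F' = ' '/^[[:space:]]*Hold_Types/ { print $2 }'
}

# Usage on a live system (job ID 901 is the one from this thread):
#   qstat -f 901 | hold_types
```

Anything other than "n" in the output would suggest the hold was recorded even though qhold reported an error.<br>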
<br>
Al<br>
<br>
Walid wrote:<br>
<blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div><div></div><div class="Wj3C7c">
Hi All,<br>
<br>
I have installed TORQUE 2.3.0 with Maui, but I am seeing different behaviour when I try to hold jobs: qhold complains that the request is rejected, and the MOM logs say that checkpointing is not supported. I am not interested in checkpointing, but I would like to be able to restart jobs. Any pointers would be appreciated.<br>
<br>
regards<br>
<br>
Walid<br>
<br>
[root@lnx ~]# qstat -an<br>
lnx: Req'd Req'd Elap<br>
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time<br>
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----<br>
901.lnx luser parallel STDIN 5270 1 -- -- -- R --<br>
lnx512/0<br>
[root@lnx ~]# qhold 901<br>
qhold: No support for requested service MSG=MOM rejected hold request: 15029 901.lnx<br>
pbs_mom;Req;req_reject;Reject reply code=15029(No support for requested service REJHOST=lnx512 MSG=checkpointing not supported), aux=0, type=HoldJob, from PBS_Server@lnx<br>
<br></div></div>
------------------------------------------------------------------------<br>
<br>
_______________________________________________<br>
torqueusers mailing list<br>
<a href="mailto:torqueusers@supercluster.org" target="_blank">torqueusers@supercluster.org</a><br>
<a href="http://www.supercluster.org/mailman/listinfo/torqueusers" target="_blank">http://www.supercluster.org/mailman/listinfo/torqueusers</a><br>
<br>
</blockquote>
</blockquote></div><br>