<div dir="ltr"><br><br><div class="gmail_quote">On Mon, Aug 4, 2008 at 11:50 AM, Stijn De Weirdt <span dir="ltr">&lt;<a href="mailto:Stijn.DeWeirdt@ugent.be">Stijn.DeWeirdt@ugent.be</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
hi all,<br>
<br>
we are playing around with torque and blcr, and one of the things we are trying is placing a job on hold and then restarting it on another node. (yes, i know it's not officially supported ;)<br>
<br>
we are using blcr 071 with torque 2.4.0-snap.200807091010<br>
a "simple" qhold followed by qrls works, but when we flag the processing node offline, qrls leaves the job in the queued state (and i can't find an obvious qalter option).<br>
<br>
checkjob says<br>
...<br>
job is deferred. Reason: RMFailure (cannot start job - RM failure, rc: 15057, msg: 'Cannot execute at specified host because of checkpoint or stagein files REJHOST=node11-2.somedomain MSG=cannot allocate node 'node11-2.somedomain' to job - node not currently available (state: offline)') Holds: Defer (hold reason: RMFailure)<br>
...<br>
<br>
<br>
so my question is:<br>
is this supposed to be working (and if not, is it planned)?</blockquote><div><br>No, it is not supposed to work yet; yes, it is planned.<br> </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<br>
and is this possible for mpi jobs (i.e. relocation of the processes) (i'm going to guess not, but i kind of hope i'm wrong ;)</blockquote></div><br><br>Not right now. I'd like to work with the OpenMPI folks to make TORQUE aware of BLCR-enabled OpenMPI.<br>
</div>