[torqueusers] How to suspend a job ?
Mon, 19 Apr 2004 14:48:41 +0200
Sébastien Georget wrote:
> I would like to know if there is an easy way to suspend an mpi job using
> I see two problems :
> 1/ send a signal to the 'master'
> 2/ send a signal to the 'slaves'
> 1/ I tried to use 'qsig -s SIGTSTP myjob'. The signal is sent to the
> mom, then forwarded to the shell used to start the pbs script.
> The signal stops here; it seems that it is not sent to the children of
> the pbs script. Is this the normal behaviour? How can an application
> be suspended if that is the case?
It seems that there was a problem during my first tests. The signal is
correctly sent to all children.
> 2/ Will 'mpiexec' send the signal to all the hosts involved in the mpi
> run? Are there solutions to suspend all mpi processes and not only the
> master?
By default mpiexec doesn't catch the SIGTSTP signal. What I have done is
to write a small patch to catch the SIGTSTP signal and send a SIGSTOP to
each mpi process.
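
The idea behind the patch can be sketched outside of mpiexec. The following is only an illustration in Python, not the patch itself: the worker processes are idle stand-ins for MPI ranks, and the helper names (spawn_workers, install_forwarder, wait_until_stopped, resume_and_terminate) are made up for this example. It relies on the fact that SIGTSTP can be caught by a handler, whereas SIGSTOP cannot.

```python
import os
import signal
import subprocess
import sys


def spawn_workers(count):
    """Start `count` idle children standing in for MPI ranks (hypothetical)."""
    cmd = [sys.executable, "-c", "import time; time.sleep(60)"]
    return [subprocess.Popen(cmd) for _ in range(count)]


def install_forwarder(workers):
    """Catch SIGTSTP in this process and send SIGSTOP to each worker."""
    def on_tstp(signum, frame):
        for worker in workers:
            os.kill(worker.pid, signal.SIGSTOP)

    # Must be called from the main thread of the launcher process.
    signal.signal(signal.SIGTSTP, on_tstp)


def wait_until_stopped(worker):
    """Block until the kernel reports a state change; True if the worker stopped."""
    _, status = os.waitpid(worker.pid, os.WUNTRACED)
    return os.WIFSTOPPED(status)


def resume_and_terminate(workers):
    """Continue the stopped workers, then shut them down and reap them."""
    for worker in workers:
        os.kill(worker.pid, signal.SIGCONT)
        worker.terminate()
        worker.wait()
```

With the handler installed, sending SIGTSTP to the launcher (for example with 'qsig -s SIGTSTP myjob') leaves the launcher itself running and stops only the workers. In this sketch all workers are local children; reaching processes on other hosts is the part that has to be done inside mpiexec.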
Users can now suspend their jobs manually, but the jobs are still marked
as running in the qstat output.
It seems that a job is marked as suspended only when it is signaled with
the 'suspend' signal (qsig -s suspend jobid), and 'suspend' maps to SIGSTOP.
Is it possible to change 'suspend' to SIGTSTP when the user is running a
parallel job and leave it as SIGSTOP for sequential jobs ?
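
To make the requested behaviour concrete, here is a small sketch of the selection rule as a client-side wrapper. It is not a change to how the server interprets 'suspend'. The function names and the rule "more than one node means parallel" are assumptions made for this example.

```python
import signal


def suspend_signal(node_count):
    """Pick SIGTSTP for parallel jobs and SIGSTOP for sequential ones.

    Treating node_count > 1 as "parallel" is an assumption of this sketch.
    """
    return signal.SIGTSTP if node_count > 1 else signal.SIGSTOP


def qsig_command(job_id, node_count):
    """Build the qsig argument list for suspending the given job."""
    return ["qsig", "-s", suspend_signal(node_count).name, job_id]
```

A wrapper like this only chooses which signal to send. As noted above, a job signaled this way is still shown as running in the qstat output, so it does not replace support for this in 'suspend' itself.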
INRIA Sophia-Antipolis, Service DREAM, B.P. 93
06902 Sophia-Antipolis Cedex, FRANCE
E-mail : firstname.lastname@example.org