[torqueusers] killing over limit jobs is unfriendly to mpiexec
pw at osc.edu
Thu Nov 23 13:48:30 MST 2006
I'm trying to figure out why mpiexec isn't catching the exit
statuses of the tasks when a job goes over a limit, like walltime.
Mom sends SIGTERM to all the processes in the job. Mpiexec catches
the signal, sends tm_kill() to all tasks and waits for them to exit.
The top-level shell, meanwhile, does not catch the signal, and
exits. This triggers code in scan_for_terminated to mark the task
TI_STATE_EXITED and to send another SIGTERM to all the remaining
processes.
Mpiexec catches this second SIGTERM and just exits, abandoning any
tasks. The thought was that when users hit ctrl-c, it tries to
clean tasks up nicely, but if the batch system has hosed itself, a
second tap of ctrl-c will force mpiexec to exit. If I were to
ignore future SIGTERMs, users would have to hit ctrl-z, then "kill
-9" the process to get it to go away.
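
For anyone not familiar with the two-stage policy described above, here
is a rough shell sketch of it. This is not mpiexec's code (mpiexec is
written in C and talks to mom through TM); the names on_term,
term_count and term_action are made up purely for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of "first TERM cleans up, second TERM gives up".
# Not taken from mpiexec; all names here are invented for illustration.

term_count=0        # how many SIGTERMs have arrived so far
term_action=none    # what the launcher would do in response

on_term() {
    term_count=$((term_count + 1))
    if [ "$term_count" -eq 1 ]; then
        # First TERM: kill the tasks, then keep waiting for exit statuses.
        term_action=cleanup
    else
        # Second (or later) TERM: stop waiting and exit, abandoning tasks.
        term_action=abandon
    fi
}

trap on_term TERM
```

The trouble in the over-limit case is that both SIGTERMs come from mom,
not from an impatient user, so the "abandon" branch is reached before
any exit statuses have been collected.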
I can hack/fix mpiexec to keep waiting across the second
SIGTERM, but it still does not get the proper TM obit messages,
because mom's scan_for_exiting() sets ptask->ti_fd to -1. This
causes task_check() to complain "cannot tm_reply to task 1" rather
than send the TM message. Commenting out that assignment to ti_fd
does not change the behavior, because kill_task() sits in a tight
loop for 4 sec waiting for the task to die rather than delivering
the queued-up obits. Eventually everything dies with SIGKILL.
Everything does work nicely, though, if I ignore the SIGTERM in the
job script:

    trap "echo Job shell caught TERM, ignoring >&2" TERM

With that trap in place, mpiexec works brilliantly, unmodified. But
I'd hate to force users to do
this to get the right behavior.
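
To make the workaround concrete, a job script using it could look
roughly like the sketch below. The run_job wrapper and the job_status
variable are only there so the idea can be tried without a batch
system; they are my own invention, and the mpiexec command line in the
comment is a placeholder, not a recommendation.

```shell
#!/bin/sh
# Sketch of a job script whose top-level shell survives SIGTERM.
# run_job and job_status are hypothetical names used for illustration.

run_job() {
    # Catch TERM so the job shell does not exit when mom signals the job.
    trap 'echo "Job shell caught TERM, ignoring" >&2' TERM

    # Run the launcher in the foreground. The shell keeps waiting for it
    # even if TERM arrives; the trap only runs once the command finishes.
    job_status=0
    "$@" || job_status=$?

    echo "launcher exit status: $job_status"
}

# In a real job script this would be something like:
#   run_job mpiexec ./a.out
```

Because the shell stays alive, it is not marked as exited while
mpiexec is still cleaning up, so the second SIGTERM is never sent.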
Any ideas how to fix this in torque? That loop in kill_task() is
new compared to good-old PBS. I'm fishing for thoughts at this
point. The behavior can always be papered over by not reporting
exit values when they are missing, but a clean solution would be
better.