job-exec: send SIGUSR1 to the IMP, not SIGKILL
Problem: RFC 15 states that the IMP handles SIGUSR1 by
sending SIGKILL to the entire cgroup, but job-exec sends the
IMP SIGKILL, which is not productive.

For multi-user, send the IMP SIGUSR1 rather than SIGKILL after
shell signaling mechanisms have failed to clean up.

Update the faux IMP shell script used in the test.
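
The signal choice described above can be sketched in isolation. This is only a minimal illustration, not code from job-exec: the helper names `select_kill_signal` and `kill_target_name` are hypothetical, and the sketch assumes the only inputs are the configured kill signal and whether the job is multi-user.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical helper (not part of job-exec): choose the signal used
 * for the final escalation step once shell signaling has failed.
 */
int select_kill_signal (bool multiuser, int kill_signal)
{
    assert (kill_signal > 0);

    /* In multi-user mode the signal goes to the IMP.  Per RFC 15 the
     * IMP reacts to SIGUSR1 by sending SIGKILL to the entire cgroup,
     * so SIGUSR1 is chosen instead of the configured kill signal.
     */
    if (multiuser)
        return SIGUSR1;
    return kill_signal;
}

/* Hypothetical helper: name of the signal's recipient, for logging. */
const char *kill_target_name (bool multiuser)
{
    return multiuser ? "IMP" : "job shell";
}
```

In single-user mode the configured signal passes through unchanged, so only multi-user jobs see different behavior.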
garlick committed Nov 4, 2024
1 parent f972fe8 commit f0079bd
Showing 2 changed files with 11 additions and 13 deletions.
14 changes: 11 additions & 3 deletions src/modules/job-exec/job-exec.c
@@ -438,13 +438,21 @@ static void kill_shell_timer_cb (flux_reactor_t *r,
 {
     struct jobinfo *job = arg;
     struct idset *active_ranks;
+    int actual_kill_signal = kill_signal;
+
+    /* RFC 15 states that the IMP handles SIGUSR1 by sending SIGKILL to
+     * the entire cgroup. Sending SIGKILL to the IMP is not productive.
+     */
+    if (job->multiuser)
+        actual_kill_signal = SIGUSR1;

     flux_log (job->h,
               LOG_DEBUG,
-              "Sending %s to job shell for job %s",
-              sigutil_signame (kill_signal),
+              "Sending %s to %s for job %s",
+              sigutil_signame (actual_kill_signal),
+              job->multiuser ? "IMP" : "job shell",
               idf58 (job->id));
-    (*job->impl->kill) (job, kill_signal);
+    (*job->impl->kill) (job, actual_kill_signal);
     job->kill_shell_count++;

     /* Since we've transitioned to killing the shell directly, stop the
10 changes: 0 additions & 10 deletions t/job-exec/imp-fail.sh
@@ -14,16 +14,6 @@ case "$cmd" in
         printf "test-imp: Going to fail on rank 1\n" >&2
         if test $(flux getattr rank) = 1; then exit 0; fi
         exec "$@" ;;
-    kill)
-        # Note: kill must be implemented in test since job-exec
-        # module will run `flux-imp kill PID`.
-        #
-        signal=$2;
-        pid=$3;
-        printf "test-imp: kill -$signal $pid\n" >&2
-        shift 3;
-        printf "test-imp: Kill pid $pid signal $signal\n" >&2
-        kill -$signal $pid ;;
     *)
         printf "test-imp: Fatal: Unknown cmd=$cmd\n" >&2; exit 1 ;;
 esac
