job-exec: send SIGUSR1 to the IMP, not SIGKILL
Problem: RFC 15 states that the IMP handles SIGUSR1 by
sending SIGKILL to the entire cgroup.

For multi-user, send the IMP SIGUSR1 rather than SIGKILL after
shell signaling mechanisms have failed to clean up.
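
The selection rule described above can be sketched in isolation. This is a minimal illustration, not code from job-exec.c: the helper names select_kill_signal and kill_target_name are hypothetical, and the multiuser flag and configured kill signal are passed in as plain parameters instead of being read from the job structure.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical helper (not in job-exec.c): pick the last-resort signal.
 * In multi-user mode the target is the IMP, which per RFC 15 reacts to
 * SIGUSR1 by sending SIGKILL to the entire cgroup, so SIGUSR1 is
 * substituted for the configured kill signal.
 */
int select_kill_signal (bool multiuser, int kill_signal)
{
    return multiuser ? SIGUSR1 : kill_signal;
}

/* Hypothetical helper: name of the process that receives the signal,
 * matching the wording used in the updated log message.
 */
const char *kill_target_name (bool multiuser)
{
    return multiuser ? "IMP" : "job shell";
}
```

With a configured kill signal of SIGKILL, select_kill_signal (true, SIGKILL) yields SIGUSR1 and select_kill_signal (false, SIGKILL) yields SIGKILL, so single-user behavior is unchanged.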
garlick committed Nov 4, 2024
1 parent f972fe8 commit cde68f9
Showing 1 changed file with 11 additions and 3 deletions.
14 changes: 11 additions & 3 deletions src/modules/job-exec/job-exec.c
@@ -438,13 +438,21 @@ static void kill_shell_timer_cb (flux_reactor_t *r,
 {
     struct jobinfo *job = arg;
     struct idset *active_ranks;
+    int actual_kill_signal = kill_signal;
+
+    /* RFC 15 states that the IMP handles SIGUSR1 by sending SIGKILL to
+     * the entire cgroup. Sending SIGKILL to the IMP is not productive.
+     */
+    if (job->multiuser)
+        actual_kill_signal = SIGUSR1;
 
     flux_log (job->h,
               LOG_DEBUG,
-              "Sending %s to job shell for job %s",
-              sigutil_signame (kill_signal),
+              "Sending %s to %s for job %s",
+              sigutil_signame (actual_kill_signal),
+              job->multiuser ? "IMP" : "job shell",
               idf58 (job->id));
-    (*job->impl->kill) (job, kill_signal);
+    (*job->impl->kill) (job, actual_kill_signal);
     job->kill_shell_count++;
 
     /* Since we've transitioned to killing the shell directly, stop the