Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sdexec: set KillMode=process SendSIGKILL=no for multi-user jobs #6402

Merged
merged 4 commits into from
Oct 31, 2024

Conversation

garlick
Copy link
Member

@garlick garlick commented Oct 31, 2024

Problem: For multi-user jobs spawned via SDEXEC, the systemd user instance running as the flux user does not have permission to kill guest processes, yet it does try and in the process may kill off the only process that does have permission to continue cleanup efforts, the IMP.

When the job is run by the IMP and sdexec, Set KillMode=process so that systemd only delivers signals to the IMP, which it should forward to the shell and/or cgroup per RFC 15.

Also set SendSIGKILL=off so that SIGKILL is never deployed against the IMP.

Fixes #6399

This should be OK even without the forthcoming changes to job-exec and the IMP to forward signals and avoid using SIGKILL, however I still need to give this a sanity check on my test cluster as the conditional use of these properties for IMP jobs is not tested by CI. I think we can still get a review going however so I'll not set WIP.

Problem: the sdexec code references 'flux imp kill' in an expanatory
comment, but we plan to remove the kill subcommand from the IMP and
have it forward signals to the cgroup.

Drop that comment.
Problem: sdexec_start_transient_unit() does not support the SendSIGKILL
attribute, which may need to be set to "off" for Flux guest jobs.

Add support for the SendSIGKILL atttribute.

To use, set the subprocess command option "SDEXEC_PROP_SendSIGKILL"
to "off" (or other valid systemd boolean config value).
Problem: there is no test coverage that confirms sdexec
can set KillMode and SendSIGKILL.

Add some tests.
Problem: For multi-user jobs spawned via SDEXEC, the systemd user
instance running as the flux user does not have permission to kill
guest processes, yet it does try and in the process may kill off
the only process that does have permission to continue cleanup efforts,
the IMP.

When the job is run by the IMP and sdexec, Set KillMode=process so that
systemd only delivers signals to the IMP, which it should forward
to the shell and/or cgroup per RFC 15.

Also set SendSIGKILL to "off" so that SIGKILL is never deployed against
the IMP.

Fixes flux-framework#6399
Copy link
Contributor

@grondo grondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@garlick
Copy link
Member Author

garlick commented Oct 31, 2024

I ran a little test and did verify that the properties are set as expected when running a job a guest and when running a job as the instance owner (where the IMP is not used):

As a guest:

 garlick@picl0:~$ flux submit -q batch -N1 sleep 3600
ƒ3L563TwyxT
 garlick@picl0:~$ flux jobs
       JOBID QUEUE    USER     NAME       ST NTASKS NNODES     TIME INFO
 ƒ3L563TwyxT batch    garlick  sleep       R      1      1   2.334s picl1

 garlick@picl0:~$ sudo flux exec -r 1 systemctl --user status
● picl1
    State: running
    Units: 100 loaded (incl. loaded aliases)
     Jobs: 0 queued
   Failed: 0 units
    Since: Thu 2024-10-31 11:43:16 PDT; 15min ago
  systemd: 252.30-1~deb12u2
   CGroup: /user.slice/user-500.slice/[email protected]
           ├─app.slice
           │ └─imp-shell-1-f3L563TwyxT.service
           │   ├─869 /usr/lib/aarch64-linux-gnu/flux/flux-imp exec /usr/lib/aarch64-linux-gnu/flux/flux-shell 17297539929735168
           │   ├─873 /usr/lib/aarch64-linux-gnu/flux/flux-shell 17297539929735168
           │   └─874 sleep 3600
           ├─init.scope
           │ ├─803 /lib/systemd/systemd --user
           │ └─804 "(sd-pam)"
           └─session.slice
             └─dbus.service
               └─835 /usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
 garlick@picl0:~$ sudo flux exec -r 1 systemctl --user show --property SendSIGKILL imp-shell-1-f3L563TwyxT
SendSIGKILL=no
 garlick@picl0:~$ sudo flux exec -r 1 systemctl --user show --property KillMode imp-shell-1-f3L563TwyxT
KillMode=process

As instance owner:

 garlick@picl0:~$ sudo -u flux flux submit -q batch -N1 sleep 3600
 garlick@picl0:~$ flux jobs -A
       JOBID QUEUE    USER     NAME       ST NTASKS NNODES     TIME INFO
 ƒ3L6VaRh81d batch    flux     sleep       R      1      1   10.06s picl1

 garlick@picl0:~$ sudo flux exec -r 1 systemctl --user status
● picl1
    State: running
    Units: 100 loaded (incl. loaded aliases)
     Jobs: 0 queued
   Failed: 0 units
    Since: Thu 2024-10-31 11:43:16 PDT; 18min ago
  systemd: 252.30-1~deb12u2
   CGroup: /user.slice/user-500.slice/[email protected]
           ├─app.slice
           │ └─shell-1-f3L6VaRh81d.service
           │   ├─923 /usr/lib/aarch64-linux-gnu/flux/flux-shell 17300643815358464
           │   └─924 sleep 3600
           ├─init.scope
           │ ├─803 /lib/systemd/systemd --user
           │ └─804 "(sd-pam)"
           └─session.slice
             └─dbus.service
               └─835 /usr/bin/dbus-daemon --session --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
 garlick@picl0:~$ sudo flux exec -r 1 systemctl --user show --property KillMode  shell-1-f3L6VaRh81d
KillMode=control-group
 garlick@picl0:~$ sudo flux exec -r 1 systemctl --user show --property SendSIGKILL shell-1-f3L6VaRh81d
SendSIGKILL=yes

@garlick
Copy link
Member Author

garlick commented Oct 31, 2024

I'll go ahead and set MWP on this one.

@mergify mergify bot merged commit 98a4348 into flux-framework:master Oct 31, 2024
32 checks passed
@garlick garlick deleted the issue#6399 branch October 31, 2024 21:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sdexec: systemd tries to kill guest tasks without sufficient permission
2 participants