-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
imp: exec: send SIGKILL to cgroup/children on SIGUSR1 #186
Conversation
Hm, readthedocs build seems to have been broken at some point as well. Will have to look into that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
I have some comments but all superficial and/or optional.
As part of this work I switched the cgroup controller that the IMP checks for from I think the correct thing (after perusing cgroups(7)) is to use the I'll have to modify this PR to make that change. |
Problem: Signal forwarding is currently implemented directly in the IMP exec subcommand which makes it impossible to reuse the code in other IMP subcommands. Move signal handling to imp/signals.[ch] for better code reuse. Pass the imp state object into the signal handler for possible future use, otherwise behavior of the IMP exec command should be identical to before this change.
22462f7
to
4668b2c
Compare
Problem: To comply with RFC 15, the IMP needs to know attributes of its current cgroup, but this information is not currently available. Add a cgroup_info structure to the imp_state object, and populate it at IMP startup. This moves and modifies some internal code from pidinfo.c. In the future, this duplicate code in pidinfo.c will be removed.
Ok, I think I've fixed the cgroup discovery here. The code is somewhat simplified, at startup the IMP:
I also had to add a workaround to remove leading |
Problem: RFC 15 specifies that the IMP should handle SIGUSR1 as a surrogate for SIGKILL, but this is not currently supported. Add cgroup_kill() which will send a signal to all member's of the IMP's cgroup (besides the IMP's own pid). Add SIGUSR1 to the set of signals handled by the signal forwarding implementation of the IMP. In the case of SIGUSR1, deliver SIGKILL to all members of the current cgroup if use_cgroup_kill is true, otherwise fallback to pid_kill_children().
Problem: The lingering IMP exits with 127+signo if the invoked child was terminated by a signal during `flux-imp exec`, but ideally the IMP would exit with the exact wait status of the child. Add a convenience function, `imp_raise()`, which raises a signal after setting the handler to the default disposition for the signal in question. Use imp_raise() to deliver the same signal which terminated the child in `flux-imp exec` if the child is detected to have exited due to a signal.
Problem: Some tests in the flux-security testsuite may need write access to cgroups, but docker-run-checks.sh explicitly mounts cgroups as read-only. Mount cgroups read-write for testing purposes.
Problem: The IMP exec tests that use sign-none.toml and sudo are unnecessarily complicated because the IMP has to be run in a subshell, sometimes in the background. This not only duplicates a lot of code, but makes it impossible to get the exit code of the IMP when it is run in the background. Add a function to run `flux-imp exec` under sudo with the sign-none.toml config. While we're at it, create the `sleeper.sh` helper at the beginning of the tests, instead of as part of another test. Update callers to use `imp_exec_sign_none()` where possible.
Problem: There are no tests the ensure sending SIGUSR1 to the IMP causes it to send SIGKILL to its children. Add a couple tests that ensure SIGKILL is sent when the IMP is running both in and out of an "imp-shell" cgroup.
Problem: pstree(1) is not available on all test platforms. Remove its use in debugging tests.
Problem: The readthedocs build fails with sphinx.errors.VersionRequirementError: 5.0 Update doc/requirements.txt to match those in flux-core, including an updated sphinx version requirement.
Problem: Some distros don't seem to have CGROUP2_SUPER_MAGIC defined in linux/magic.h. Copy the magic value from a system that has it. Remove the ifndef CGROUP2_SUPER_MAGIC block since it will now always be defined.
Problem: Relative paths for cgroups in /proc/self/cgroup may contain leading `/..` when the IMP is run inside some containers. This is because the path is relative to the container's mount point, not the actual mount point of the cgroupfs. Just strip the leading /.. from paths during discovery of the current cgroup path from `/proc/self/cgroup`. This should only apply in containers and thus CI.
I grabbed your snapshot from last night and put it on my test cluster. This morning I made some minimal changes to job-exec on my
On the compute node, systemd logged
|
This morning I added one fix here: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Great, thanks! I'll set MWP. I'll post a PR to have the IMP wait until its cgroup is empty in a bit. |
This PR updates the IMP to translate SIGUSR1 to SIGKILL for
flux-imp exec
as defined in RFC 15.Additionally, if possible, the IMP will signal all members of its cgroup (besides itself) when delivering SIGKILL.
Finally, to better handle returning the proper wait status to its parent, when the child of the IMP is terminated by a signal, the IMP raises that same signal to itself.
This is WIP pending real system testing.