Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add --listen-backlog option in munged #139

Merged
merged 2 commits into from
Mar 11, 2024

Conversation

meow-watermelon
Copy link
Contributor

This small PR adds an option --listen-backlog for the munged service if the user wants to override the default domain socket backlog value.

If no option is passed, then munged would use the default value that MUNGE_SOCKET_BACKLOG is defined.

Thanks.

Copy link
Owner

@dun dun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a sensible option, and the code looks good. There's just a few minor changes I'd like to see. I'll run my tests tomorrow, but I don't expect I'll have any more changes.

src/munged/conf.c Outdated Show resolved Hide resolved
src/munged/conf.c Outdated Show resolved Hide resolved
src/munged/munged.c Outdated Show resolved Hide resolved
Copy link
Owner

@dun dun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks!

@dun
Copy link
Owner

dun commented Feb 28, 2024

How are you triggering the problem, and what happens when it triggers? At what scale is this happening? I want to make sure I understand the problem you're solving with this because I observed some unexpected behavior in my testing.

During my tests, I was able to overrun the socket queue by setting a very small listen backlog and encoding with a large number of threads using remunge. That caused munge_encode() to occasionally fail with a "resource temporarily unavailable" (EAGAIN) error.

But what I was expecting to happen was for the while-loop in _m_msg_client_connect to retry the failed connect(). If a lot of these connects were failing but one eventually succeeded, then that encode would have taken longer than usual but it would have eventually succeeded. That's the problem I thought you were having.

I've pushed a commit to this branch that now retries the connect() for EAGAIN. Can you test this and let me know if it makes a difference with the length of the listen backlog that is needed?

@meow-watermelon
Copy link
Contributor Author

Hi @dun thanks a lot for your testing. The issue what I observed is the application slurmctld failed to connect to munged UNIX domain socket with EAGAIN error on message decoding. This issue happened during the cluster was very busy so to be honest I don't know how to trigger that. The reason why I raised this PR is because I noticed this EAGAIN issue and trying to find where I could adjust the listen backlog to prevent this issue and noticed that the listen backlog value was hardcoded in munged.

Another thing is I double-checked the connect() call manual:

       EINPROGRESS
              The  socket  is nonblocking and the connection cannot be completed immediately.  (UNIX domain sockets failed with EAGAIN instead.)

So I'm wondering the reason why the retry didn't work is that the UNIX domain socket does not return EINPROGRESS but return EAGAIN instead? Your latest commit captured this EAGAIN now.

Please let me know if anything is unclear. Thank you again for the testing.

@dun
Copy link
Owner

dun commented Mar 1, 2024

Yes, the reason the retry logic didn't work here was because EAGAIN was not handled as a transient error that should be retried. Stevens wrote in UNIX Network Programming Vol 1 that ECONNREFUSED would be returned when the listening socket's queue was full. But as we've seen, that is not the case on current Linux systems which return EAGAIN instead (and this is documented in the current manpages). I'm surprised this bug hasn't been reported before now, so thank you for bringing it to my attention.

When that retry code was originally written, the maximum backlog on Linux was at most 128. But SOMAXCONN increased to 4096 in Linux 5.4. With the retry code now handling EAGAIN, I don't think the listen backlog will need to be increased for most workloads. But your PR will be useful for anyone needing to increase it further, especially since Linux now supports a much larger backlog.

I have some more portability testing to do yet, but I should be merging this in the next few days. And I'm hoping to get a release out with these fixes soon afterwards.

@meow-watermelon
Copy link
Contributor Author

Thank you @dun ! I'm glad we addressed this issue.

meow-watermelon and others added 2 commits March 8, 2024 00:06
Add the "--listen-backlog" command-line option to specify the socket's
listen backlog limit.

On a very busy cluster, slurmctld failed to connect to munged to decode
a credential.  The returned error message was "Resource temporarily
unavailable" (EAGAIN).  Increasing the munged listen backlog prevented
this error from reoccurring.

PR dun#139
Signed-off-by: Chris Dunlap <[email protected]>
Change the behavior of the "--listen-backlog" command-line
option in order for a value of 0 to use the software default
(MUNGE_SOCKET_BACKLOG), and a value of -1 to specify SOMAXCONN.

Note that SOMAXCONN is now 4096 on current Linux systems; it was 128
before Linux 5.4 which meant that the munged socket had been using
the maximum backlog up until that point.

Reference:
- https://www.kernel.org/doc/html/latest/networking/ip-sysctl.html

Tested:
- AlmaLinux 9.3, 8.9
- Arch Linux
- CentOS Linux Stream 9, Stream 8, 7.9.2009, 6.10
- Debian sid, 12.5, 11.9, 10.13, 9.13, 8.11, 7.11, 6.0.10, 5.0.10, 4.0
- Fedora 39, 38, 37
- FreeBSD 13.2
- NetBSD 9.3
- OpenBSD 7.4, 7.3
- openSUSE 15.5, 15.4
- Ubuntu 23.10, 22.04.4, 20.04.6, 18.04.6, 16.04.7, 14.04.6

PR dun#139
@dun dun added the feature label Mar 11, 2024
@dun dun added this to the 0.5.16 milestone Mar 11, 2024
@dun dun force-pushed the add_listen_backlog_opt branch from 3b0e5c3 to 0689a1f Compare March 11, 2024 17:44
@dun dun merged commit 36e17dc into dun:master Mar 11, 2024
dun added a commit that referenced this pull request Mar 11, 2024
libmunge retries transient errors when connecting to munged.
This should handle errors arising from the listening socket's queue
being full.  However, PR #139 uncovered a bug that did not handle
EAGAIN which is returned on Linux when a nonblocking UNIX domain
socket connection cannot be completed immediately.

This commit fixes the while-loop in _m_msg_client_connect() that
retries connect() so both EAGAIN (Linux) and ECONNREFUSED (BSD)
are handled as transient errors that should be retried.

This was tested by setting "--listen-backlog=1", running munged
with the default 2 work threads, and running remunge with 64
threads.  First, the while-loop was altered so connect() errors
would not be retried.  remunge could reproduce EAGAIN on Linux,
and this behavior was dramatically more reproducible if vcpu > 1.
remunge could reproduce ECONNREFUSED on NetBSD 9.3 with vcpu=1, and
on FreeBSD 14.0 with vcpu=2.  Adding back the retry logic for EAGAIN
and ECONNREFUSED made this connect() failure difficult to reproduce
even with "--listen-backlog=1".

Tested:
- AlmaLinux 9.3, 8.9
- Arch Linux
- CentOS Linux Stream 9, Stream 8, 7.9.2009, 6.10
- Debian sid, 12.5, 11.9, 10.13, 9.13, 8.11, 7.11, 6.0.10, 5.0.10, 4.0
- Fedora 39, 38, 37
- FreeBSD 14.0, 13.3, 13.2
- NetBSD 9.3
- OpenBSD 7.4, 7.3
- openSUSE 15.5, 15.4
- Ubuntu 23.10, 22.04.4, 20.04.6, 18.04.6, 16.04.7, 14.04.6
@dun
Copy link
Owner

dun commented Mar 11, 2024

@meow-watermelon Thanks again for your PR!

@meow-watermelon
Copy link
Contributor Author

Thanks @dun for the improvements and thorough testings on this change!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants