add --listen-backlog option in munged #139

meow-watermelon · 2024-02-25T17:26:30Z

This small PR adds an option --listen-backlog for the munged service if the user wants to override the default domain socket backlog value.

If no option is passed, then munged would use the default value that MUNGE_SOCKET_BACKLOG is defined.

Thanks.

dun

This is a sensible option, and the code looks good. There's just a few minor changes I'd like to see. I'll run my tests tomorrow, but I don't expect I'll have any more changes.

src/munged/conf.c

src/munged/munged.c

dun

Looks good. Thanks!

dun · 2024-02-28T06:58:16Z

How are you triggering the problem, and what happens when it triggers? At what scale is this happening? I want to make sure I understand the problem you're solving with this because I observed some unexpected behavior in my testing.

During my tests, I was able to overrun the socket queue by setting a very small listen backlog and encoding with a large number of threads using remunge. That caused munge_encode() to occasionally fail with a "resource temporarily unavailable" (EAGAIN) error.

But what I was expecting to happen was for the while-loop in _m_msg_client_connect to retry the failed connect(). If a lot of these connects were failing but one eventually succeeded, then that encode would have taken longer than usual but it would have eventually succeeded. That's the problem I thought you were having.

I've pushed a commit to this branch that now retries the connect() for EAGAIN. Can you test this and let me know if it makes a difference with the length of the listen backlog that is needed?

meow-watermelon · 2024-02-28T18:49:41Z

Hi @dun thanks a lot for your testing. The issue what I observed is the application slurmctld failed to connect to munged UNIX domain socket with EAGAIN error on message decoding. This issue happened during the cluster was very busy so to be honest I don't know how to trigger that. The reason why I raised this PR is because I noticed this EAGAIN issue and trying to find where I could adjust the listen backlog to prevent this issue and noticed that the listen backlog value was hardcoded in munged.

Another thing is I double-checked the connect() call manual:

       EINPROGRESS
              The  socket  is nonblocking and the connection cannot be completed immediately.  (UNIX domain sockets failed with EAGAIN instead.)

So I'm wondering the reason why the retry didn't work is that the UNIX domain socket does not return EINPROGRESS but return EAGAIN instead? Your latest commit captured this EAGAIN now.

Please let me know if anything is unclear. Thank you again for the testing.

dun · 2024-03-01T07:01:18Z

Yes, the reason the retry logic didn't work here was because EAGAIN was not handled as a transient error that should be retried. Stevens wrote in UNIX Network Programming Vol 1 that ECONNREFUSED would be returned when the listening socket's queue was full. But as we've seen, that is not the case on current Linux systems which return EAGAIN instead (and this is documented in the current manpages). I'm surprised this bug hasn't been reported before now, so thank you for bringing it to my attention.

When that retry code was originally written, the maximum backlog on Linux was at most 128. But SOMAXCONN increased to 4096 in Linux 5.4. With the retry code now handling EAGAIN, I don't think the listen backlog will need to be increased for most workloads. But your PR will be useful for anyone needing to increase it further, especially since Linux now supports a much larger backlog.

I have some more portability testing to do yet, but I should be merging this in the next few days. And I'm hoping to get a release out with these fixes soon afterwards.

meow-watermelon · 2024-03-02T03:29:38Z

Thank you @dun ! I'm glad we addressed this issue.

Add the "--listen-backlog" command-line option to specify the socket's listen backlog limit. On a very busy cluster, slurmctld failed to connect to munged to decode a credential. The returned error message was "Resource temporarily unavailable" (EAGAIN). Increasing the munged listen backlog prevented this error from reoccurring. PR dun#139 Signed-off-by: Chris Dunlap <[email protected]>

Change the behavior of the "--listen-backlog" command-line option in order for a value of 0 to use the software default (MUNGE_SOCKET_BACKLOG), and a value of -1 to specify SOMAXCONN. Note that SOMAXCONN is now 4096 on current Linux systems; it was 128 before Linux 5.4 which meant that the munged socket had been using the maximum backlog up until that point. Reference: - https://www.kernel.org/doc/html/latest/networking/ip-sysctl.html Tested: - AlmaLinux 9.3, 8.9 - Arch Linux - CentOS Linux Stream 9, Stream 8, 7.9.2009, 6.10 - Debian sid, 12.5, 11.9, 10.13, 9.13, 8.11, 7.11, 6.0.10, 5.0.10, 4.0 - Fedora 39, 38, 37 - FreeBSD 13.2 - NetBSD 9.3 - OpenBSD 7.4, 7.3 - openSUSE 15.5, 15.4 - Ubuntu 23.10, 22.04.4, 20.04.6, 18.04.6, 16.04.7, 14.04.6 PR dun#139

libmunge retries transient errors when connecting to munged. This should handle errors arising from the listening socket's queue being full. However, PR #139 uncovered a bug that did not handle EAGAIN which is returned on Linux when a nonblocking UNIX domain socket connection cannot be completed immediately. This commit fixes the while-loop in _m_msg_client_connect() that retries connect() so both EAGAIN (Linux) and ECONNREFUSED (BSD) are handled as transient errors that should be retried. This was tested by setting "--listen-backlog=1", running munged with the default 2 work threads, and running remunge with 64 threads. First, the while-loop was altered so connect() errors would not be retried. remunge could reproduce EAGAIN on Linux, and this behavior was dramatically more reproducible if vcpu > 1. remunge could reproduce ECONNREFUSED on NetBSD 9.3 with vcpu=1, and on FreeBSD 14.0 with vcpu=2. Adding back the retry logic for EAGAIN and ECONNREFUSED made this connect() failure difficult to reproduce even with "--listen-backlog=1". Tested: - AlmaLinux 9.3, 8.9 - Arch Linux - CentOS Linux Stream 9, Stream 8, 7.9.2009, 6.10 - Debian sid, 12.5, 11.9, 10.13, 9.13, 8.11, 7.11, 6.0.10, 5.0.10, 4.0 - Fedora 39, 38, 37 - FreeBSD 14.0, 13.3, 13.2 - NetBSD 9.3 - OpenBSD 7.4, 7.3 - openSUSE 15.5, 15.4 - Ubuntu 23.10, 22.04.4, 20.04.6, 18.04.6, 16.04.7, 14.04.6

dun · 2024-03-11T18:57:50Z

@meow-watermelon Thanks again for your PR!

meow-watermelon · 2024-03-11T19:41:10Z

Thanks @dun for the improvements and thorough testings on this change!

dun reviewed Feb 27, 2024

View reviewed changes

src/munged/conf.c Outdated Show resolved Hide resolved

src/munged/conf.c Outdated Show resolved Hide resolved

src/munged/munged.c Outdated Show resolved Hide resolved

dun reviewed Feb 28, 2024

View reviewed changes

meow-watermelon and others added 2 commits March 8, 2024 00:06

dun added the feature label Mar 11, 2024

dun added this to the 0.5.16 milestone Mar 11, 2024

dun force-pushed the add_listen_backlog_opt branch from 3b0e5c3 to 0689a1f Compare March 11, 2024 17:44

dun merged commit 36e17dc into dun:master Mar 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add --listen-backlog option in munged #139

add --listen-backlog option in munged #139

meow-watermelon commented Feb 25, 2024

dun left a comment

dun left a comment

dun commented Feb 28, 2024

meow-watermelon commented Feb 28, 2024

dun commented Mar 1, 2024

meow-watermelon commented Mar 2, 2024

dun commented Mar 11, 2024

meow-watermelon commented Mar 11, 2024

add --listen-backlog option in munged #139

add --listen-backlog option in munged #139

Conversation

meow-watermelon commented Feb 25, 2024

dun left a comment

Choose a reason for hiding this comment

dun left a comment

Choose a reason for hiding this comment

dun commented Feb 28, 2024

meow-watermelon commented Feb 28, 2024

dun commented Mar 1, 2024

meow-watermelon commented Mar 2, 2024

dun commented Mar 11, 2024

meow-watermelon commented Mar 11, 2024