Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: mux simulator cannot handle new requests due to the listen backlog full #16495

Open
lolyu opened this issue Jan 13, 2025 · 0 comments
Open

Comments

@lolyu
Copy link
Contributor

lolyu commented Jan 13, 2025

Issue Description

dualtor nightly suffers from http request timeout:

2025 Jan 13 11:40:09.678044 str2-7050cx3-acs-10 WARNING pmon#CCmisApi: y_cable_port 5: attempt=3, GET http://10.64.246.133:8080/mux/vms21-3/8 for physical_port 5 failed with TimeoutError('timed out')
2025 Jan 13 11:40:09.678044 str2-7050cx3-acs-10 NOTICE pmon#CCmisApi: y_cable_port 5: Sleep 1 seconds to retry GET http://10.64.246.133:8080/mux/vms21-3/8 for physical port 5
2025 Jan 13 11:40:13.073700 str2-7050cx3-acs-10 WARNING pmon#CCmisApi: y_cable_port 1: attempt=6, GET http://10.64.246.133:8080/mux/vms21-3/0 for physical_port 1 failed with TimeoutError('timed out')
2025 Jan 13 11:40:13.073700 str2-7050cx3-acs-10 WARNING pmon#CCmisApi: y_cable_port 1: Retry GET http://10.64.246.133:8080/mux/vms21-3/0 for physical port 1 timeout after 30 seconds, attempted=6
2025 Jan 13 11:40:13.073700 str2-7050cx3-acs-10 WARNING pmon#CCmisApi: Error: Could not get active side for cli command show mux hwmode muxdirection logical port Ethernet0 and physical port 1

Mux simulator cannot handle new requests to to the listen backlog is full; it appeared that there are too many connections in CLOSE_WAIT; so mux simulator cannot accept new connections due to this socket fd leak.

# ss -lnt | grep 8080
LISTEN   1025         1024                0.0.0.0:8080             0.0.0.0:*
# netstat -s | grep -i listen
    472273 times the listen queue of a socket overflowed
    472277 SYNs to LISTEN sockets dropped
# netstat -ant | grep 8080 | grep CLOSE_WAIT | wc -l
1313

how to reproduce?

The CLOSE_WAIT sockets are created by the HTTP requests from the following code:

def _post(server_url, data):
"""
Helper function for posting data to y_cable server.
Args:
server_url: a str, the full address of mux server, like http://10.0.0.64:8080/mux/vms17-8[/1/drop|output]
data: data to post {"out_sides": ["nic", "upper_tor", "lower_tor"]}
Returns:
True if succeed. False otherwise
"""
try:
session = Session()
if "allowed_methods" in inspect.signature(Retry).parameters:
retry = Retry(total=3, connect=3, backoff_factor=1,
allowed_methods=frozenset(['GET', 'POST']),
status_forcelist=[x for x in requests.status_codes._codes if x != 200])
else:
retry = Retry(total=3, connect=3, backoff_factor=1,
method_whitelist=frozenset(['GET', 'POST']),
status_forcelist=[x for x in requests.status_codes._codes if x != 200])
session.mount('http://', HTTPAdapter(max_retries=retry))
server_url = '{}?reqId={}'.format(server_url, uuid.uuid4()) # Add query string param reqId for debugging
logger.debug('POST {} with {}'.format(server_url, data))
headers = {'Accept': 'application/json', 'Content-Type': 'application/json'}
resp = session.post(server_url, json=data, headers=headers, timeout=10)
logger.debug('Received response {}/{} with content {}'.format(resp.status_code, resp.reason, resp.text))
return resp.status_code == 200
except Exception as e:
logger.warn("POST {} with data {} failed, err: {}".format(server_url, data, repr(e)))
return False

It appeared that, the toggle-all command that calls this function can take more than 10 seconds to finish, which causes the post request timeout and close the tcp connection on the client side. The TCP socket on the server side (mux simulator) transits into CLOSE_WAIT and mux simulator doesn't close them, which results in a socket fd leak.

Results you see

As the description.

Results you expected to see

As the description.

Is it platform specific

generic

Relevant log output

As the description.

Output of show version

As the description.

Attach files (if any)

No response

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant