The dualtor nightly run suffers from HTTP request timeouts:
2025 Jan 13 11:40:09.678044 str2-7050cx3-acs-10 WARNING pmon#CCmisApi: y_cable_port 5: attempt=3, GET http://10.64.246.133:8080/mux/vms21-3/8 for physical_port 5 failed with TimeoutError('timed out')
2025 Jan 13 11:40:09.678044 str2-7050cx3-acs-10 NOTICE pmon#CCmisApi: y_cable_port 5: Sleep 1 seconds to retry GET http://10.64.246.133:8080/mux/vms21-3/8 for physical port 5
2025 Jan 13 11:40:13.073700 str2-7050cx3-acs-10 WARNING pmon#CCmisApi: y_cable_port 1: attempt=6, GET http://10.64.246.133:8080/mux/vms21-3/0 for physical_port 1 failed with TimeoutError('timed out')
2025 Jan 13 11:40:13.073700 str2-7050cx3-acs-10 WARNING pmon#CCmisApi: y_cable_port 1: Retry GET http://10.64.246.133:8080/mux/vms21-3/0 for physical port 1 timeout after 30 seconds, attempted=6
2025 Jan 13 11:40:13.073700 str2-7050cx3-acs-10 WARNING pmon#CCmisApi: Error: Could not get active side for cli command show mux hwmode muxdirection logical port Ethernet0 and physical port 1
The mux simulator cannot handle new requests because its listen backlog is full. There appear to be too many connections stuck in CLOSE_WAIT, so the mux simulator cannot accept new connections because of this socket fd leak.
# ss -lnt | grep 8080
LISTEN 1025 1024 0.0.0.0:8080 0.0.0.0:*
# netstat -s | grep -i listen
472273 times the listen queue of a socket overflowed
472277 SYNs to LISTEN sockets dropped
# netstat -ant | grep 8080 | grep CLOSE_WAIT | wc -l
1313
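The same CLOSE_WAIT count can be collected from a script, for example to track the leak over a nightly run. The following is a minimal sketch, assuming a Linux host where `/proc/net/tcp` is readable; the function name `count_close_wait` is hypothetical and not part of sonic-mgmt. Unlike the `grep 8080` pipeline above, it matches only sockets whose local port is 8080, and it covers IPv4 sockets only.

```python
# TCP state codes in /proc/net/tcp are hex; 08 is CLOSE_WAIT.
CLOSE_WAIT = "08"


def count_close_wait(proc_net_tcp_text, port):
    """Count sockets in CLOSE_WAIT whose local port equals `port`.

    `proc_net_tcp_text` is the full content of /proc/net/tcp.
    """
    count = 0
    # The first line is the column header, so skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue
        # fields[1] is "HEXIP:HEXPORT" for the local end, fields[3] is the state.
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port == port and fields[3] == CLOSE_WAIT:
            count += 1
    return count
```

On the server hosting the mux simulator, it could be called as `count_close_wait(open("/proc/net/tcp").read(), 8080)`.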
How to reproduce?
The CLOSE_WAIT sockets are created by the HTTP requests from the following code:
        logger.debug('Received response {}/{} with content {}'.format(resp.status_code, resp.reason, resp.text))
        return resp.status_code == 200
    except Exception as e:
        logger.warn("POST {} with data {} failed, err: {}".format(server_url, data, repr(e)))
    return False
It appears that the toggle-all command that calls this function can take more than 10 seconds to finish, which causes the POST request to time out and the client to close the TCP connection. The TCP socket on the server side (mux simulator) then transitions into CLOSE_WAIT, and the mux simulator never closes it, which results in a socket fd leak.
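The mechanism can be illustrated with plain sockets. This is a minimal sketch, not the mux simulator's actual code: the names `serve_once` and `demo`, the 0.2 second timeout, and the loopback listener are all assumptions made for illustration. The client times out and closes its end; the server notices the close because `recv` returns an empty bytes object, and it then closes its own socket. If the server skipped that `close()`, its socket would remain in CLOSE_WAIT.

```python
import socket
import threading


def serve_once(listener, closed_event):
    """Accept one connection, never reply, and close once the peer closes."""
    conn, _ = listener.accept()
    try:
        # recv() returns b"" when the peer has closed its side (FIN received).
        while conn.recv(4096):
            pass
    finally:
        # Without this close() the socket would stay in CLOSE_WAIT.
        conn.close()
        closed_event.set()


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    closed = threading.Event()
    worker = threading.Thread(target=serve_once, args=(listener, closed), daemon=True)
    worker.start()

    # Client side: send a request, then give up after a short timeout.
    client = socket.create_connection(("127.0.0.1", port), timeout=0.2)
    client.sendall(b"POST /mux HTTP/1.1\r\n\r\n")
    try:
        client.recv(1024)  # the server never replies, so this times out
        timed_out = False
    except socket.timeout:
        timed_out = True
    client.close()

    # Wait for the server to observe the close and release its socket.
    server_closed = closed.wait(timeout=2)
    worker.join(timeout=2)
    listener.close()
    return timed_out, server_closed
```

Calling `demo()` returns a pair of booleans: whether the client timed out, and whether the server closed its side after the client went away.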
Results you see
As described above.
Results you expected to see
As described above.
Is it platform specific
generic
Relevant log output
As described above.
Output of show version
As described above.
Attach files (if any)
No response
Code referenced in "How to reproduce?": sonic-mgmt/tests/common/dualtor/mux_simulator_control.py, lines 178 to 209 in 62a15b7