Bittensor does not gracefully terminate TCP connections to subtensor nodes #2585

Open
zhedgehog opened this issue Jan 17, 2025 · 2 comments

zhedgehog commented Jan 17, 2025

Describe the bug

Library: bittensor==8.5.1
Issue: When connecting to a Subtensor node, the Bittensor library does not gracefully terminate connections, leaving them in an ESTABLISHED state. This behavior occurs during mining operations where the miner frequently re-syncs the metagraph state.

Impact:

Over time, as the client (miner or validator) continues running, it accumulates an increasing number of active connections.
This can lead to connection exhaustion at the operating system level, potentially causing service interruptions or degraded performance.
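
One way this kind of exhaustion could surface is through the per-process file-descriptor limit, since every open socket holds one descriptor. The sketch below is illustrative and not taken from the report: it assumes a Linux host (it reads `/proc/self/fd`), and `fd_headroom` is a hypothetical helper name.

```python
import os
import resource


def fd_headroom():
    """Return how many more file descriptors this process may open.

    Returns None when the soft limit is unlimited.
    Assumes Linux, where /proc/self/fd lists the open descriptors.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    open_fds = len(os.listdir("/proc/self/fd"))
    if soft == resource.RLIM_INFINITY:
        return None
    return soft - open_fds
```

The value is only meaningful when called from inside the long-running miner or validator process, because both the limit and the descriptor table are per process.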

To Reproduce

1. Install bittensor==8.5.1 and set up a mining client.
2. Start mining and monitor the connections using a system tool (e.g., netstat or ss) to observe the number of ESTABLISHED connections.
3. Allow the miner to run for an extended period.
4. Note the number of accumulated connections over time.
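
The monitoring step above could be scripted roughly as follows. This is a minimal sketch, assuming the `ss -tn` column layout (State, Recv-Q, Send-Q, Local, Peer) and the port 9944 that appears in the command under Screenshots; `count_established` and `sample_live` are hypothetical helper names, not part of bittensor.

```python
import subprocess


def count_established(ss_output: str, port: int = 9944) -> int:
    """Count ESTAB rows in `ss -tn` output whose peer port equals `port`."""
    count = 0
    for line in ss_output.splitlines():
        fields = line.split()
        # Expected columns: State Recv-Q Send-Q Local:Port Peer:Port [Process]
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue
        # rsplit keeps IPv6 addresses such as [::1]:9944 intact
        peer_port = fields[4].rsplit(":", 1)[-1]
        if peer_port == str(port):
            count += 1
    return count


def sample_live(port: int = 9944) -> int:
    """Run `ss -tn` on this host and count matching connections."""
    result = subprocess.run(
        ["ss", "-tn"], capture_output=True, text=True, check=True
    )
    return count_established(result.stdout, port)
```

Calling `sample_live()` at intervals and logging the result would show whether the count keeps growing. Unlike a plain `grep 9944`, this counts only rows in the ESTAB state with 9944 as the peer port, so the numbers can be lower than those from the grep-based command.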

Expected behavior

Connections should be properly closed when no longer in use, preventing the accumulation of stale ESTABLISHED connections.
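
The difference between the expected and the reported behavior can be sketched with a stand-in class. Everything here is hypothetical: `NodeConnection` only models a client that holds one connection, and the issue does not state which bittensor call, if any, is meant to release the socket.

```python
from contextlib import closing


class NodeConnection:
    """Hypothetical stand-in for a client holding one TCP/websocket connection."""

    open_count = 0  # connections currently held open, across all instances

    def __init__(self, endpoint: str):
        self.endpoint = endpoint
        self.closed = False
        NodeConnection.open_count += 1

    def sync(self) -> str:
        return f"synced via {self.endpoint}"

    def close(self) -> None:
        # Idempotent: closing twice must not decrement the counter twice
        if not self.closed:
            self.closed = True
            NodeConnection.open_count -= 1


def resync_leaky(endpoint: str) -> str:
    """Opens a connection per re-sync and never closes it."""
    conn = NodeConnection(endpoint)
    return conn.sync()


def resync_clean(endpoint: str) -> str:
    """Opens a connection per re-sync and always closes it, even on error."""
    with closing(NodeConnection(endpoint)) as conn:
        return conn.sync()
```

With `resync_clean`, `NodeConnection.open_count` returns to its previous value after every call; with `resync_leaky` it grows by one per re-sync, which matches the accumulation pattern described in this report.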

Actual behavior

Connections remain in the ESTABLISHED state, accumulating over time without being terminated.

Screenshots

python ~/scripts/vast-hosts-cmd.py "netstat -ano|grep 9944|wc -l"

--
[ssh8.runpod.x:27114]
151

[ssh5.runpod.x:26300]
151

[ssh5.runpod.x:20816]
149

[ssh8.runpod.x:27070]
151

[ssh4.runpod.x:34704]
151

[ssh8.runpod.x:25682]
151

[ssh8.runpod.x:34666]
151

[ssh4.runpod.x:34702]
100

[ssh4.runpod.x:26328]
151

[ssh4.runpod.x:35898]
151

[ssh8.runpod.x:38718]
151

[ssh8.runpod.x:15040]
149

[ssh5.runpod.x:20814]
151

[ssh8.runpod.x:34682]
151

[ssh4.runpod.x:34646]
151

[ssh4.runpod.x:34630]
23

[ssh5.runpod.x:16432]
23

[ssh5.runpod.x:28762]
149

--

Environment

Ubuntu 22.04, Bittensor 8.5.1

Additional context

No response

@zhedgehog added the bug (Something isn't working) label Jan 17, 2025
@thewhaleking (Contributor) commented

This should be resolved as part of the new Substrate push, but I do want to investigate this further. I suspect it has to do with how py-substrate-interface handles websocket connections.

@thewhaleking self-assigned this Jan 18, 2025

zhedgehog commented Jan 18, 2025

Good stuff, much appreciated. I can help set up an environment to reproduce and test this, if needed.
