Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DefaultConfig has dangerous defaults causing hung Reads #142

Open
lattwood opened this issue Jan 4, 2025 · 3 comments · May be fixed by #143
Open

DefaultConfig has dangerous defaults causing hung Reads #142

lattwood opened this issue Jan 4, 2025 · 3 comments · May be fixed by #143
Assignees

Comments

@lattwood
Copy link
Contributor

lattwood commented Jan 4, 2025

The default values for Session.config.KeepAliveInterval and Session.config.ConnectionWriteTimeout of 30s and 10s create the possibility for timed out writes that most (Hashicorp Nomad included) aren't handling in their readers.

Calls to Stream.Read on one side of a connection will hang until the underlying Session is closed if the corresponding Stream.Write call on the other side it's waiting for returns with ErrConnectionWriteTimeout. This happens in the case of network congestion between the two sides.

If you keep Session.sendCh full (fixed capacity of 64) for ConnectionWriteTimeout, but for less than the KeepAliveInterval + ConnectionWriteTimeout (which would kill the Session), Stream.Write will return ErrConnectionWriteTimeout. The state of the underlying Session or Stream is not modified. When this happens (or doesn't, heh), the other side's Stream.Read call that's waiting for that write will never return because there's no timeout for this edge-case.

Since no keep alive timed out, you can continue to use the Session once the network congestion is resolved, but that Stream.Read call will only return when the Session closes or the response shows up. Since the write call on the other side timed out... well, that's a problem.

I can see three possible fixes, one heavier handed than the other-

  1. If Stream.Write times out, it should implicitly call Stream.Close while returning, to notify any blocked calls to Stream.Read on the other side return.
    • If the call to Stream.Close fails then Session.Close should get called, which it doesn't look like that happens now either.
  2. A call to Stream.Write that times out should just close the entire Session.
  3. (The hacky fix) Make the default keep alive interval less than the write timeout
@lattwood lattwood changed the title DefaultConfig has dangerous defaults DefaultConfig has dangerous defaults causing hung Reads Jan 4, 2025
@lattwood
Copy link
Contributor Author

lattwood commented Jan 6, 2025

The more I'm thinking about this, the more it seems like the sanest option is to consider the entire Session invalidated when this happens.

@lattwood
Copy link
Contributor Author

Hey @schmichael, anything to share yet?

I was contemplating writing a fix for this myself.

@lattwood
Copy link
Contributor Author

Opened a PR with a fix :D #143

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants