-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[5.0.3] P2P: Resolve on reconnect #2408
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can a test be added?
fc_wlog( logger, "Unable to resolve ${host}:${port} ${error}", | ||
("host", host)("port", port)( "error", err.message() ) ); | ||
c->set_state(connection::connection_state::closed); | ||
++(c->consecutive_immediate_connection_close); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does increasing this do? I thought it was going to cause some sort of cool down when reaching def_max_consecutive_immediate_connection_close
but doesn't seem to matter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to look into it, but I think the change that broke this re-connect also broke this back off of consecutive_immediate_connection_close
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, actually this a reason to reuse the connection object. I guess I should maintain the reuse of the connection object for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like now (on 51fd8fa) after failing to resolve a host def_max_consecutive_immediate_connection_close
times it just never tries it again. But if the resolve completes and simply unable to connect to the remote host, it will try retry indefinitely. I can't tell if this discrepancy is intentional or not. It makes me wonder if we should not be increasing this counter here for consistency between the two, but realistically if someone has a completely bad hostname it's probably not fixing itself.
fc_wlog( logger, "Unable to resolve ${host}:${port} ${error}", | ||
("host", host)("port", port)( "error", err.message() ) ); | ||
c->set_state(connection::connection_state::closed); | ||
++(c->consecutive_immediate_connection_close); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like now (on 51fd8fa) after failing to resolve a host def_max_consecutive_immediate_connection_close
times it just never tries it again. But if the resolve completes and simply unable to connect to the remote host, it will try retry indefinitely. I can't tell if this discrepancy is intentional or not. It makes me wonder if we should not be increasing this counter here for consistency between the two, but realistically if someone has a completely bad hostname it's probably not fixing itself.
if( !err ) { | ||
c->connect( results ); | ||
} else { | ||
fc_wlog( logger, "Unable to resolve ${host}:${port} ${error}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fwiw this is a warning log, where failing to connect is an info log. Not sure if you want to make consistent or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the async_connect
fails it calls c->close(false)
which calls _close()
which increments consecutive_immediate_connection_close
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a warn
is appropriate if unable to resolve.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the
async_connect
fails it callsc->close(false)
which calls_close()
which incrementsconsecutive_immediate_connection_close
.
There is something different between the two though. For example when I run nodeos with --p2p-peer-address fdsfdsfds.invalid:4444 --p2p-peer-address fddsfdsfdsfds.invalid:4444 --p2p-peer-address 127.0.0.1:2121 --p2p-peer-address www.google.com:4444
you'll see how the latter 2 will try to reconnect forever where the first 2 won't.
There is also some log oddness at shutdown,
info 2024-09-26T02:45:43.993 nodeos net_plugin.cpp:4578 close_all ] close all 4 connections
info 2024-09-26T02:45:43.993 net-1 net_plugin.cpp:1460 _close ] ["127.0.0.1:2121" - 1 <unknown>:<unknown>] closing
info 2024-09-26T02:45:43.993 net-2 net_plugin.cpp:1460 _close ] ["fddsfdsfdsfds.invalid:4444" - 2 <unknown>:<unknown>] closing
info 2024-09-26T02:45:43.993 nodeos net_plugin.cpp:4385 plugin_shutdown ] exit shutdown
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This different in backoff behavior seems to match Leap 4.0 behavior.
Resolve address on reconnect.
See AntelopeIO/spring#525
Instead of:
Now you get: