Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

P2P: Block propagation optimization #1099

Merged
merged 33 commits into from
Jan 27, 2025
Merged

P2P: Block propagation optimization #1099

merged 33 commits into from
Jan 27, 2025

Conversation

heifner
Copy link
Member

@heifner heifner commented Jan 14, 2025

Block Nack Feature

Currently when a node has many connections, it can overwhelm its bandwidth when large blocks are created. A node will send every block it receives to all of its peers that it has not already received the block from.

This PR adds a new P2P protocol version, which when both sides of a connection have the feature, it is enabled. For example, if a peer is running nodeos version 1.0.x then it will continue to always receive and send all blocks.

  • Adds block_nack_message and block_notice_message to the P2P message protocol.
  • Adds new option p2p-disable-block-nack which disables this functionality. It may be useful to disable on canary/relay nodes of producers so blocks are never missed requiring a block request. Or when absolute block latency desired. Note that latency can be hurt if many peers are configured. For large number of peers it is recommended to not disable the block nack.

Algorithm:

  • block_nack_message sends block_id to peer when it receives a block it has already received from another peer.
  • block_notice_messsage sent instead of a block when a threshold of block_nack_message are received.
    • A peer will respond to a block_notice_message with the existing request_message when two consecutive are received for blocks not received from other peers.

Resolves #1091

heifner added 21 commits January 6, 2025 12:39
…de is shutdown right before. This allows stabilization if the bios node is the node that most nodes are getting blocks from.
…ady has. It may have already echoed back the block, but don't want that to prevent the nack from being sent.
…f via last handshake. The last handshake could be very old causing way too many blocks to be sent.
@heifner heifner added the OCI Work exclusive to OCI team label Jan 14, 2025
@@ -193,7 +193,7 @@ def __init__(self, testHelperConfig: TestHelperConfig=TestHelperConfig(), cluste

Utils.Debug = self.testHelperConfig.verbose
self.errorExit = Utils.errorExit
self.emptyBlockGoal = 1
self.emptyBlockGoal = 7
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This prevents a disconnect from happening right before the performance run. Nicer to have it stable before the performance run.

@@ -623,6 +628,7 @@ namespace eosio {
void reset() {
fc::lock_guard g( _mtx );
_write_queue.clear();
_sync_write_queue.clear();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added back in the _sync_write_queue removed in #1090. Had a test failure due to block ordering here causing unlinkable blocks. This is really separate from block nack. If desired I could pull it out into a different PR.

@@ -912,6 +921,13 @@ namespace eosio {

std::chrono::nanoseconds connection_start_time{0};

// block nack support
static constexpr uint16_t consecutive_block_nacks_threshold{2}; // stop sending blocks when reached
Copy link
Member Author

@heifner heifner Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could be as low as 0. 0 worked fine in all my tests and I don't think a large value is useful here.

Also this PR, only doesn't block nack one connection. If we wanted to keep 2 or more peers sending us blocks the PR could be updated with that functionality. I don't personally think it is needed. And I think best to keep this as simple as possible. If we find this feature doesn't work it can be disabled via p2p-disable-block-nack and we can address any issues we find.

@heifner heifner marked this pull request as ready for review January 14, 2025 19:28
@heifner heifner requested review from greg7mdp and linh2931 January 14, 2025 19:28
@ericpassmore
Copy link
Contributor

ericpassmore commented Jan 16, 2025

Note:start
category: Other
component: P2P
summary: Block propagation optimization.
Note:end

plugins/net_plugin/include/eosio/net_plugin/protocol.hpp Outdated Show resolved Hide resolved
plugins/net_plugin/include/eosio/net_plugin/protocol.hpp Outdated Show resolved Hide resolved
plugins/net_plugin/net_plugin.cpp Outdated Show resolved Hide resolved
plugins/net_plugin/net_plugin.cpp Outdated Show resolved Hide resolved
plugins/net_plugin/net_plugin.cpp Show resolved Hide resolved
plugins/net_plugin/net_plugin.cpp Outdated Show resolved Hide resolved
plugins/net_plugin/net_plugin.cpp Outdated Show resolved Hide resolved
plugins/net_plugin/net_plugin.cpp Outdated Show resolved Hide resolved
plugins/net_plugin/net_plugin.cpp Outdated Show resolved Hide resolved
@bhazzard bhazzard requested review from spoonincode and removed request for linh2931 January 21, 2025 18:39
…te determination if a block request is needed. Also include in the block request_message the peer head. The peer head allows the node to determine if on a fork and send from LIB instead. Also create block_on_fork function to remove duplicate code.
@heifner heifner merged commit f15440e into main Jan 27, 2025
36 checks passed
@heifner heifner deleted the GH-1091-block-nack branch January 27, 2025 15:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OCI Work exclusive to OCI team performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

P2P: Experimental implementation of block propagation enhancement proposal
4 participants