Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(fix): Hitting unhealthy node 10 times #2819

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

ivaylogarnev-limechain
Copy link
Contributor

@ivaylogarnev-limechain ivaylogarnev-limechain commented Jan 23, 2025

Description:
This PR refactors and fixes the unhealthy node error handling logic in Executable.js. Additionally, it introduces a few unit tests to confirm this.

Related issue(s):
#2804

Checklist

  • Documented (Code comments, README, etc.)
  • Tested (unit, integration, etc.)

…roduces excludeCurrent method in the List class

Signed-off-by: ivaylogarnev-limechain <[email protected]>
@ivaylogarnev-limechain ivaylogarnev-limechain changed the title fix: Refactor the unhealthy logic inside the Executable class and int… chore(fix): Hitting unhealthy node 10 times Jan 23, 2025
@ivaylogarnev-limechain ivaylogarnev-limechain marked this pull request as ready for review January 24, 2025 09:19
@ivaylogarnev-limechain ivaylogarnev-limechain requested a review from a team as a code owner January 24, 2025 09:19
continue;
}

this._nodeAccountIds.advance();
Copy link
Contributor

@0xivanov 0xivanov Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some notes:

  1. Before always advance'd (before the if (!node.isHealthy())), now we have 2 advance calls, is this necessary?
  2. Around line 740 in executable.js, we have client._network.increaseBackoff(node); which removes the node from the healthy nodes list and sets new value for _readmitTime. This logic works as expected, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Both advances handle different scenarios in the retry/rotation logic.

    1.1 The first advance() inside the health check is specifically for when a node is unhealthy.

    1.2 The second advance() is part of the normal node rotation logic when trying different nodes for retries.
    (tested in AccountInfoMocking.js - should retry on INTERNAL and retry multiple nodes)

  2. Yes, it still reaches that point because this code accounts for the scenario where the node is initially healthy but throws an error after making a request. If the error is a GrpcService or HttpError, the increaseBackOff() method is triggered, as expected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the 1st one - this means we advance in both cases - can we have only 1 advance above if (!node.isHealthy()) on line 644?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that with a single advance before the health check, we're effectively skipping the health check of the first node and this would break the node health checking functionality as demonstrated by the failing test "should skip unhealthy node and execute with healthy node".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.

Comment on lines +597 to +600
const responses1 = [
{ response: ACCOUNT_INFO_QUERY_COST_RESPONSE },
{ response: ACCOUNT_INFO_QUERY_RESPONSE },
];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have it("should retry on UNAVAILABLE", async function () { , what is the behaviour there with more than 1 node?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test added that covers this case.

Copy link
Contributor

@ivaylonikolov7 ivaylonikolov7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you manage to test the new functionality on testnet where we actually have unhealthy nodes? I know it's hard to make integration tests for this functionality because you can't make localnode unhealthy easily and then you are going to need a second account that will execute the transaction on?

test/unit/AccountInfoMocking.js Outdated Show resolved Hide resolved
@ivaylogarnev-limechain
Copy link
Contributor Author

Did you manage to test the new functionality on testnet where we actually have unhealthy nodes? I know it's hard to make integration tests for this functionality because you can't make localnode unhealthy easily and then you are going to need a second account that will execute the transaction on?

This functionality is being tested in AccountBalanceIntegrationTest.js under the test case "can connect to testnet with TLS."

Currently, node 5 on the testnet is down, yet the tests still pass. If you explicitly hardcode the nodeAccountId to only use node 5 with:

.setNodeAccountIds([new AccountId(5)])

it will throw the error:
"Network connectivity issue: All nodes are unhealthy. Original node list: 0.0.5

ivaylonikolov7
ivaylonikolov7 previously approved these changes Jan 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants