[Cluster Membership] Improve disaster recovery #9296
* Construct expander graph for monitoring to improve coverage on larger clusters
* More aggressive default IAmAlive updates: 5 min period * 2 missed updates (10 min) -> 30s * 3 (90 sec)
* MembershipTableManager: allow membership updates with equal version and newer IAmAlive values
* Increase NumProbedSilos from 3 to 10
* Silos proactively monitor all stale silos
* Account for some local clock misconfigurations in ValidateInitialConnectivity by snapping to latest IAmAliveTime
* Include intermediary silo vote in table after failed indirect probe
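The IAmAlive arithmetic in the list above (update period multiplied by the number of missed updates tolerated) can be sketched as follows. This is only an illustration: `IAmAliveStaleness`, `StaleAfter`, and `IsStale` are hypothetical names, not Orleans APIs, and the actual option names in `ClusterMembershipOptions` may differ.

```csharp
using System;

// Hypothetical helper, not part of Orleans: illustrates how the staleness
// window follows from the update period and the number of missed updates.
public static class IAmAliveStaleness
{
    // Window after which a silo's IAmAlive timestamp counts as stale.
    public static TimeSpan StaleAfter(TimeSpan updatePeriod, int missedUpdatesAllowed)
        => TimeSpan.FromTicks(updatePeriod.Ticks * missedUpdatesAllowed);

    // True when the last IAmAlive update is older than the staleness window.
    public static bool IsStale(DateTime lastIAmAlive, DateTime now,
                               TimeSpan updatePeriod, int missedUpdatesAllowed)
        => now - lastIAmAlive > StaleAfter(updatePeriod, missedUpdatesAllowed);
}
```

Under these assumptions, `StaleAfter(TimeSpan.FromMinutes(5), 2)` gives the old 10 minute window and `StaleAfter(TimeSpan.FromSeconds(30), 3)` gives the proposed 90 second window.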
// Include the indirect probe silo's vote as well, if it exists.
if (indirectProbingSilo is not null)
{
    entry.AddOrUpdateSuspector(indirectProbingSilo, now, clusterMembershipOptions.NumVotesForDeathDeclaration);
Include the intermediary silo's address in the suspectors list, counting it as a unique vote.
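A minimal sketch of the "unique vote" idea, assuming each suspector is keyed by its silo address. `SuspectedEntry` is a hypothetical stand-in for the real membership entry, silo addresses are plain strings, and the third argument seen in the diff (`NumVotesForDeathDeclaration`) is deliberately left out because its exact effect is not shown here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a membership table entry (not the Orleans type).
public sealed class SuspectedEntry
{
    private readonly List<(string Silo, DateTime Time)> _suspectors =
        new List<(string Silo, DateTime Time)>();

    public IReadOnlyList<(string Silo, DateTime Time)> Suspectors => _suspectors;

    // Records a vote from `silo`. A repeated vote from the same silo only
    // refreshes its timestamp, so each silo counts at most once.
    public void AddOrUpdateSuspector(string silo, DateTime time)
    {
        var index = _suspectors.FindIndex(s => s.Silo == silo);
        if (index >= 0)
        {
            _suspectors[index] = (silo, time);
        }
        else
        {
            _suspectors.Add((silo, time));
        }
    }
}
```

With this shape, a failed indirect probe records one vote for the probing silo and one for the intermediary, and a later vote from either silo updates its timestamp instead of adding a third entry.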
@@ -11,6 +11,7 @@
using System.Text;
using System.Text.Json;
using System.Text.Json.Serialization;
using Orleans.Serialization.Configuration;
This using directive is likely superfluous.
I want to split this into multiple PRs, one per logical change. I'll close this once I have.