Add robustness failpoint for IO stall in raft loop #16859

Merged (2 commits) on Nov 3, 2023

Conversation

ZhouJianMS
Contributor

Introduces a new failpoint to simulate an IO stall in the Raft loop during robustness tests. It can be extended to check that a stalled Raft loop is handled properly once #15247 (comment) or a similar fix is merged.

@ZhouJianMS force-pushed the zhoujian/raft-io-stall branch from f31c859 to 827dc18 on November 3, 2023 08:43
@serathius
Member

@ZhouJianMS Thanks for the contribution!

I have been looking into adding sleep during robustness tests for some time (#16776).

The main blocker was etcd-io/gofail#47: the addition of an API that would allow us to confirm that the sleep was really executed.

I really like the idea of deactivating the failpoint; it should allow us to use longer sleep times without breaking the minimal QPS requirement.

serathius
serathius previously approved these changes Nov 3, 2023
Member

@serathius serathius left a comment

Sleep failpoints have been blocked for a long time. Having them strict is important; however, I don't think we should block on this any more.

This PR already brings us some value.

@serathius dismissed their stale review on November 3, 2023 09:52

Oops, I missed a couple of error-handling issues.

@ZhouJianMS
Contributor Author

@serathius Error handling added, please review.

@serathius
Member

Looks like the failpoint worked. [image]
