Flaky TestAlertmanager_ReplicasPosition data race #197
Comments
I'm new, but let me take a look. Where can I see the test results, or did you see this on your own machine? How frequently does it fail? Can I run "make test" multiple times in parallel to speed up the repro? @bboreham |
Run the specific go test with |
Investigation so far into the race condition:
I don't know how serious this issue is. To trigger it, memberlist has to be configured with DeadNodeReclaimTime > 0, and a node with a different address needs to leave (or be marked dead) and rejoin. Strictly speaking, the Members function is broken in principle: it should return a copy of the data, not pointers to data that is guarded by a mutex. Proof:
|
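To illustrate the point about returning pointers versus copies, here is a minimal sketch. It is not memberlist's code: `Registry`, `Node`, `MembersShared`, `MembersCopy`, and `SetAddr` are hypothetical names invented for this example. It only shows why handing out pointers to mutex-guarded data lets callers observe later in-place updates, while value copies do not.

```go
package main

import (
	"fmt"
	"sync"
)

// Node is a hypothetical stand-in for a cluster member record.
type Node struct {
	Name string
	Addr string
}

// Registry is a hypothetical member list guarded by a mutex.
type Registry struct {
	mu    sync.RWMutex
	nodes []*Node
}

// Add registers a new node.
func (r *Registry) Add(name, addr string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.nodes = append(r.nodes, &Node{Name: name, Addr: addr})
}

// SetAddr updates a node in place, roughly what could happen when a
// dead node's entry is reclaimed by a node with a different address.
func (r *Registry) SetAddr(name, addr string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for _, n := range r.nodes {
		if n.Name == name {
			n.Addr = addr
		}
	}
}

// MembersShared returns pointers to the internal nodes. The lock is
// released on return, so callers read the nodes without protection.
func (r *Registry) MembersShared() []*Node {
	r.mu.RLock()
	defer r.mu.RUnlock()
	out := make([]*Node, len(r.nodes))
	copy(out, r.nodes)
	return out
}

// MembersCopy returns value copies taken while the lock is held, so
// later in-place updates cannot be observed through the result.
func (r *Registry) MembersCopy() []Node {
	r.mu.RLock()
	defer r.mu.RUnlock()
	out := make([]Node, 0, len(r.nodes))
	for _, n := range r.nodes {
		out = append(out, *n)
	}
	return out
}

func main() {
	r := &Registry{}
	r.Add("a", "10.0.0.1")

	shared := r.MembersShared()
	snapshot := r.MembersCopy()

	r.SetAddr("a", "10.0.0.2")

	fmt.Println(shared[0].Addr)   // prints 10.0.0.2: the caller sees the update
	fmt.Println(snapshot[0].Addr) // prints 10.0.0.1: the copy is unaffected
}
```

In this single-goroutine sketch nothing races. If `SetAddr` ran in another goroutine while a caller read `shared[0].Addr`, that would be the kind of unsynchronized access the race detector reports.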
As far as I can tell, the worst that can happen is that the information that prometheus cluster.go is using will be newer. Even if the data is garbled, it's not used directly. |
I did reproduce the race, but with a slightly more direct method:
Test code:
|
The above race-condition can be circumvented in our case with something like:
|
Opened hashicorp/memberlist#250 to get some feedback on this issue. |
Test reports a data race; also it runs for four minutes and consumes > 500MB of RAM.