Skip to content
This repository has been archived by the owner on Sep 25, 2020. It is now read-only.

Fix flappy integration test #88

Merged
merged 9 commits into from
Sep 15, 2016
Merged

Fix flappy integration test #88

merged 9 commits into from
Sep 15, 2016

Conversation

mennopruijssers
Copy link
Contributor

@mennopruijssers mennopruijssers commented Sep 13, 2016

This PR fixes three issues:

  • A race condition where a test completes and times out at the same time: this is fixed by breaking out the event-loop early after a time out.
  • Start running the test before the SUT is ready: fixed by waiting for an initial ping from the SUT.
  • The implementation of lookup in the integration tests is incorrect. Sometimes it returns an incorrect result

@mennopruijssers
Copy link
Contributor Author

mennopruijssers commented Sep 13, 2016

Test's pass locally on both go and node:

ringpop-go:

make test-integration
test/run-integration-tests
[fetch ringpop-common] + cd test/ringpop-common
[fetch ringpop-common] + git fetch origin fix-flappy
[build testpop] + make testpop
[build testpop] rm -f testpop
[build testpop] go build ./scripts/testpop/
[fetch ringpop-common] From ssh://github.com/uber/ringpop-common
[fetch ringpop-common]  * branch            fix-flappy -> FETCH_HEAD
[fetch ringpop-common] + git checkout FETCH_HEAD
[fetch ringpop-common] Previous HEAD position was 4e35f91... wait for first ping in prepare cluster
[fetch ringpop-common] HEAD is now at 0209af7... Don’t conumse initial ping
[fetch ringpop-common] + cd -
[fetch ringpop-common] + cd test/ringpop-common/test
[fetch ringpop-common] + cd -
[test-runner] Running test for cluster size 1...
[test-runner] Running test for cluster size 2...
[test-runner] Running test for cluster size 3...
[test-runner] Running test for cluster size 4...
[test-runner] Running test for cluster size 5...
[test-runner] Running test for cluster size 10...
Tests passed

ringpop-node:

npm run test-shared-integration-tests

> [email protected] test-shared-integration-tests /Users/menno/Sources/Uber/nodejs/ringpop-node
> test/run-shared-integration-tests

[fetch ringpop-common] + cd test/ringpop-common
[fetch ringpop-common] + git fetch origin fix-flappy
[fetch ringpop-common] From ssh://github.com/uber/ringpop-common
[fetch ringpop-common]  * branch            fix-flappy -> FETCH_HEAD
[fetch ringpop-common] + git checkout FETCH_HEAD
[fetch ringpop-common] Previous HEAD position was 4e35f91... wait for first ping in prepare cluster
[fetch ringpop-common] HEAD is now at 0209af7... Don’t conumse initial ping
[fetch ringpop-common] + cd -
[fetch ringpop-common] + cd test/ringpop-common/test
[fetch ringpop-common] + npm install
[fetch ringpop-common] npm WARN package.json [email protected] No license field.
[fetch ringpop-common] + cd -
[test-runner] Running test for cluster size 1...
[test-runner] Running test for cluster size 2...
[test-runner] Running test for cluster size 3...
[test-runner] Running test for cluster size 4...
[test-runner] Running test for cluster size 5...
[test-runner] Running test for cluster size 10...
Tests passed

@@ -121,8 +121,9 @@ function testStateTransitions(ns, initial, newState, finalState, incNoDelta, sta
function prepareCluster(insert_fns) {
return function(t, tc, n) {
return [
dsl.assertDetectChecksumMethod(t, tc),
dsl.waitForPing(t, tc, false),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should describe why we are waiting for a ping, and why we don't consume it. Both are very subtile but important facts.

@thanodnl
Copy link
Contributor

LGTM

@thanodnl
Copy link
Contributor

Nice find on the bug in the lookup function of the test coordinator, fix is legit.

// By waiting for the first ping we make sure the SUT is ready to go.
// We don't consume it so other tests (especially the ping-tests)
// can still assert implementation details related to ping distribution.
dsl.waitForPing(t, tc, false),
Copy link
Contributor

@CorgiMan CorgiMan Sep 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused. Shouldn't the joins happen before the ping?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea behind it is to wait until the node starts pinging. At that point we can consider it to be fully ready and start running the other validations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But wouldn't it make sense to first wait for the joins, and then for the pinging. I don't think the node will ping before it has joined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, I'll reorder them

Copy link
Contributor Author

@mennopruijssers mennopruijssers Sep 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a discussion with Nils we changed it back (and added a comment). The reason why we're waiting for the initial ping first is so we wait until the node is fully ready before checking the number of joins. This way we're able to detect if there are too many joins during standup.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, thanks!

@CorgiMan
Copy link
Contributor

Looks good!

@mennopruijssers mennopruijssers merged commit a21877d into master Sep 15, 2016
@mennopruijssers mennopruijssers deleted the fix-flappy branch September 15, 2016 09:53
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants