Skip to content

Await all pending activity in testConnectAndDisconnect #40037

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

DaveCTurner
Copy link
Contributor

It looks like this test can finish while there are still asynchronous
connection attempts in flight, causing test failures as reported in #40030.
This change ensures that the test waits for everything to settle down before
the end of the test.

NB I was unable to reproduce this failure, although it's affected quite a
number of CI runs. This is my best guess.

This also unmutes testOnlyBlocksOnConnectionsToNewNodes: the only failures of
this test that I could see were related to earlier failures.

Closes #40030

It looks like this test can finish while there are still asynchronous
connection attempts in flight, causing test failures as reported in elastic#40030.
This change ensures that the test waits for everything to settle down before
the end of the test.

NB I was unable to reproduce this failure, although it's affected quite a
number of CI runs. This is my best guess.

This also unmutes `testOnlyBlocksOnConnectionsToNewNodes`: the only failures of
_this_ test that I could see were related to earlier failures.

Closes elastic#40030.
@DaveCTurner DaveCTurner added >test Issues or PRs that are addressing/adding tests :Distributed Coordination/Network Http and internode communication implementations v8.0.0 v7.2.0 labels Mar 14, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

We call `ensureConnections()` to undo the effects of a disruption. However, it
is possible that one or more targets are currently CONNECTING and have been
since the disruption was active, and that the connection attempt was thwarted
by a concurrent disruption to the connection.  If so, we cannot simply add our
listener to the queue because it will be notified when this CONNECTING activity
completes even though it was disrupted. We must therefore wait for all the
current activity to finish and then go through and reconnect to any missing
nodes.

Closes elastic#40030.
Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I added one small comment.

} finally {
assertTrue(stopReconnecting.compareAndSet(false, true));
reconnectionThread.join();
ensureConnections(service);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to put ensureConnections outside the finally block to ensure that if the test fails, we see the original error rather than potentially an error from ensureConnections.

@DaveCTurner DaveCTurner merged commit cac5b4b into elastic:master Mar 15, 2019
@DaveCTurner DaveCTurner deleted the 2019-03-14-fix-testConnectAndDisconnect branch March 15, 2019 08:04
DaveCTurner added a commit that referenced this pull request Mar 15, 2019
We call `ensureConnections()` to undo the effects of a disruption. However, it
is possible that one or more targets are currently CONNECTING and have been
since the disruption was active, and that the connection attempt was thwarted
by a concurrent disruption to the connection.  If so, we cannot simply add our
listener to the queue because it will be notified when this CONNECTING activity
completes even though it was disrupted. We must therefore wait for all the
current activity to finish and then go through and reconnect to any missing
nodes.

Closes #40030.
@DaveCTurner
Copy link
Contributor Author

Relates #39629

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Network Http and internode communication implementations >test Issues or PRs that are addressing/adding tests v7.2.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CI] NodeConnectionsServiceTests. testConnectAndDisconnect & testOnlyBlocksOnConnectionsToNewNodes failed
4 participants