-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Await all pending activity in testConnectAndDisconnect #40037
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Await all pending activity in testConnectAndDisconnect #40037
Conversation
It looks like this test can finish while there are still asynchronous connection attempts in flight, causing test failures as reported in elastic#40030. This change ensures that the test waits for everything to settle down before the end of the test. NB I was unable to reproduce this failure, although it's affected quite a number of CI runs. This is my best guess. This also unmutes `testOnlyBlocksOnConnectionsToNewNodes`: the only failures of _this_ test that I could see were related to earlier failures. Closes elastic#40030.
Pinging @elastic/es-distributed |
We call `ensureConnections()` to undo the effects of a disruption. However, it is possible that one or more targets are currently CONNECTING and have been since the disruption was active, and that the connection attempt was thwarted by a concurrent disruption to the connection. If so, we cannot simply add our listener to the queue because it will be notified when this CONNECTING activity completes even though it was disrupted. We must therefore wait for all the current activity to finish and then go through and reconnect to any missing nodes. Closes elastic#40030.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
I added one small comment.
} finally { | ||
assertTrue(stopReconnecting.compareAndSet(false, true)); | ||
reconnectionThread.join(); | ||
ensureConnections(service); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to put ensureConnections outside the finally block to ensure that if the test fails, we see the original error rather than potentially an error from ensureConnections.
We call `ensureConnections()` to undo the effects of a disruption. However, it is possible that one or more targets are currently CONNECTING and have been since the disruption was active, and that the connection attempt was thwarted by a concurrent disruption to the connection. If so, we cannot simply add our listener to the queue because it will be notified when this CONNECTING activity completes even though it was disrupted. We must therefore wait for all the current activity to finish and then go through and reconnect to any missing nodes. Closes #40030.
Relates #39629 |
It looks like this test can finish while there are still asynchronous
connection attempts in flight, causing test failures as reported in #40030.
This change ensures that the test waits for everything to settle down before
the end of the test.
NB I was unable to reproduce this failure, although it's affected quite a
number of CI runs. This is my best guess.
This also unmutes
testOnlyBlocksOnConnectionsToNewNodes
: the only failures ofthis test that I could see were related to earlier failures.
Closes #40030