-
Notifications
You must be signed in to change notification settings - Fork 25.4k
NETWORKING: MockTransportService Wait for Close #35038
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
* Make `MockTransportService` wait `30s` for close listeners to run before failing the assertion * Closes elastic#34990
Pinging @elastic/es-distributed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just do:
try {
ESTestCase.assertBusy(() -> {
assert openConnections.size() == 0 : "still open connections: " + openConnections;
});
} catch (AssertionFailure e) {
throw e;
} catch (Exception e) {
throw new IOException(e);
}
The wait / notify
semantics are kind of tricky to handle (for example I think you should have the notify call be in a synchronized block).
You could also add an assertBusyUnchecked
the takes a Runnable
opposed to a CheckedRunnable
if you want to avoid the weird exception handling
@tbrooks8 the We would also need to do: ESTestCase.assertBusy(() -> {
synchronized(openConnections) {
assert openConnections.size() == 0 : "still open connections: " + openConnections;
}
}); here to release to lock on the |
I am not 100% convinced that this is the right fix. #34990 seems to affect freshly-opened connections, which to me makes it feel like a race condition between opening connections and closing the transport down, and waiting for some arbitrary timeout makes me a little uneasy. The synchronisation in Would it be sufficient to call |
@DaveCTurner I think you're 100% right, this def. another possible race. Your suggested sounds just fine too => added it in 7f5cd24 |
With the extra synchronisation, do we need the wait/notify as well? |
@DaveCTurner yea because the close callback may be executed in a different thread and hence could run after the assertion if we don't wait for the map to empty. |
Do you mean that the call to elasticsearch/server/src/main/java/org/elasticsearch/transport/ConnectionManager.java Lines 206 to 207 in cf9aff9
This doesn't look true - I can see a couple of places where we modify elasticsearch/server/src/main/java/org/elasticsearch/transport/ConnectionManager.java Line 131 in cf9aff9
elasticsearch/server/src/main/java/org/elasticsearch/transport/ConnectionManager.java Line 182 in cf9aff9
|
@DaveCTurner no this is my bad (sorry for the confusion). I mixed up the connection and the channel close. The connection close callback is invoked in the same thread that invokes. Simplified the PR in fb843f5. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but I'd like @tbrooks8's opinion too.
@DaveCTurner I think the two concurrency spots in elasticsearch/server/src/main/java/org/elasticsearch/transport/ConnectionManager.java Line 109 in cf9aff9
gives us the close lock for elasticsearch/server/src/main/java/org/elasticsearch/transport/ConnectionManager.java Line 131 in cf9aff9
And for: elasticsearch/server/src/main/java/org/elasticsearch/transport/ConnectionManager.java Line 182 in cf9aff9
we should be fine too. The iterator from a CHM should be consistent with concurrent removal from the CHM ( |
I think the issue might just be that the comment is misleading, as comments often are. Both of the calls to This line isn't under a lock by the way, it's in a handler that may not run until the lock is released: elasticsearch/server/src/main/java/org/elasticsearch/transport/ConnectionManager.java Line 131 in cf9aff9
|
Sorry, you're right my eyes aren't so great today apparently :D
Fixed :) |
Can someone explain this hypothesis before I review this? I don't understand how this can lead to this issue. Even if the channel is closed before the listener added, the listener is still called. The listener is just executed by the thread that is adding the listener. So I guess my point is that the additional synchronization does not seem to add much here. I'm not opposed to it as this is just test code, I just want to make sure I understand the proposed issues at play here.
YESish. The assertions introduced in the As a separate note, I want to point out that discussions about the |
@tbrooks8
At least I did not know that, sorry about the confusion here.
Yea, if you're ok with it I'd vote to keep it. It makes the whole thing a lot easier to think through (IMO).
Ah, that makes perfect sense :) Thanks! (and sorry for the annoying back and forth here, but now it's all much clearer to me :)) => I added the wait back now in ce879da |
Sure...
The issue seems to be that we can:
The failing assertion
Indeed - I wasn't tracing things very accurately. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. I think we are all agreement about the potential problems here and how the PR addresses them.
LGTM
@tbrooks8 thanks! |
MockTransportService
wait30s
for close listeners to run before failing the assertion