-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Fix assertion error on multiple node search reduce errors #131085
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix assertion error on multiple node search reduce errors #131085
Conversation
When receiving exceptions from more than one node on data node search reduction, SearchQueryThenFetchAsyncAction will raise more than one phase failure. This can lead to calling the listener in AbstractSearchAsyncAction more than once, which in turn in tests trips an assertion in ActionListener#assertFirstRun.
Pinging @elastic/es-search-foundations (Team:Search Foundations) |
I ran into this issue while working on #129445 where another test in this suite fails for probably other reasons ("testSortMixedFieldTypes"). The test provokes IllegalArgumentExceptions when sorting on two indices with incompatible sort types.
Without the change in
I'm not sure preventing propagating any subsequent phase failure in AbstractSearchAsyncAction like suggested here is the right way to solve this, also I'm not really sure what big of an issue calling the ActionListener twice is in practice but the test assertion seems to be there for a reason. Also this might still be related to the weird errors we get in #129445 since that issue also seems to be related to search exception handling. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it seems wrong to raise multiple phase failures for multiple data node failures.
I'm hesitant about this change, is it possible we're relying on multiple runs of raisePhaseFailure
/ listener.onFailure(exception)
here, one per node search exception? Would love to hear more of your thoughts on that.
@@ -621,6 +622,10 @@ protected BytesReference buildSearchContextId(ShardSearchFailure[] failures) { | |||
* @param cause the cause of the phase failure | |||
*/ | |||
public void onPhaseFailure(String phase, String msg, Throwable cause) { | |||
if (phaseFailureEncountered.compareAndSet(false, true) == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this is what we want, but this change would be lighter weight if this check was moved to the caller in SearchQueryThenFetchAsyncAction.
When receiving exceptions from more than one node on data node search reduction, SearchQueryThenFetchAsyncAction will raise more than one phase failure. This can lead to calling the listener in AbstractSearchAsyncAction more than once, which in turn in tests trips an assertion in ActionListener#assertFirstRun.