-
Notifications
You must be signed in to change notification settings - Fork 25.3k
Release operation permit on thread-pool rejection #25930
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Release operation permit on thread-pool rejection #25930
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code LGTM in sense that it does the right thing while maintaining the old behavior of the generic ThreadedActionListener. However, in this case, I'm a bit concerned about the number of options in which the failure handler can be called - sometimes on the requested executor, sometimes on the generic thread and some times on the original thread - so there is really no guarantee on what's "safe" to do there. I wonder if we should make it simpler code wise and say that the executor on holds for onResponse. The failure handler can be called on arbitrary threads and should only be used for light weight work (which we do). WDYT?
|
||
@Override | ||
public void onFailure(Exception e) { | ||
listener.onFailure(e); // will possibly execute on the generic thread spawned off in releaseDelayedOperations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we release the releasable just in case the exception comes from the listener? this would allow us to only implement on failure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is the responsibility of the wrapped listener to make sure to free the permit when encountering an exception in its onResponse
method. I prefer the code to be more explicit here instead of mixing two failures paths.
|
||
@Override | ||
public void onFailure(Exception e) { | ||
listener.onFailure(e); // will possibly execute on the generic thread spawned off in releaseDelayedOperations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of rejection, I think we want to add the original exception as "suppressed"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment becomes obsolete by running the PermitAwareThreadedActionListener.onFailure()
method on the caller thread.
@@ -59,7 +63,18 @@ | |||
|
|||
@BeforeClass | |||
public static void setupThreadPool() { | |||
threadPool = new TestThreadPool("IndexShardOperationsLockTests"); | |||
int bulkThreadPoolSize = randomIntBetween(1, 2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ to randomly use limited capacity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx @ywelsch . My only nit is to add java docs for the required "lightness" of the onFailure handling in the onAcquired
listener.
Thanks @bleskes |
At the shard level we use an operation permit to coordinate between regular shard operations and special operations that need exclusive access. In ES versions < 6, the operation requiring exclusive access was invoked during primary relocation, but ES versions >= 6 this exclusive access is also used when a replica learns about a new primary or when a replica is promoted to primary. These special operations requiring exclusive access delay regular operations from running, by adding them to a queue, and after finishing the exclusive access, release these operations which then need to be put back on the original thread-pool they were running on. In the presence of thread pool rejections, the current implementation had two issues: - it would not properly release the operation permit when hitting a rejection (i.e. when calling ThreadedActionListener.onResponse from IndexShardOperationPermits.acquire). - it would not invoke the onFailure method of the action listener when the shard was closed, and just log a warning instead (see ThreadedActionListener.onFailure), which would ultimately lead to the replication task never being cleaned up (see #25863). This commit fixes both issues by introducing a custom threaded action listener that is permit-aware and properly deals with rejections. Closes #25863
At the shard level we use an operation permit to coordinate between regular shard operations and special operations that need exclusive access. In ES versions < 6, the operation requiring exclusive access was invoked during primary relocation, but ES versions >= 6 this exclusive access is also used when a replica learns about a new primary or when a replica is promoted to primary. These special operations requiring exclusive access delay regular operations from running, by adding them to a queue, and after finishing the exclusive access, release these operations which then need to be put back on the original thread-pool they were running on. In the presence of thread pool rejections, the current implementation had two issues: - it would not properly release the operation permit when hitting a rejection (i.e. when calling ThreadedActionListener.onResponse from IndexShardOperationPermits.acquire). - it would not invoke the onFailure method of the action listener when the shard was closed, and just log a warning instead (see ThreadedActionListener.onFailure), which would ultimately lead to the replication task never being cleaned up (see #25863). This commit fixes both issues by introducing a custom threaded action listener that is permit-aware and properly deals with rejections. Closes #25863
At the shard level we use an operation permit to coordinate between regular shard operations and special operations that need exclusive access. In ES versions < 6, the operation requiring exclusive access was invoked during primary relocation, but ES versions >= 6 this exclusive access is also used when a replica learns about a new primary or when a replica is promoted to primary. These special operations requiring exclusive access delay regular operations from running, by adding them to a queue, and after finishing the exclusive access, release these operations which then need to be put back on the original thread-pool they were running on. In the presence of thread pool rejections, the current implementation had two issues: - it would not properly release the operation permit when hitting a rejection (i.e. when calling ThreadedActionListener.onResponse from IndexShardOperationPermits.acquire). - it would not invoke the onFailure method of the action listener when the shard was closed, and just log a warning instead (see ThreadedActionListener.onFailure), which would ultimately lead to the replication task never being cleaned up (see #25863). This commit fixes both issues by introducing a custom threaded action listener that is permit-aware and properly deals with rejections. Closes #25863
At the shard level we use an operation permit to coordinate between regular shard operations and special operations that need exclusive access. In ES versions < 6, the operation requiring exclusive access was invoked during primary relocation, but ES versions >= 6 this exclusive access is also used when a replica learns about a new primary or when a replica is promoted to primary. These special operations requiring exclusive access delay regular operations from running, by adding them to a queue, and after finishing the exclusive access, release these operations which then need to be put back on the original thread-pool they were running on. In the presence of thread pool rejections, the current implementation had two issues: - it would not properly release the operation permit when hitting a rejection (i.e. when calling ThreadedActionListener.onResponse from IndexShardOperationPermits.acquire). - it would not invoke the onFailure method of the action listener when the shard was closed, and just log a warning instead (see ThreadedActionListener.onFailure), which would ultimately lead to the replication task never being cleaned up (see #25863). This commit fixes both issues by introducing a custom threaded action listener that is permit-aware and properly deals with rejections. Closes #25863
At the shard level we use an operation permit to coordinate between regular shard operations and special operations that need exclusive access. In ES versions < 6, the operation requiring exclusive access was invoked during primary relocation, but ES versions >= 6 this exclusive access is also used when a replica learns about a new primary or when a replica is promoted to primary.
These special operations requiring exclusive access delay regular operations from running, by adding them to a queue, and after finishing the exclusive access, release these operations which then need to be put back on the original thread-pool they were running on. In the presence of thread pool rejections, the current implementation had two issues:
ThreadedActionListener.onResponse
fromIndexShardOperationPermits.acquire
).ThreadedActionListener.onFailure
), which would ultimately lead to the replication task never being cleaned up (see Tasks stuck forever #25863).This PR fixes both issues by introducing a custom threaded action listener that is permit-aware and properly deals with rejections.
Closes #25863