Skip to content

React more prompty to task cancellation while waiting for the cluster to unblock #128737

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jun 3, 2025

Conversation

nielsbauman
Copy link
Contributor

Instead of waiting for the next run of the ClusterStateObserver (which might be arbitrarily far in the future, but bound by the timeout if one is set), we notify the listener immediately that the task has been cancelled. While doing so, we ensure we invoke the listener only once.

Fixes #117971

… to unblock

Instead of waiting for the next run of the `ClusterStateObserver` (which
might be arbitrarily far in the future, but bound by the timeout if one
is set), we notify the listener immediately that the task has been
cancelled. While doing so, we ensure we invoke the listener only once.

Fixes elastic#117971
@nielsbauman nielsbauman requested a review from DaveCTurner June 2, 2025 08:57
@nielsbauman nielsbauman added >enhancement :Distributed Coordination/Task Management Issues for anything around the Tasks API - both persistent and node level. Team:Distributed Coordination Meta label for Distributed Coordination team v9.1.0 labels Jun 2, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @nielsbauman, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a naming/comment nit.

@@ -104,20 +106,40 @@ private void waitForClusterUnblock(
logger,
clusterService.threadPool().getThreadContext()
);
// We track whether we already notified the listener of cancellation, to avoid invoking the listener twice.
final var notifiedCancellation = new AtomicBoolean(false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming is a bit odd, this gets set to true even if we haven't notified the listener of cancellation. Maybe waitComplete?

I was going to suggest using org.elasticsearch.action.ActionListener#notifyOnce but it's more subtle than that: we want to suppress the cancellation listener before calling innerDoExecute. I think that deserves a comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed the variable to notifiedListener. Even though we don't notify the listener directly in onNewClusterState, we do in innerDoExecute. I also updated the comment. Let me know what you think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think that's still going to be confusing to some future reader. The flag doesn't indicate we've completed the listener, it indicates that the wait for an appropriate cluster state in TransportLocalClusterStateAction is over and we've started to execute the action, so I think waitComplete would be a better name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, renamed to waitComplete in 3c6aafa.

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but also consider extracting a utility for this pattern to simplify the compareAndSet things for future readers. ... (false, true) == false is pretty hard to parse. Something like this perhaps?

diff --git a/libs/core/src/main/java/org/elasticsearch/core/Predicates.java b/libs/core/src/main/java/org/elasticsearch/core/Predicates.java
index bd8c1517323..88c4f138967 100644
--- a/libs/core/src/main/java/org/elasticsearch/core/Predicates.java
+++ b/libs/core/src/main/java/org/elasticsearch/core/Predicates.java
@@ -9,6 +9,8 @@

 package org.elasticsearch.core;

+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.function.BooleanSupplier;
 import java.util.function.Predicate;

 /**
@@ -90,4 +92,22 @@ public enum Predicates {
     public static <T> Predicate<T> never() {
         return (Predicate<T>) NEVER;
     }
+
+    private static class OnceTrue extends AtomicBoolean implements BooleanSupplier {
+        OnceTrue() {
+            super(true);
+        }
+
+        @Override
+        public boolean getAsBoolean() {
+            return getAndSet(false);
+        }
+    }
+
+    /**
+     * @return a {@link BooleanSupplier} which supplies {@code true} the first time it is called, and {@code false} subsequently.
+     */
+    public static BooleanSupplier once() {
+        return new OnceTrue();
+    }
 }

@nielsbauman nielsbauman requested a review from a team as a code owner June 3, 2025 06:55
@nielsbauman
Copy link
Contributor Author

Great suggestion, thanks! Applied in 0925808.

@nielsbauman nielsbauman enabled auto-merge (squash) June 3, 2025 06:55
Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM still

@nielsbauman nielsbauman merged commit f988611 into elastic:main Jun 3, 2025
17 of 18 checks passed
@nielsbauman nielsbauman deleted the task-cancellation branch June 3, 2025 08:00
joshua-adams-1 pushed a commit to joshua-adams-1/elasticsearch that referenced this pull request Jun 3, 2025
… to unblock (elastic#128737)

Instead of waiting for the next run of the `ClusterStateObserver` (which
might be arbitrarily far in the future, but bound by the timeout if one
is set), we notify the listener immediately that the task has been
cancelled. While doing so, we ensure we invoke the listener only once.

Fixes elastic#117971
Samiul-TheSoccerFan pushed a commit to Samiul-TheSoccerFan/elasticsearch that referenced this pull request Jun 5, 2025
… to unblock (elastic#128737)

Instead of waiting for the next run of the `ClusterStateObserver` (which
might be arbitrarily far in the future, but bound by the timeout if one
is set), we notify the listener immediately that the task has been
cancelled. While doing so, we ensure we invoke the listener only once.

Fixes elastic#117971
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Task Management Issues for anything around the Tasks API - both persistent and node level. >enhancement Team:Distributed Coordination Meta label for Distributed Coordination team v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

React more promptly to task cancellation while waiting for the cluster to unblock
3 participants