Fork remote-cluster response handling #97922

Merged

Conversation

DaveCTurner
Contributor

@DaveCTurner DaveCTurner commented Jul 25, 2023

Today all responses to remote-cluster requests are deserialized and
handled on the transport worker thread. Some of these responses can be
sizeable, so with this commit we add the facility for callers to specify
a different executor to handle this work. It also adjusts several
callers to use more appropriate threads, including:

  • responses from CCR-related admin actions are handled on ccr
  • responses from field caps actions are handled on search_coordination
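
As a rough sketch of the pattern (class and method names here are hypothetical, not the actual Elasticsearch API), the transport worker hands the raw bytes to a caller-chosen executor so that deserializing a sizeable response never blocks the worker thread:

    // Hypothetical sketch only, not the real Elasticsearch API.
    import java.util.concurrent.Executor;
    import java.util.function.Consumer;
    import java.util.function.Function;

    public class ForkingResponseHandlerSketch<T> {
        private final Executor responseExecutor; // e.g. the ccr or search_coordination pool
        private final Consumer<T> onResponse;

        public ForkingResponseHandlerSketch(Executor responseExecutor, Consumer<T> onResponse) {
            this.responseExecutor = responseExecutor;
            this.onResponse = onResponse;
        }

        // Invoked on the transport worker thread; forks immediately so the
        // potentially large response is deserialized and handled elsewhere.
        public void handleRawResponse(byte[] rawBytes, Function<byte[], T> deserializer) {
            responseExecutor.execute(() -> onResponse.accept(deserializer.apply(rawBytes)));
        }
    }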

@DaveCTurner DaveCTurner added >non-issue :Distributed Coordination/Network Http and internode communication implementations v8.10.0 labels Jul 25, 2023
@elasticsearchmachine elasticsearchmachine added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label Jul 25, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@DaveCTurner
Contributor Author

DaveCTurner commented Jul 25, 2023

One thing to note here is that we are forking to pools that have bounded queues. Semantically that's ok: we force-execute these tasks (see org.elasticsearch.transport.ForkingResponseHandlerRunnable), but we may see more rejections of other work in borderline-overloaded clusters after this change. Then again, that still seems preferable to spamming the transport worker threads.

Edit: also, I'm upgrading this to >bug because of how CCR requests an awful lot of index metadata when looking for new indices to follow, which in a big cluster is going to cause problems without this change.
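
To illustrate the force-execution point, here is a toy sketch of the semantics, assuming a hand-rolled soft queue limit (this is not the actual EsThreadPoolExecutor or ForkingResponseHandlerRunnable implementation): forked response handlers are always accepted, but they still consume capacity, so non-forced work hits the rejection path sooner.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.RejectedExecutionException;
    import java.util.concurrent.atomic.AtomicInteger;

    class SoftBoundedExecutorSketch {
        private final ExecutorService delegate = Executors.newFixedThreadPool(4);
        private final AtomicInteger pending = new AtomicInteger();
        private final int softLimit = 100;

        void execute(Runnable task, boolean forceExecution) {
            // Forced tasks (the forked response handlers) are never rejected here,
            // but they still occupy capacity, so other, non-forced tasks may be
            // rejected more often in a borderline-overloaded cluster.
            if (forceExecution == false && pending.get() >= softLimit) {
                throw new RejectedExecutionException("queue full");
            }
            pending.incrementAndGet();
            delegate.execute(() -> {
                try {
                    task.run();
                } finally {
                    pending.decrementAndGet();
                }
            });
        }
    }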

@elasticsearchmachine
Collaborator

Hi @DaveCTurner, I've created a changelog YAML for you.

Contributor

@henningandersen henningandersen left a comment


Looks good. I'm in doubt about one of the thread pool choices, though.

@@ -466,6 +466,7 @@ static void ccsRemoteReduce(
ActionListener<SearchResponse> listener,
BiConsumer<SearchRequest, ActionListener<SearchResponse>> localSearchConsumer
) {
final var remoteClientResponseExecutor = threadPool.executor(ThreadPool.Names.SEARCH_COORDINATION);
Contributor


I wonder if we should use the SEARCH pool instead, since that is the pool other work here is spawned on.

I think the coordination pool is mainly for the node level can-match/fieldcaps actions.

Might not matter a whole lot, but I'd rather induce any rejections due to this on the SEARCH thread pool than the SEARCH_COORDINATION pool.

Contributor Author

@DaveCTurner DaveCTurner Jul 27, 2023


I will revert this to SAME to make progress here, and raise it with the search folks as a follow-up. I think what we're doing here is more like coordination work, so enqueueing it behind some IO-heavy search activity might do bad things for performance, but there are definitely trade-offs either way.

Edit: I pushed 801500b and opened #97997
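
For readers unfamiliar with SAME: the revert means the response goes back to being handled directly on the calling thread rather than on a forked pool, roughly as below (a sketch only; the exact change is in 801500b).

    final var remoteClientResponseExecutor = threadPool.executor(ThreadPool.Names.SAME); // handled on the calling thread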

@@ -373,10 +373,6 @@ public Optional<EngineFactory> getEngineFactory(final IndexSettings indexSetting

@SuppressWarnings("HiddenField")
public List<ExecutorBuilder<?>> getExecutorBuilders(Settings settings) {
if (enabled == false) {
Contributor


Not that it matters, but was this change necessary, or did you just find the "optimization" of not having the extra thread pool unnecessary?

Contributor Author

@DaveCTurner DaveCTurner Jul 27, 2023


We apparently do some CCR-related things even if CCR is disabled, or at least there are tests which now fail if the CCR thread pool's existence is contingent on whether CCR is enabled. It seemed simplest to just create the thread pool either way rather than pick all that apart.
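
The shape of the change being discussed is roughly this (a sketch, not the exact CCR plugin code; the builder helper is hypothetical):

    @SuppressWarnings("HiddenField")
    public List<ExecutorBuilder<?>> getExecutorBuilders(Settings settings) {
        // previously: if (enabled == false) { return List.of(); }
        // the "ccr" executor builder is now registered unconditionally, since some
        // CCR code paths (and tests) expect the pool to exist even when CCR is disabled
        return List.of(ccrThreadPoolBuilder(settings)); // hypothetical helper
    }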

Contributor

@henningandersen henningandersen left a comment


LGTM.

Contributor

@original-brownbear original-brownbear left a comment


LGTM2 nice :)

@DaveCTurner DaveCTurner merged commit f4e3113 into elastic:main Jul 27, 2023
@DaveCTurner DaveCTurner deleted the 2023/07/25/RemoteClusterClient-forking branch July 27, 2023 09:26
@DaveCTurner
Contributor Author

Thanks both :)

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Aug 2, 2023
Today we deserialize the chunks received when creating a follower on the
CCR pool (prior to elastic#97922 this was the transport thread) and then
manually fork off to the GENERIC pool to write the chunk to disk. It's
simpler just to do all this on the GENERIC pool to start with, so this
commit does that.
elasticsearchmachine pushed a commit that referenced this pull request Aug 2, 2023
Today we deserialize the chunks received when creating a follower on the
CCR pool (prior to #97922 this was the transport thread) and then
manually fork off to the GENERIC pool to write the chunk to disk. It's
simpler just to do all this on the GENERIC pool to start with, so this
commit does that.
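
Illustratively, that simplification amounts to handing the response handler the GENERIC executor up front, so deserializing the chunk and writing it to disk happen on the same thread with no second fork (reusing the hypothetical handler sketched earlier; helper and type names are made up):

    var handler = new ForkingResponseHandlerSketch<Chunk>(
        threadPool.generic(),                 // deserialize on GENERIC ...
        chunk -> writeChunkToDisk(chunk)      // ... and write to disk on the same thread
    );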
@DaveCTurner DaveCTurner restored the 2023/07/25/RemoteClusterClient-forking branch June 17, 2024 06:17
Labels
>bug :Distributed Coordination/Network Http and internode communication implementations Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v8.10.0