Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CCS: don't proxy requests for already connected node #31273

Merged
merged 4 commits into from
Jun 13, 2018

Conversation

javanna
Copy link
Member

@javanna javanna commented Jun 12, 2018

Cross-cluster search selects a subset of nodes for each remote cluster
and sends requests only to them, which will act as a proxy and properly
redirect such requests to the target nodes that hold the relevant data.
What happens today is that every time we send a request to a remote
cluster, it will be sent to the next node in the proxy list
(in round-robin fashion), regardless of whether the target node is
already amongst the ones that we are connected to. In case for instance
we need to send a shard search request to a data node that's also one of
the selected proxy nodes, we may end up sending the request to it
through one of the other proxy nodes.
This commit optimizes this case to make sure that whenever we are
already connected to a remote node, we will send a direct request rather
than using the next proxy node.
There is a side-effect to this, which is that round-robin will be a bit
unbalanced as the data nodes that are also selected as proxies will
receive more requests.

javanna added 2 commits June 11, 2018 17:46
Cross-cluster search selects a subset of nodes for each remote cluster
and sends requests only to them, which will act as a proxy and properly
redirect such requests to the target nodes that hold the relevant data.
What happens today is that every time we send a request to a remote
cluster, it will be sent to the next node in the proxy list
(in round-robin fashion), regardless of whether the target node is
already amongst the ones that we are connected to. In case for instance
we need to send a shard search request to a data node that's also one of
the selected proxy nodes, we may end up sending the request to it
through one of the other proxy nodes.
This commit optimizes this case to make sure that whenever we are
already connected to a remote node, we will send a direct request rather
than using the next proxy node.
There is a side-effect to this, which is that round-robin will be a bit
unbalanced as the data nodes that are also selected as proxies will
receive more requests.
@javanna javanna added >enhancement :Search/Search Search-related issues that do not fall into other categories v7.0.0 v6.4.0 labels Jun 12, 2018
@javanna javanna requested a review from s1monw June 12, 2018 13:41
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

Copy link
Contributor

@s1monw s1monw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some suggestions LGTM

*/
Transport.Connection getConnection(DiscoveryNode remoteClusterNode) {
DiscoveryNode discoveryNode = connectedNodes.get();
if (connectedNodes.contains(remoteClusterNode)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we use transportService .nodeConnected instead. I mean if we for instance have the local cluster configured as a remote cluster we get this optimization as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, great catch!


@Override
public void sendRequest(long requestId, String action, TransportRequest request, TransportRequestOptions options)
static class ProxyConnection implements Transport.Connection {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make the class final?

@@ -612,7 +622,7 @@ int getNumNodesConnected() {
return connectedNodes.size();
}

private static class ConnectedNodes implements Supplier<DiscoveryNode> {
private static class ConnectedNodes {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also final?

@javanna javanna merged commit 664903a into elastic:master Jun 13, 2018
jasontedor added a commit to nik9000/elasticsearch that referenced this pull request Jun 14, 2018
* elastic/master: (40 commits)
  [DOC] Extend SQL docs
  Immediately flush channel after writing to buffer (elastic#31301)
  [DOCS] Shortens ML API intros
  Use quotes in the call invocation (elastic#31249)
  move security ingest processors to a sub ingest directory (elastic#31306)
  Add 5.6.11 version constant.
  Fix version detection.
  SQL: Whitelist SQL utility class for better scripting (elastic#30681)
  [Docs] All Rollup docs experimental, agg limitations, clarify DeleteJob (elastic#31299)
  CCS: don't proxy requests for already connected node (elastic#31273)
  Mute ScriptedMetricAggregatorTests testSelfReferencingAggStateAfterMap
  [test] opensuse packaging turn up debug logging
  Add unreleased version 6.3.1
  Removes experimental tag from scripted_metric aggregation (elastic#31298)
  [Rollup] Metric config parser must use builder so validation runs (elastic#31159)
  [ML] Check licence when datafeeds use cross cluster search  (elastic#31247)
  Add notion of internal index settings (elastic#31286)
  Test: Remove broken yml test feature (elastic#31255)
  REST hl client: cluster health to default to cluster level (elastic#31268)
  [ML] Update test thresholds to account for changes to memory control (elastic#31289)
  ...
jasontedor added a commit to majormoses/elasticsearch that referenced this pull request Jun 14, 2018
* elastic/master: (29 commits)
  [DOC] Extend SQL docs
  Immediately flush channel after writing to buffer (elastic#31301)
  [DOCS] Shortens ML API intros
  Use quotes in the call invocation (elastic#31249)
  move security ingest processors to a sub ingest directory (elastic#31306)
  Add 5.6.11 version constant.
  Fix version detection.
  SQL: Whitelist SQL utility class for better scripting (elastic#30681)
  [Docs] All Rollup docs experimental, agg limitations, clarify DeleteJob (elastic#31299)
  CCS: don't proxy requests for already connected node (elastic#31273)
  Mute ScriptedMetricAggregatorTests testSelfReferencingAggStateAfterMap
  [test] opensuse packaging turn up debug logging
  Add unreleased version 6.3.1
  Removes experimental tag from scripted_metric aggregation (elastic#31298)
  [Rollup] Metric config parser must use builder so validation runs (elastic#31159)
  [ML] Check licence when datafeeds use cross cluster search  (elastic#31247)
  Add notion of internal index settings (elastic#31286)
  Test: Remove broken yml test feature (elastic#31255)
  REST hl client: cluster health to default to cluster level (elastic#31268)
  [ML] Update test thresholds to account for changes to memory control (elastic#31289)
  ...
dnhatn added a commit that referenced this pull request Jun 14, 2018
* master:
  Remove RestGetAllAliasesAction (#31308)
  Temporary fix for broken build
  Reenable Checkstyle's unused import rule (#31270)
  Remove remaining unused imports before merging #31270
  Fix non-REST doc snippet
  [DOC] Extend SQL docs
  Immediately flush channel after writing to buffer (#31301)
  [DOCS] Shortens ML API intros
  Use quotes in the call invocation (#31249)
  move security ingest processors to a sub ingest directory (#31306)
  Add 5.6.11 version constant.
  Fix version detection.
  SQL: Whitelist SQL utility class for better scripting (#30681)
  [Docs] All Rollup docs experimental, agg limitations, clarify DeleteJob (#31299)
  CCS: don't proxy requests for already connected node (#31273)
  Mute ScriptedMetricAggregatorTests testSelfReferencingAggStateAfterMap
  [test] opensuse packaging turn up debug logging
  Add unreleased version 6.3.1
  Removes experimental tag from scripted_metric aggregation (#31298)
  [Rollup] Metric config parser must use builder so validation runs (#31159)
  [ML] Check licence when datafeeds use cross cluster search  (#31247)
  Add notion of internal index settings (#31286)
  Test: Remove broken yml test feature (#31255)
  REST hl client: cluster health to default to cluster level (#31268)
  [ML] Update test thresholds to account for changes to memory control (#31289)
  Log warnings when cluster state publication failed to some nodes (#31233)
  Fix AntFixture waiting condition (#31272)
  Ignore numeric shard count if waiting for ALL (#31265)
  [ML] Implement new rules design (#31110)
  index_prefixes back-compat should test 6.3 (#30951)
  Core: Remove plain execute method on TransportAction (#30998)
  Update checkstyle to 8.10.1 (#31269)
  Set analyzer version in PreBuiltAnalyzerProviderFactory (#31202)
  Modify pipelining handlers to require full requests (#31280)
  Revert upgrade to Netty 4.1.25.Final (#31282)
  Use armored input stream for reading public key (#31229)
  Fix Netty 4 Server Transport tests. Again.
  REST hl client: adjust wait_for_active_shards param in cluster health (#31266)
  REST high-level Client: remove deprecated API methods (#31200)
  [DOCS] Mark SQL feature as experimental
  [DOCS] Updates machine learning custom URL screenshots (#31222)
  Fix naming conventions check for XPackTestCase
  Fix security Netty 4 transport tests
  Fix race in clear scroll (#31259)
  [DOCS] Clarify audit index settings when remote indexing (#30923)
  Delete typos in SAML docs (#31199)
  REST high-level client: add Cluster Health API (#29331)
  [ML][TEST] Mute tests using rules (#31204)
  Support RequestedAuthnContext (#31238)
  SyncedFlushResponse to implement ToXContentObject (#31155)
  Add Get Aliases API to the high-level REST client (#28799)
  Remove some line length supressions (#31209)
  Validate xContentType in PutWatchRequest. (#31088)
  [INGEST] Interrupt the current thread if evaluation grok expressions take too long (#31024)
  Suppress extras FS on caching directory tests
  Revert "[DOCS] Added 6.3 info & updated the upgrade table. (#30940)"
  Revert "Fix snippets in upgrade docs"
  Fix snippets in upgrade docs
  [DOCS] Added 6.3 info & updated the upgrade table. (#30940)
  LLClient: Support host selection (#30523)
  Upgrade to Netty 4.1.25.Final (#31232)
  Enable custom credentials for core REST tests (#31235)
  Move ESIndexLevelReplicationTestCase to test framework (#31243)
  Encapsulate Translog in Engine (#31220)
  HLRest: Add get index templates API (#31161)
  Remove all unused imports and fix CRLF (#31207)
  [Tests] Fix self-referencing tests
  [TEST] Fix testRecoveryAfterPrimaryPromotion
  [Docs] Remove mention pattern files in Grok processor (#31170)
  Use stronger write-once semantics for Azure repository (#30437)
  Don't swallow exceptions on replication (#31179)
  Limit the number of concurrent requests per node (#31206)
  Call ensureNoSelfReferences() on _agg state variable after scripted metric agg script executions (#31044)
  Move java version checker back to its own jar (#30708)
  [test] add fix for rare virtualbox error (#31212)
javanna added a commit that referenced this pull request Jun 15, 2018
Cross-cluster search selects a subset of nodes for each remote cluster
and sends requests only to them, which will act as a proxy and properly
redirect such requests to the target nodes that hold the relevant data.
What happens today is that every time we send a request to a remote
cluster, it will be sent to the next node in the proxy list
(in round-robin fashion), regardless of whether the target node is
already amongst the ones that we are connected to. In case for instance
we need to send a shard search request to a data node that's also one of
the selected proxy nodes, we may end up sending the request to it
through one of the other proxy nodes.
This commit optimizes this case to make sure that whenever we are
already connected to a remote node, we will send a direct request rather
than using the next proxy node.
There is a side-effect to this, which is that round-robin will be a bit
unbalanced as the data nodes that are also selected as proxies will
receive more requests.
dnhatn added a commit that referenced this pull request Jun 15, 2018
* 6.x:
  Upgrade to Lucene-7.4.0-snapshot-518d303506 (#31360)
  [ML] Implement new rules design (#31110) (#31294)
  Remove RestGetAllAliasesAction (#31308)
  CCS: don't proxy requests for already connected node (#31273)
  Rankeval: Fold template test project into main module (#31203)
  [Docs] Remove reference to repository-s3 plugin creating an S3 bucket (#31359)
  More detailed tracing when writing metadata (#31319)
  Add details section for dcg ranking metric (#31177)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories v6.4.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants