Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Weighted Shard Routing] Fail open requests on search shard failures #5072

Merged
merged 37 commits into from
Jan 10, 2023

Conversation

anshu1106
Copy link
Contributor

@anshu1106 anshu1106 commented Nov 4, 2022

Description

Fail open shard copies request to go to the nodes in weighed away zone. This helps in reducing the number of 5xx responses as well as better shard availability. In other case, where shard search requests return 4xx due to throttling from shard copies in non-weighed away zone, we don't need to fail open. So basically if the request fails due to internal issues in the cluster we fail open for search requests.

Issues Resolved

#4735

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@anshu1106 anshu1106 requested review from a team and reta as code owners November 4, 2022 09:22
@anshu1106 anshu1106 marked this pull request as draft November 4, 2022 09:23
@anshu1106 anshu1106 changed the title [Draft] Weighted Routing fail open [Weighted Shard Routing] fail open on errors Nov 4, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Nov 4, 2022

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Nov 9, 2022

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
@anshu1106 anshu1106 force-pushed the feature/poc-fail-open branch from d0015fb to 02f1fa8 Compare November 9, 2022 15:05
@github-actions
Copy link
Contributor

github-actions bot commented Nov 9, 2022

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Anshu Agarwal added 2 commits November 10, 2022 10:16
Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Copy link
Collaborator

@gbbafna gbbafna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change, we will also affect the output of search_shards api which will now return all the shards including weight 0 . This might be acceptable behavior, but worth calling out.

Comment on lines 271 to 273
// This checks if the shard is present in data node with weighted routing weight set to 0,
// In such cases we fail open, if shard search request for the shard from other shard copies fail with non
// retryable exception.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A note : we should test out cross cluster search works with this as well .

Anshu Agarwal added 2 commits December 6, 2022 14:58
Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Dec 6, 2022

Gradle Check (Jenkins) Run Completed with:

@anshu1106 anshu1106 marked this pull request as ready for review December 6, 2022 11:50
Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
@anshu1106 anshu1106 force-pushed the feature/poc-fail-open branch from 9a13bc9 to 33a04b5 Compare January 9, 2023 15:47
@@ -352,8 +354,8 @@ public ShardIterator activeInitializingShardsWeightedIt(
logger.debug("no shard copies found for shard id [{}] for node attribute with weight zero", shardId);
}
}

return new PlainShardIterator(shardId, ordered);
orderedListWithDistinctShards = new ArrayList<>(new LinkedHashSet<>(ordered));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest

ordered.stream().distinct().collect(Collectors.toList())

Comment on lines 52 to 55
public FailAwareWeightedRouting(Exception e, ClusterState clusterState) {
this.exception = e;
this.clusterState = clusterState;
}
Copy link
Collaborator

@Bukhtawar Bukhtawar Jan 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use only cluster state in the constructor so that it could be used as a singleton instance and move exception to the method findNext and log the same exception

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can't make FailAwareWeightedRouting singleton since even cluster state can change between requests

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah bad edit, I initially meant ClusterService for singleton but forgot to remove.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I have made FailAwareWeightedRouting singleton but we can't pass ClusterService in the constructor since ClusterService instance is not available in AbstractSearchAsyncAction.

}

private void logFailOpen(ShardId shardID) {
logger.info(() -> new ParameterizedMessage("{}: Fail open executed", shardID));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: will prefer this inline

Comment on lines 43 to 50
RestStatus.INTERNAL_SERVER_ERROR,
RestStatus.NOT_IMPLEMENTED,
RestStatus.BAD_GATEWAY,
RestStatus.SERVICE_UNAVAILABLE,
RestStatus.GATEWAY_TIMEOUT,
RestStatus.HTTP_VERSION_NOT_SUPPORTED,
RestStatus.INSUFFICIENT_STORAGE
);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove

  1. RestStatus.NOT_IMPLEMENTED
  2. RestStatus.HTTP_VERSION_NOT_SUPPORTED
  3. RestStatus.INSUFFICIENT_STORAGE

@github-actions
Copy link
Contributor

github-actions bot commented Jan 9, 2023

Gradle Check (Jenkins) Run Completed with:

Anshu Agarwal added 2 commits January 9, 2023 22:21
Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
@github-actions
Copy link
Contributor

github-actions bot commented Jan 9, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Jan 9, 2023

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

github-actions bot commented Jan 9, 2023

Gradle Check (Jenkins) Run Completed with:

@Bukhtawar
Copy link
Collaborator

Test failures : #5766

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

Comment on lines +33 to +48
public enum FailAwareWeightedRouting {
INSTANCE;

private static final Logger logger = LogManager.getLogger(FailAwareWeightedRouting.class);

private final static List<RestStatus> internalErrorRestStatusList = List.of(
RestStatus.INTERNAL_SERVER_ERROR,
RestStatus.BAD_GATEWAY,
RestStatus.SERVICE_UNAVAILABLE,
RestStatus.GATEWAY_TIMEOUT
);

public static FailAwareWeightedRouting getInstance() {
return INSTANCE;

}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the most apt way to create singletons, you could achieve it by a static property

    public static final FailAwareWeightedRouting INSTANCE = new FailAwareWeightedRouting();

while (next != null && isWeighedAway(next.currentNodeId(), clusterState)) {
ShardRouting nextShard = next;
if (canFailOpen(nextShard.shardId(), exception, clusterState)) {
logger.info(() -> new ParameterizedMessage("{}: Fail open executed due to exception {}", nextShard.shardId(), exception));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitL incorrect usage of logger with exception

@Bukhtawar Bukhtawar merged commit e3b5015 into opensearch-project:main Jan 10, 2023
anshu1106 added a commit to anshu1106/OpenSearch that referenced this pull request Jan 10, 2023
…pensearch-project#5072)

* Fail open requests on search shard failures (

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
anshu1106 added a commit to anshu1106/OpenSearch that referenced this pull request Jan 10, 2023
…pensearch-project#5072)

* Fail open requests on search shard failures (

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
anshu1106 added a commit to anshu1106/OpenSearch that referenced this pull request Jan 10, 2023
…pensearch-project#5072)

* Fail open requests on search shard failures (

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
gbbafna pushed a commit that referenced this pull request Jan 10, 2023
#5784)

* [Weighted Routing] Add support for discovered master and remove local weights in the response (#5680)

* Add support for discovered master and remove local weights in the weighted routing API response

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>

* [Weighted Shard Routing] API versioning (#5255)

* Support API versioning for weighted shard routing

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>

* [Weighted Shard Routing] Fail open requests on search shard failures (#5072)

* Fail open requests on search shard failures (

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>

* Address fail open comments (#5778)

[Weighted Shard Routing] Refactor and fix singleton in FailAwareWeightedRouting

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>

* remove unintended changes in changelog

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>

* remove unintended changes from changelog

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
Co-authored-by: Anshu Agarwal <anshukag@amazon.com>
Bukhtawar pushed a commit that referenced this pull request Jan 10, 2023
…ed shard routing (#5781)

* [Backport 2.x] 
[Weighted Shard Routing] Add support for discovered master and remove local weights in the response #5680
[Weighted Shard Routing] API versioning #5255
[Weighted Shard Routing] Fail open requests on search shard failures #5072
[Weighted Shard Routing] Refactor and fix singleton in FailAwareWeightedRouting #5778

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
kotwanikunal pushed a commit that referenced this pull request Jan 25, 2023
…ed shard routing (#5781)

* [Backport 2.x] 
[Weighted Shard Routing] Add support for discovered master and remove local weights in the response #5680
[Weighted Shard Routing] API versioning #5255
[Weighted Shard Routing] Fail open requests on search shard failures #5072
[Weighted Shard Routing] Refactor and fix singleton in FailAwareWeightedRouting #5778

Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants