[Weighted Shard Routing] Fail open requests on search shard failures #5072
Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
This change also affects the output of the search_shards API, which will now return all shards, including those with weight 0. This might be acceptable behavior, but it is worth calling out.
server/src/main/java/org/opensearch/cluster/routing/IndexShardRoutingTable.java
server/src/main/java/org/opensearch/cluster/routing/WeightedRoutingHelper.java
// This checks if the shard is present in data node with weighted routing weight set to 0,
// In such cases we fail open, if shard search request for the shard from other shard copies fail with non
// retryable exception.
A note: we should verify that cross-cluster search works with this change as well.
@@ -352,8 +354,8 @@ public ShardIterator activeInitializingShardsWeightedIt(
            logger.debug("no shard copies found for shard id [{}] for node attribute with weight zero", shardId);
        }
    }

    return new PlainShardIterator(shardId, ordered);
    orderedListWithDistinctShards = new ArrayList<>(new LinkedHashSet<>(ordered));
Suggest
ordered.stream().distinct().collect(Collectors.toList())
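For comparison, here is a minimal sketch of the two de-duplication variants discussed above: the `LinkedHashSet` copy from the diff and the stream-based form from the suggestion. The class name `DistinctShardOrdering`, the method names, and the string shard labels are hypothetical and used only for illustration; the real code operates on `ShardRouting` objects. Both variants keep the first occurrence of each element and preserve the original order.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of the PR.
final class DistinctShardOrdering {

    // Variant from the diff: a LinkedHashSet drops duplicates and keeps first-seen order.
    static <T> List<T> distinctViaLinkedHashSet(List<T> ordered) {
        return new ArrayList<>(new LinkedHashSet<>(ordered));
    }

    // Variant from the review suggestion: distinct() on an ordered stream
    // keeps the first occurrence of each element.
    static <T> List<T> distinctViaStream(List<T> ordered) {
        return ordered.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Strings stand in for shard copies here (an assumption for brevity).
        List<String> ordered = List.of("shard-a", "shard-b", "shard-a", "shard-c", "shard-b");
        System.out.println(distinctViaLinkedHashSet(ordered)); // prints [shard-a, shard-b, shard-c]
        System.out.println(distinctViaStream(ordered));        // prints [shard-a, shard-b, shard-c]
    }
}
```

Both forms rely on `equals` and `hashCode` of the element type, so the choice between them is mostly a matter of readability.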
public FailAwareWeightedRouting(Exception e, ClusterState clusterState) {
    this.exception = e;
    this.clusterState = clusterState;
}
Please take only the cluster state in the constructor so that the class can be used as a singleton instance, move the exception to the findNext method, and log that exception there.
I don't think we can make FailAwareWeightedRouting a singleton, since even the cluster state can change between requests.
Yeah, bad edit. I initially meant ClusterService for the singleton but forgot to remove the rest.
Got it. I have made FailAwareWeightedRouting a singleton, but we can't pass ClusterService in the constructor, since a ClusterService instance is not available in AbstractSearchAsyncAction.
}

private void logFailOpen(ShardId shardID) {
    logger.info(() -> new ParameterizedMessage("{}: Fail open executed", shardID));
nit: I would prefer this inlined.
    RestStatus.INTERNAL_SERVER_ERROR,
    RestStatus.NOT_IMPLEMENTED,
    RestStatus.BAD_GATEWAY,
    RestStatus.SERVICE_UNAVAILABLE,
    RestStatus.GATEWAY_TIMEOUT,
    RestStatus.HTTP_VERSION_NOT_SUPPORTED,
    RestStatus.INSUFFICIENT_STORAGE
);
Please remove
- RestStatus.NOT_IMPLEMENTED
- RestStatus.HTTP_VERSION_NOT_SUPPORTED
- RestStatus.INSUFFICIENT_STORAGE
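To illustrate the effect of the requested removal, the sketch below models the reduced status set. It is not the PR's code: the `Status` enum is a hypothetical stand-in for OpenSearch's `RestStatus`, limited to the codes mentioned in this thread plus 429 as a throttling example, and `isInternalFailure` is an assumed helper name.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch; Status stands in for OpenSearch's RestStatus.
final class FailOpenStatuses {

    enum Status {
        TOO_MANY_REQUESTS(429),
        INTERNAL_SERVER_ERROR(500),
        NOT_IMPLEMENTED(501),
        BAD_GATEWAY(502),
        SERVICE_UNAVAILABLE(503),
        GATEWAY_TIMEOUT(504),
        HTTP_VERSION_NOT_SUPPORTED(505),
        INSUFFICIENT_STORAGE(507);

        final int code;

        Status(int code) {
            this.code = code;
        }
    }

    // The four statuses that remain once the three listed above are removed.
    static final Set<Status> INTERNAL_ERRORS = EnumSet.of(
        Status.INTERNAL_SERVER_ERROR,
        Status.BAD_GATEWAY,
        Status.SERVICE_UNAVAILABLE,
        Status.GATEWAY_TIMEOUT
    );

    // Assumed helper: true when a failure with this status counts as an internal error.
    static boolean isInternalFailure(Status status) {
        return INTERNAL_ERRORS.contains(status);
    }

    public static void main(String[] args) {
        for (Status status : Status.values()) {
            System.out.println(status.code + " " + status + " -> " + isInternalFailure(status));
        }
    }
}
```

An `EnumSet` is used here instead of a `List` because membership checks are the only operation needed; that is a design choice of this sketch, not something the PR prescribes.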
Test failures: #5766
public enum FailAwareWeightedRouting {
    INSTANCE;

    private static final Logger logger = LogManager.getLogger(FailAwareWeightedRouting.class);

    private final static List<RestStatus> internalErrorRestStatusList = List.of(
        RestStatus.INTERNAL_SERVER_ERROR,
        RestStatus.BAD_GATEWAY,
        RestStatus.SERVICE_UNAVAILABLE,
        RestStatus.GATEWAY_TIMEOUT
    );

    public static FailAwareWeightedRouting getInstance() {
        return INSTANCE;

    }
I don't think this is the most apt way to create a singleton; you could achieve it with a static field:
public static final FailAwareWeightedRouting INSTANCE = new FailAwareWeightedRouting();
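As a rough illustration of the two singleton styles under discussion, the sketch below puts them side by side. All names (`SingletonSketch`, `StaticFieldRouting`, `EnumRouting`, `describe`) are hypothetical. Per-request inputs are passed as method arguments so that neither singleton holds mutable state, which matches the earlier point that the exception and cluster state can differ between requests.

```java
// Hypothetical sketch comparing two singleton styles; not the PR's code.
final class SingletonSketch {

    // Static-field singleton, as suggested in the review comment.
    static final class StaticFieldRouting {
        static final StaticFieldRouting INSTANCE = new StaticFieldRouting();

        // A private constructor prevents any other instance from being created.
        private StaticFieldRouting() {}

        // Per-request inputs arrive as arguments; the instance stays stateless.
        String describe(String shardId, Exception failure) {
            return shardId + ": " + failure.getMessage();
        }
    }

    // Enum singleton: the constant itself is the accessor,
    // so an extra getInstance() method adds nothing.
    enum EnumRouting {
        INSTANCE;

        String describe(String shardId, Exception failure) {
            return shardId + ": " + failure.getMessage();
        }
    }

    public static void main(String[] args) {
        Exception failure = new IllegalStateException("node unavailable");
        System.out.println(StaticFieldRouting.INSTANCE.describe("[index][0]", failure));
        System.out.println(EnumRouting.INSTANCE.describe("[index][0]", failure));
    }
}
```

Either style yields exactly one instance per class loader; the static-field form reads more like a regular class, while the enum form gets serialization safety for free.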
while (next != null && isWeighedAway(next.currentNodeId(), clusterState)) {
    ShardRouting nextShard = next;
    if (canFailOpen(nextShard.shardId(), exception, clusterState)) {
        logger.info(() -> new ParameterizedMessage("{}: Fail open executed due to exception {}", nextShard.shardId(), exception));
nit: incorrect usage of the logger with an exception
…pensearch-project#5072) * Fail open requests on search shard failures ( Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
#5784) * [Weighted Routing] Add support for discovered master and remove local weights in the response (#5680) * Add support for discovered master and remove local weights in the weighted routing API response Signed-off-by: Anshu Agarwal <anshukag@amazon.com> * [Weighted Shard Routing] API versioning (#5255) * Support API versioning for weighted shard routing Signed-off-by: Anshu Agarwal <anshukag@amazon.com> * [Weighted Shard Routing] Fail open requests on search shard failures (#5072) * Fail open requests on search shard failures ( Signed-off-by: Anshu Agarwal <anshukag@amazon.com> * Address fail open comments (#5778) [Weighted Shard Routing] Refactor and fix singleton in FailAwareWeightedRouting Signed-off-by: Anshu Agarwal <anshukag@amazon.com> * remove unintended changes in changelog Signed-off-by: Anshu Agarwal <anshukag@amazon.com> * remove unintended changes from changelog Signed-off-by: Anshu Agarwal <anshukag@amazon.com> Signed-off-by: Anshu Agarwal <anshukag@amazon.com> Co-authored-by: Anshu Agarwal <anshukag@amazon.com>
…ed shard routing (#5781) * [Backport 2.x] [Weighted Shard Routing] Add support for discovered master and remove local weights in the response #5680 [Weighted Shard Routing] API versioning #5255 [Weighted Shard Routing] Fail open requests on search shard failures #5072 [Weighted Shard Routing] Refactor and fix singleton in FailAwareWeightedRouting #5778 Signed-off-by: Anshu Agarwal <anshukag@amazon.com>
Description
Fail open allows shard copy requests to go to nodes in the weighed-away zone. This reduces the number of 5xx responses and improves shard availability. In the other case, where shard search requests return 4xx due to throttling from shard copies in the non-weighed-away zone, we don't need to fail open. In short, if the request fails due to internal issues in the cluster, we fail open for search requests.
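The behavior described above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the PR's implementation: `ShardCopy`, `findNext`, and the use of plain integer status codes and a set of weighed-away node IDs are all hypothetical, and the real code works with `ShardRouting`, `ClusterState`, and exceptions rather than status integers.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of the fail-open decision; not the PR's code.
final class FailOpenSketch {

    // Minimal stand-in for a shard copy: only the node it is allocated to.
    static final class ShardCopy {
        final String nodeId;

        ShardCopy(String nodeId) {
            this.nodeId = nodeId;
        }
    }

    // Assumption: internal cluster failures are modeled by these 5xx codes.
    private static final Set<Integer> INTERNAL_ERROR_CODES = Set.of(500, 502, 503, 504);

    // 5xx internal errors allow fail open; 4xx responses such as throttling do not.
    static boolean canFailOpen(int lastFailureStatus) {
        return INTERNAL_ERROR_CODES.contains(lastFailureStatus);
    }

    // Returns the next copy to try, skipping copies on weighed-away nodes
    // unless the previous failure permits fail open. Returns null if none is left.
    static ShardCopy findNext(Iterator<ShardCopy> copies, Set<String> weighedAwayNodes, int lastFailureStatus) {
        while (copies.hasNext()) {
            ShardCopy next = copies.next();
            boolean weighedAway = weighedAwayNodes.contains(next.nodeId);
            if (!weighedAway || canFailOpen(lastFailureStatus)) {
                return next;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> weighedAway = Set.of("node-zone-c");
        List<ShardCopy> remaining = List.of(new ShardCopy("node-zone-c"));

        // Internal failure (503): the copy in the weighed-away zone is tried.
        ShardCopy afterInternalError = findNext(remaining.iterator(), weighedAway, 503);
        System.out.println(afterInternalError == null ? "none" : afterInternalError.nodeId);

        // Throttling (429): the weighed-away copy is skipped, so nothing is left.
        ShardCopy afterThrottling = findNext(remaining.iterator(), weighedAway, 429);
        System.out.println(afterThrottling == null ? "none" : afterThrottling.nodeId);
    }
}
```

The key design point is that the weight-zero copies stay in the iterator and are filtered at retry time, so the decision can depend on why the earlier copies failed.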
Issues Resolved
#4735
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.