Description
Describe the bug
The _cluster/allocation/explain API returns an incorrect response on clusters with batch mode enabled, because shard allocation explain requests are served by GatewayAllocator instead of ShardsBatchGatewayAllocator (AllocatorFetchLogic, ExistingShardAllocatorSetting). AllocationService needs to be changed to switch to ShardsBatchGatewayAllocator when batch mode is enabled.
Issue was identified by:
Enabling index.unassigned.node_left.delayed_timeout and taking down the nodes holding 2 replicas of a shard. The expected can_allocate value from _cluster/allocation/explain was allocation_delayed, whereas the API returned awaiting_info.
Related component
Cluster Manager
To Reproduce
- Create a cluster with dedicated master and 10 data nodes.
- Create a test index with 2 primary shards and 3 replicas
curl -X PUT "localhost:9200/test-ind?pretty" -H 'Content-Type: application/json' -d'
{
"settings": {
"index": {
"number_of_shards": 2,
"number_of_replicas": 3
}
}
}'
- Enable the unassigned delayed_timeout setting
curl -X PUT "localhost:9200/_all/_settings?pretty" -H 'Content-Type: application/json' -d'
{
"settings": {
"index.unassigned.node_left.delayed_timeout": "10m"
}
}'
- Get the nodes with shards for the index
curl localhost:9200/_cat/shards/test-ind
- Stop the OpenSearch process on the 2 data nodes holding replicas of shard 0
- Get allocation response for the shard
curl -XGET 'http://localhost:9200/_cluster/allocation/explain' -H 'Content-Type: application/json' -d '{
"index": "test-ind",
"shard": 0,
"primary": false
}'
- Validate that the can_allocate field in the response is awaiting_info. The response looks like this (truncated):
{"index":"test-ind","shard":0,"primary":false,"current_state":"unassigned","unassigned_info":{"reason":"NODE_LEFT","at":"2024-06-05T05:33:16.753Z","details":"node_left [Bvu-mf5XSPu3DEmv9ndBgw]","last_allocation_status":"no_attempt"},"can_allocate":"awaiting_info","allocate_explanation":"cannot allocate because information about existing shard data is still being retrieved from some of the nodes","node_allocation_decisions":[{"node_id":"3YYYQYZLQaGck1tIOJ57xg","node_name":"517c7e06d65968c38f1a4140b265ccc4","
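The validation step above can be scripted. The following is a minimal sketch, not part of the original report: the helper names can_allocate_of and check_allocation_delayed are hypothetical, and plain grep/sed is used instead of a JSON parser so that it also works on a truncated response like the one pasted above. It assumes the expected value is allocation_delayed, as stated in the description.

```shell
#!/bin/sh
# Hypothetical helper: read an allocation explain response on stdin and
# print the value of its first "can_allocate" field (empty if absent).
can_allocate_of() {
  grep -o '"can_allocate"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Hypothetical helper: succeed only if the response on stdin reports
# allocation_delayed (the value this issue expects for the delayed replica).
check_allocation_delayed() {
  actual="$(can_allocate_of)"
  if [ "$actual" = "allocation_delayed" ]; then
    echo "OK: can_allocate=$actual"
  else
    echo "MISMATCH: can_allocate=$actual (expected allocation_delayed)"
    return 1
  fi
}

# Usage against the cluster from the steps above (not executed here):
#   curl -s -XGET 'http://localhost:9200/_cluster/allocation/explain' \
#     -H 'Content-Type: application/json' \
#     -d '{"index":"test-ind","shard":0,"primary":false}' | check_allocation_delayed
```

On a cluster affected by this bug, the check would be expected to report a mismatch with awaiting_info.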
Expected behavior
The value of the can_allocate field in the response is allocation_delayed.
Additional Details
OpenSearch Version: 2.14