Closed
Description
Describe the bug
With zone-aware replication enabled and a replication factor (RF) greater than the number of zones, all three retries for the same block can go to replicas in the same zone.
This can be problematic during:
- An AZ outage: even if we have replicas in healthy AZs, we might never query them.
- A zone-based deployment: multiple store-gateways from the same zone can be brought down at once, and all 3 retries might hit store-gateways that are down.
Assume we have 9 replicas for a block:
- sg1 (AZ1)
- sg2 (AZ1)
- sg3 (AZ1)
- sg4 (AZ2)
- sg5 (AZ2)
- sg6 (AZ2)
- sg7 (AZ3)
- sg8 (AZ3)
- sg9 (AZ3)
Assume AZ1 is down and sg1, sg2 and sg3 are not available.
The retry logic picks a random store-gateway from the list without taking its zone into account, so it's possible that all three retries go to store-gateways in AZ1.
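To get a rough sense of how often this happens for a single block, the sketch below enumerates every ordered sequence of three attempts over the nine replicas above and counts those that stay entirely in AZ1. It assumes each retry picks uniformly at random among replicas not yet attempted for that block; that assumption, the `replica` type and the helper names are illustrative and are not taken from the Cortex code.

```go
package main

import "fmt"

// replica is a hypothetical stand-in for a store-gateway and its zone.
type replica struct {
	addr string
	zone string
}

// exampleReplicas returns the nine replicas from the example (3 per AZ).
func exampleReplicas() []replica {
	return []replica{
		{"sg1", "AZ1"}, {"sg2", "AZ1"}, {"sg3", "AZ1"},
		{"sg4", "AZ2"}, {"sg5", "AZ2"}, {"sg6", "AZ2"},
		{"sg7", "AZ3"}, {"sg8", "AZ3"}, {"sg9", "AZ3"},
	}
}

// countSameZoneSequences enumerates every ordered sequence of `attempts`
// distinct replicas and returns how many of them stay entirely inside
// `zone`, together with the total number of sequences.
// Assumption: a replica is never retried twice for the same block.
func countSameZoneSequences(replicas []replica, zone string, attempts int) (inZone, total int) {
	used := make([]bool, len(replicas))
	var walk func(depth int, allInZone bool)
	walk = func(depth int, allInZone bool) {
		if depth == attempts {
			total++
			if allInZone {
				inZone++
			}
			return
		}
		for i, r := range replicas {
			if used[i] {
				continue
			}
			used[i] = true
			walk(depth+1, allInZone && r.zone == zone)
			used[i] = false
		}
	}
	walk(0, true)
	return inZone, total
}

func main() {
	inZone, total := countSameZoneSequences(exampleReplicas(), "AZ1", 3)
	fmt.Printf("%d of %d sequences hit only AZ1 (%.2f%%)\n",
		inZone, total, 100*float64(inZone)/float64(total))
}
```

Under these assumptions, 6 of the 504 possible sequences (about 1.2%) never leave AZ1. That is small for one block, but a query that touches many blocks gets many independent chances to hit it.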
Relevant code:
- Retry Logic: https://github.com/cortexproject/cortex/blob/master/pkg/querier/blocks_store_queryable.go#L505
- Picking random store-gateways: https://github.com/cortexproject/cortex/blob/master/pkg/querier/blocks_store_replicated_set.go#L147
To Reproduce
Steps to reproduce the behavior:
- Enable zone aware replication
- Set RF to 9
- Bring down multiple store-gateways in the same AZ.
Expected behavior
- An AZ outage shouldn't fail a query if there are replicas in other AZs.
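One possible way to get this behavior is to make the retry selection prefer zones that have not been attempted yet for the block, and fall back to already-tried zones only when no other zone is left. The following is a minimal sketch of that idea, not the Cortex implementation: the `storeGateway` type, `pickNext` and `pickSequence` are hypothetical names, and tracking attempts in an address map is an assumption.

```go
package main

import (
	"fmt"
	"math/rand"
)

// storeGateway is a hypothetical stand-in for a store-gateway replica.
type storeGateway struct {
	addr string
	zone string
}

// nineReplicas returns the nine replicas from the example (3 per AZ).
func nineReplicas() []storeGateway {
	return []storeGateway{
		{"sg1", "AZ1"}, {"sg2", "AZ1"}, {"sg3", "AZ1"},
		{"sg4", "AZ2"}, {"sg5", "AZ2"}, {"sg6", "AZ2"},
		{"sg7", "AZ3"}, {"sg8", "AZ3"}, {"sg9", "AZ3"},
	}
}

// pickNext returns a random replica that has not been attempted yet,
// preferring replicas in zones where no attempt has been made.
// It returns false when every replica has already been attempted.
func pickNext(replicas []storeGateway, attempted map[string]bool) (storeGateway, bool) {
	// Collect the zones that already received at least one attempt.
	triedZones := map[string]bool{}
	for _, r := range replicas {
		if attempted[r.addr] {
			triedZones[r.zone] = true
		}
	}

	// Split the remaining replicas into untouched zones and the rest.
	var fresh, fallback []storeGateway
	for _, r := range replicas {
		if attempted[r.addr] {
			continue
		}
		if triedZones[r.zone] {
			fallback = append(fallback, r)
		} else {
			fresh = append(fresh, r)
		}
	}

	candidates := fresh
	if len(candidates) == 0 {
		candidates = fallback
	}
	if len(candidates) == 0 {
		return storeGateway{}, false
	}
	return candidates[rand.Intn(len(candidates))], true
}

// pickSequence simulates up to `attempts` consecutive picks for one block.
func pickSequence(replicas []storeGateway, attempts int) []storeGateway {
	attempted := map[string]bool{}
	var seq []storeGateway
	for i := 0; i < attempts; i++ {
		r, ok := pickNext(replicas, attempted)
		if !ok {
			break
		}
		attempted[r.addr] = true
		seq = append(seq, r)
	}
	return seq
}

func main() {
	for _, r := range pickSequence(nineReplicas(), 3) {
		fmt.Println(r.addr, r.zone)
	}
}
```

With three zones and three attempts, this selection always spreads the attempts across all three zones, so an outage of a single AZ can cause at most one of the three attempts to fail.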