Cortex only retries fetching a block from a store-gateway when the request returns an error; see:
- cortex/pkg/querier/blocks_store_queryable.go, lines 604 to 609 in dd4240d
- cortex/pkg/querier/blocks_store_queryable.go, line 503 in dd4240d
This means that if a single store-gateway is merely slow and does not return an error, no retry is triggered and the query will eventually time out.
This scenario can happen for multiple reasons, such as a network partition between the store-gateway and the storage, or a slow disk.
In those cases we could:
- Try to fetch from at least 2 store-gateways in parallel, or
- Have some mechanism for a store-gateway to advertise that it cannot handle requests (set itself to unhealthy?)
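
To make the first option concrete, here is a minimal sketch of querying several replicas in parallel and taking the first successful response. It is not Cortex code: `fetchFunc`, `fetchFirst`, `replica`, and `demo` are hypothetical names, and the replicas are simulated with timers rather than real store-gateway clients.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// fetchFunc is a hypothetical stand-in for a request to one store-gateway replica.
type fetchFunc func(ctx context.Context) (string, error)

// fetchFirst sends the same request to every replica in parallel and returns
// the first successful response. The remaining requests are cancelled on
// return. It fails only if every replica fails or ctx expires.
func fetchFirst(ctx context.Context, replicas []fetchFunc) (string, error) {
	if len(replicas) == 0 {
		return "", errors.New("no replicas to query")
	}
	ctx, cancel := context.WithCancel(ctx)
	defer cancel() // stops the slower requests once we return

	type result struct {
		val string
		err error
	}
	// Buffered so that late goroutines can send and exit without blocking.
	results := make(chan result, len(replicas))
	for _, fetch := range replicas {
		go func(f fetchFunc) {
			v, err := f(ctx)
			results <- result{v, err}
		}(fetch)
	}

	var lastErr error
	for range replicas {
		select {
		case r := <-results:
			if r.err == nil {
				return r.val, nil
			}
			lastErr = r.err
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
	return "", lastErr
}

// replica simulates a store-gateway that answers after delay unless cancelled.
func replica(name string, delay time.Duration) fetchFunc {
	return func(ctx context.Context) (string, error) {
		select {
		case <-time.After(delay):
			return name, nil
		case <-ctx.Done():
			return "", ctx.Err()
		}
	}
}

// demo queries one slow and one fast simulated replica under a 1s deadline.
func demo() (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	return fetchFirst(ctx, []fetchFunc{
		replica("slow-gateway", 5*time.Second),
		replica("fast-gateway", 10*time.Millisecond),
	})
}

func main() {
	got, err := demo()
	fmt.Println(got, err)
}
```

With only the slow replica the call would hit the deadline, while with both replicas the fast one answers first. The trade-off is that every request is sent twice, so store-gateway load roughly doubles; a variant could send the second request only after the first has been pending for some delay.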