In Cortex, given a query, we're running a query plan in the querier to figure out which blocks should be queried and then we fetch the Series() from all store-gateway instances holding the affected blocks (a store-gateway runs the Thanos BucketStore).
Due to how blocks sharding and replication work, there are real cases where we end up querying the same block from multiple store-gateway instances. For example, if SG1 contains blocks A,B,C and SG2 contains blocks B,C,D, and we have to query blocks C,D, the querier fetches Series() from both SG1 and SG2. Block D will be queried only on SG2, but block C will be queried on both SG1 and SG2. This wastes computational resources.
Given we run a plan in the querier, we would like to be able to tell BucketStore which blocks to query, in order to efficiently distribute the workload across nodes and not fetch series from the same block twice (or even 3 times, given we run with a blocks replication factor = 3) for the same query.
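To make the SG1/SG2 example concrete, here is a minimal sketch of how a querier-side plan could assign each block to exactly one store-gateway. The function name `assignBlocks`, the use of plain strings as block IDs, and the "least loaded holder" tie-breaking are assumptions for illustration only, not how the Cortex querier actually builds its plan.

```go
package main

import (
	"fmt"
	"sort"
)

// assignBlocks picks exactly one store-gateway for each wanted block.
// holders maps a store-gateway name to the block IDs it has loaded.
// Hypothetical helper: blocks that no gateway holds are silently skipped.
func assignBlocks(wanted []string, holders map[string][]string) map[string][]string {
	// Sort gateway names so that ties are broken deterministically.
	names := make([]string, 0, len(holders))
	for name := range holders {
		names = append(names, name)
	}
	sort.Strings(names)

	plan := make(map[string][]string)
	for _, block := range wanted {
		best := ""
		for _, name := range names {
			if !contains(holders[name], block) {
				continue
			}
			// Prefer the holder with the fewest blocks assigned so far.
			if best == "" || len(plan[name]) < len(plan[best]) {
				best = name
			}
		}
		if best != "" {
			plan[best] = append(plan[best], block)
		}
	}
	return plan
}

// contains reports whether id is present in ids.
func contains(ids []string, id string) bool {
	for _, v := range ids {
		if v == id {
			return true
		}
	}
	return false
}

func main() {
	holders := map[string][]string{
		"SG1": {"A", "B", "C"},
		"SG2": {"B", "C", "D"},
	}
	plan := assignBlocks([]string{"C", "D"}, holders)
	fmt.Println(plan["SG1"], plan["SG2"]) // prints "[C] [D]"
}
```

With such a plan, block C is fetched only from SG1 and block D only from SG2. The per-gateway lists are what the querier would need to pass along with each Series() request.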
Like we did in #2479, I would like to propose introducing request hints:
message SeriesRequest {
  // [already existing fields]

  /// hints is an opaque data structure that can be used to carry additional information.
  /// The content of this field and whether it's supported depends on the
  /// implementation of a specific store.
  google.protobuf.Any hints = 9;
}
Like we did for the response, hints would be an opaque structure whose content depends on the specific store. The BucketStore would support these hints:
message SeriesRequestHints {
  /// filter_blocks is the list of blocks that should be queried. Any other block loaded in the
  /// store can be skipped. If the list is empty, no filtering is applied.
  repeated Block filter_blocks = 1 [(gogoproto.nullable) = false];
}
The required changes in BucketStore.Series() would be minimal (a few lines of code) and shouldn't make the BucketStore.Series() implementation harder to maintain over time.
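As a rough illustration of how small the store-side change could be, the sketch below applies the filter_blocks semantics to a list of loaded block IDs. The Go types `Block` and `SeriesRequestHints` are hand-written stand-ins for the generated protobuf types, the `ID` field and the `blocksToQuery` helper are assumptions, and decoding the hints from the `Any` field is left out.

```go
package main

import "fmt"

// Block and SeriesRequestHints are hypothetical stand-ins for the
// types that would be generated from the proposed proto messages.
type Block struct {
	ID string
}

type SeriesRequestHints struct {
	FilterBlocks []Block
}

// blocksToQuery returns the loaded blocks that the request allows.
// An empty FilterBlocks list means no filtering is applied.
func blocksToQuery(loaded []string, hints SeriesRequestHints) []string {
	if len(hints.FilterBlocks) == 0 {
		return loaded
	}

	// Build a set of allowed block IDs for constant-time lookups.
	allowed := make(map[string]struct{}, len(hints.FilterBlocks))
	for _, b := range hints.FilterBlocks {
		allowed[b.ID] = struct{}{}
	}

	out := make([]string, 0, len(loaded))
	for _, id := range loaded {
		if _, ok := allowed[id]; ok {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	loaded := []string{"B", "C", "D"} // blocks loaded on SG2

	onlyD := SeriesRequestHints{FilterBlocks: []Block{{ID: "D"}}}
	fmt.Println(blocksToQuery(loaded, onlyD))                // prints "[D]"
	fmt.Println(blocksToQuery(loaded, SeriesRequestHints{})) // prints "[B C D]"
}
```

Stores that do not understand the hints can ignore them and keep querying every loaded block, which matches the current behavior.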