Shared caching layer for thanos queriers #5047
Labels
component: query
difficulty: hard
dont-go-stale
feature request/improvement
GSoC/Community Bridge/LFX
Is your proposal related to a problem?
We are running about 10 Thanos querier replicas for scaling purposes, and we have 100+ edge clusters across the world, each running Prometheus with a Thanos sidecar.
For our setup, the fanout problem is huge because of the scale. For example:
Info requests to sidecars
This is not a big problem on its own because Info requests and responses are relatively cheap. Still, in our setup, (number of queriers x number of sidecars) requests are sent every time. That is fine when the scale is small, but as the number of Thanos queriers and edge sidecars grows, it becomes increasingly inefficient.
Metadata and rules query requests to sidecars
Metrics metadata and rules responses rarely change for us, especially metrics metadata. This is where caching would benefit us the most.
More use cases in the future
In #1611, we proposed a bloom-filter-like data structure to reduce unnecessary Series calls. Ideally, this could be done by reporting more data through the Info API and keeping a bloom filter in each querier. If we have a caching layer for the querier cluster, keeping the bloom filter up to date is no longer expensive.
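To make the bloom filter idea concrete, here is a minimal sketch in Go. Everything here is illustrative: the `bloom` type, its sizing, and the metric-name keying are assumptions, not existing Thanos APIs. The intent is that each store endpoint reports such a filter (e.g. over its metric names) via the Info API, and the querier skips Series calls to endpoints whose filter definitely does not contain the queried metric.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a minimal Bloom filter. Hypothetical helper, not a Thanos type.
type bloom struct {
	bits []uint64
	k    int // number of hash functions
}

func newBloom(mBits, k int) *bloom {
	return &bloom{bits: make([]uint64, (mBits+63)/64), k: k}
}

// hashes derives k hash values from one FNV-1a hash via double hashing.
func (b *bloom) hashes(s string) []uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	h1 := h.Sum64()
	h2 := h1>>33 | h1<<31
	out := make([]uint64, b.k)
	for i := 0; i < b.k; i++ {
		out[i] = h1 + uint64(i)*h2
	}
	return out
}

func (b *bloom) Add(s string) {
	m := uint64(len(b.bits) * 64)
	for _, h := range b.hashes(s) {
		idx := h % m
		b.bits[idx/64] |= 1 << (idx % 64)
	}
}

// MayContain returns false only if s was definitely never added;
// true means "possibly present" (false positives are possible).
func (b *bloom) MayContain(s string) bool {
	m := uint64(len(b.bits) * 64)
	for _, h := range b.hashes(s) {
		idx := h % m
		if b.bits[idx/64]&(1<<(idx%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	// A sidecar would populate this with its metric names and report it
	// through the Info API; the querier checks it before calling Series.
	f := newBloom(1<<16, 4)
	f.Add("http_requests_total")
	fmt.Println(f.MayContain("http_requests_total")) // true
	fmt.Println(f.MayContain("unrelated_metric"))
}
```

A negative answer lets the querier drop that endpoint from the fanout entirely, which is exactly where a shared cache helps: the filters can be refreshed once for the whole querier cluster instead of once per replica.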
Describe the solution you'd like
Have another type of cache for this use case, maybe called a proxy cache? It is similar to the caching bucket, but this time we cache endpoint responses. I also think the new galaxycache is very suitable for this use case.
Describe alternatives you've considered
Use some kind of gRPC proxy that does caching/passthrough based on the request. I haven't investigated this yet, but maybe something that already exists suits this use case.