
Shared caching layer for thanos queriers #5047

Open · yeya24 opened this issue Jan 9, 2022 · 5 comments

yeya24 commented Jan 9, 2022

Is your proposal related to a problem?

We run about 10 Thanos Querier replicas for scaling purposes, and we have 100+ edge clusters (Prometheus + sidecar) across the world.

At this scale, the fanout problem is significant. For example:

Info requests to sidecars

This is not a big problem on its own, because Info requests and responses are relatively cheap. Still, in our setup (number of queriers × number of sidecars) requests are sent on every refresh. That is fine at small scale, but it becomes increasingly inefficient as the number of Thanos Queriers and edge sidecars grows.
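To make the math concrete, here is a toy sketch using the replica and cluster counts quoted above (the `requestFanout` helper is purely illustrative, not Thanos code):

```go
package main

import "fmt"

// requestFanout returns how many Info requests are issued per refresh
// when every querier polls every sidecar directly, versus when the
// queriers share a cache and only one cache fill fans out.
func requestFanout(queriers, sidecars int) (direct, shared int) {
	direct = queriers * sidecars // each querier -> each sidecar
	shared = sidecars            // one fanout serves all queriers
	return
}

func main() {
	direct, shared := requestFanout(10, 100)
	fmt.Println(direct, shared) // 1000 100
}
```

With a shared cache, the per-refresh request count drops from queriers × sidecars to roughly just the number of sidecars.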

Metadata and rules query requests to sidecars

Metrics metadata and rules rarely change for us, especially metrics metadata. This is where caching would benefit us a lot.

More use cases in the future

In #1611 we proposed a bloom-filter-like data structure for reducing unnecessary Series calls. Ideally, this could be done by reporting more data through the Info API and keeping a bloom filter in the queriers. If we had a caching layer shared by the querier cluster, keeping the bloom filter up to date would no longer be expensive.
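A rough sketch of the bloom-filter idea: assume each store advertises a filter over its metric names, so a querier can skip Series calls to stores that definitely don't have the metric. The types and hashing scheme below are illustrative only, not Thanos or #1611 code:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloomFilter is a minimal bloom filter over metric names. A real
// implementation would size the bit array and hash count from the
// expected cardinality and target false-positive rate.
type bloomFilter struct {
	bits []bool
	k    int // number of hash functions
}

func newBloomFilter(m, k int) *bloomFilter {
	return &bloomFilter{bits: make([]bool, m), k: k}
}

// indexes derives k bit positions for a name by salting one FNV hash.
func (b *bloomFilter) indexes(name string) []int {
	idx := make([]int, b.k)
	for i := 0; i < b.k; i++ {
		h := fnv.New64a()
		fmt.Fprintf(h, "%d:%s", i, name)
		idx[i] = int(h.Sum64() % uint64(len(b.bits)))
	}
	return idx
}

func (b *bloomFilter) Add(name string) {
	for _, i := range b.indexes(name) {
		b.bits[i] = true
	}
}

// MayContain is false only when the store definitely lacks the metric,
// so a querier can safely skip the Series call in that case.
func (b *bloomFilter) MayContain(name string) bool {
	for _, i := range b.indexes(name) {
		if !b.bits[i] {
			return false
		}
	}
	return true
}

func main() {
	f := newBloomFilter(1024, 3)
	f.Add("http_requests_total")
	fmt.Println(f.MayContain("http_requests_total")) // true
	fmt.Println(f.MayContain("node_cpu_seconds_total"))
}
```

False positives only cost an extra Series call that would have been made anyway; false negatives cannot occur, which is what makes skipping safe.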

Describe the solution you'd like

Add another type of cache for this use case; maybe call it a proxy cache? It would be similar to the caching bucket, but this time we would cache endpoint responses.
Also, I think the new galaxycache is very suitable for this use case.

Describe alternatives you've considered

Have some kind of gRPC proxy that does caching/passthrough based on the request. I haven't investigated this yet, but something existing may already suit this use case.

@GiedriusS (Member)

So, something like galaxycache but for gRPC calls? Did I understand you correctly?


yeya24 commented Jan 9, 2022

So, something like galaxycache but for gRPC calls? Did I understand you correctly?

Yes

@GiedriusS (Member)

I agree, this would be great. Perhaps this could be an LFX project? In the meantime, I have been using a local version of this functionality: 310df0c. It has already deduplicated thousands of Series() calls on my deployment. Perhaps we could merge this local version first and then work on the groupcache-esque one?

@stale

stale bot commented Apr 16, 2022

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Apr 16, 2022
@GiedriusS GiedriusS removed the stale label Apr 17, 2022
@stale

stale bot commented Sep 21, 2022


@stale stale bot added the stale label Sep 21, 2022
@GiedriusS GiedriusS added the dont-go-stale label (for important issues; tells the stale bot not to close them) and removed the stale label Sep 21, 2022