
query-frontend: Client load balancing #3016

Open

bwplotka opened this issue Aug 11, 2020 · 18 comments

Comments

@bwplotka
Member

It would be nice to enable this so we can distribute the load among queriers.

An alternative is to move to the subscription API that Cortex uses.
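
For context, client load balancing here would roughly mean the query-frontend resolving every querier address itself (for example through a Kubernetes headless Service) and spreading requests across them, instead of pushing everything to a single endpoint. Below is a minimal Go sketch of gRPC client-side round robin, using a hypothetical Service name and a health check only to keep the example self-contained (an illustration, not Thanos code):

package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	// "dns:///" makes the gRPC resolver return every A record behind the name;
	// with a Kubernetes headless Service that is one address per querier pod.
	// The round_robin policy then spreads RPCs across those sub-connections.
	conn, err := grpc.Dial(
		"dns:///thanos-querier.monitoring.svc.cluster.local:10901", // hypothetical headless Service
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
	)
	if err != nil {
		log.Fatalf("dial queriers: %v", err)
	}
	defer conn.Close()

	// Any generated client built on this connection is load balanced per call.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	resp, err := grpc_health_v1.NewHealthClient(conn).Check(ctx, &grpc_health_v1.HealthCheckRequest{})
	if err != nil {
		log.Fatalf("health check: %v", err)
	}
	log.Printf("health: %s", resp.GetStatus())
}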

@pracucci
Contributor

An alternative is to move to the subscription API that Cortex uses.

Not pushing you in any direction (there are pros and cons). What we observed in Cortex is that round-robining queries across a pool of queriers doesn't end up with a fair split of the workload. The problem is that the resources a single query takes (CPU and memory) vary a lot from query to query and, when running in a heavily utilised cluster, you will end up with some idle queriers and other busy ones.
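
To make the contrast concrete, here is a toy, in-process Go sketch of the pull ("subscription") model, with invented names and none of the real gRPC plumbing: each querier takes the next query only when it is free, so a worker stuck on an expensive query stops receiving new ones.

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	// Frontend-side queue of pending queries; in a real system it would be fed
	// by incoming HTTP requests and drained by queriers over long-lived streams.
	queue := make(chan string, 16)

	var wg sync.WaitGroup
	for id := 1; id <= 3; id++ { // three "queriers" subscribed to the frontend
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for q := range queue { // pull the next query only when idle
				cost := 10 * time.Millisecond
				if q == "heavy" {
					cost = 200 * time.Millisecond // simulate very uneven per-query cost
				}
				time.Sleep(cost)
				fmt.Printf("querier %d finished %q\n", id, q)
			}
		}(id)
	}

	for _, q := range []string{"heavy", "light", "light", "light", "light", "light"} {
		queue <- q
	}
	close(queue)
	wg.Wait()
}

With round robin, every third query would still land on the worker stuck on the heavy one; with the pull model the other workers simply absorb them.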

@bwplotka
Member Author

The problem is that the resources a single query takes (CPU and memory) vary a lot from query to query and, when running in a heavily utilised cluster, you will end up with some idle queriers and other busy ones.

That makes sense, but it's still better than utilizing only one. And it looks like both client load balancing and the subscription-based approach will have a similar problem.

@pracucci
Contributor

pracucci commented Aug 12, 2020 via email

@yeya24
Contributor

yeya24 commented Aug 20, 2020

One problem I see is that:

if we have multiple queriers, do we need to require that these queriers are configured with the same stores?

If they have different store configurations, then we will get different results when querying them. In this case we might need a different cache key.
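
One possible way to handle that (a hypothetical sketch, not Thanos code) would be to fold a fingerprint of the downstream store configuration into the results-cache key, so cached responses produced by differently configured queriers can never be mixed:

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKey derives a results-cache key that includes a short fingerprint of the
// querier's store configuration; all names and the key layout are invented here.
func cacheKey(storeConfig, tenant, query string, stepSeconds int64) string {
	fp := sha256.Sum256([]byte(storeConfig))
	return fmt.Sprintf("%s:%s:%d:%s", tenant, query, stepSeconds, hex.EncodeToString(fp[:8]))
}

func main() {
	fmt.Println(cacheKey("store-a:10901,store-b:10901", "team-1", "sum(rate(http_requests_total[5m]))", 60))
}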

@stale

stale bot commented Nov 23, 2020

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there is no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need to!). Alternatively, use the remind command if you wish to be reminded at some point in the future.

@stale stale bot added the stale label Nov 23, 2020
@bwplotka
Member Author

bwplotka commented Nov 23, 2020 via email

@stale stale bot removed the stale label Nov 23, 2020

@stale stale bot added the stale label Jan 23, 2021
@yeya24
Contributor

yeya24 commented Jan 23, 2021

Still needed.

@stale stale bot removed the stale label Jan 23, 2021
@clyang82
Contributor

clyang82 commented Jan 26, 2021

I hit this problem in my environment: one querier is heavily utilised while another one is almost idle.

observability-observatorium-thanos-query-746bf9b6cb-66dgf         0m           20Mi
observability-observatorium-thanos-query-746bf9b6cb-bfc95         1451m        6300Mi

This feature is very much needed. Thanks.

@heyitsmdr

We're looking for this feature as well. For now, we are relying on a kube load balancer.


@stale stale bot added the stale label Jun 3, 2021
@clyang82
Contributor

Any plan for this?

@stale stale bot removed the stale label Jun 11, 2021

@stale stale bot added the stale label Aug 13, 2021
@stale

stale bot commented Aug 28, 2021

Closing for now as promised, let us know if you need this to be reopened! 🤗

@stale stale bot closed this as completed Aug 28, 2021
@bwplotka bwplotka reopened this Aug 12, 2022
@stale stale bot removed the stale label Aug 12, 2022
@skant7

skant7 commented Aug 14, 2022

Hi @bwplotka, I'm interested in working on this as part of the LFX mentorship program. Is there any related PR or a specific part of the source code I can look at to get started?

@kc611

kc611 commented Aug 19, 2022

Hi everyone, I'm interested in this as part of LFX (a bit too late, I guess 🙂).

Reviving this issue: I think a continuation of this particular issue was discussed in #3373,
and a proposal was also put in regarding this: https://thanos.io/tip/proposals-done/202004-embedd-cortex-frontend.md/

And the current state of this project is:

  • We are not using Cortex frontend due to potential config issues for both users and Thanos itself.
  • We are implementing our own client load balancing in query-frontend, and it would be either an HTTP- or a gRPC-based API (with the general consensus leaning towards a gRPC-based one, mostly because it would be more useful in the longer term).

So my questions here would be:

  • Is that still the case (i.e. we are going with the gRPC-based API)? Or are we going for the low-hanging fruit, the HTTP-based one (a rough sketch of that option follows after this list)?
  • If we decide to go with the gRPC-based one, should we try exposing the Query APIs over gRPC first (before putting in a load-balancer implementation)? And is it even possible to do this without affecting the current functionality? (Not exactly familiar with the Querier.)
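
For reference, the "low-hanging fruit" HTTP option could in principle be as small as a round-tripper that rotates requests across a static list of downstream querier URLs. A hypothetical Go sketch with invented names; real service discovery, retries and health checking are deliberately left out:

package main

import (
	"net/http"
	"net/url"
	"sync/atomic"
)

// roundRobinTransport rewrites each outgoing request to the next downstream
// querier URL in a fixed list. This is an illustration only, not Thanos code.
type roundRobinTransport struct {
	targets []*url.URL
	next    atomic.Uint64
	base    http.RoundTripper
}

func (t *roundRobinTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	target := t.targets[t.next.Add(1)%uint64(len(t.targets))]
	clone := req.Clone(req.Context()) // never mutate the caller's request
	clone.URL.Scheme = target.Scheme
	clone.URL.Host = target.Host
	clone.Host = target.Host
	return t.base.RoundTrip(clone)
}

func main() {
	u1, _ := url.Parse("http://querier-0:9090") // hypothetical downstream queriers
	u2, _ := url.Parse("http://querier-1:9090")
	client := &http.Client{Transport: &roundRobinTransport{
		targets: []*url.URL{u1, u2},
		base:    http.DefaultTransport,
	}}
	_ = client // this client would back the frontend's downstream requests
}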

