
query-frontend: Client load balancing #3016

Open

bwplotka opened this issue Aug 11, 2020 · 18 comments

Comments

@bwplotka
Member

It would be nice to enable this so we can distribute the load among queriers.

An alternative is to move to the subscription API that Cortex uses.
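
For context, client load balancing here would roughly mean the query-frontend resolving every querier address itself (for example through a Kubernetes headless Service) and spreading requests across them, instead of pushing everything to a single endpoint. Below is a minimal Go sketch of gRPC client-side round robin, using a hypothetical Service name and a health check only to keep the example self-contained (an illustration, not Thanos code):

package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	// "dns:///" makes the gRPC resolver return every A record behind the name;
	// with a Kubernetes headless Service that is one address per querier pod.
	// The round_robin policy then spreads RPCs across those sub-connections.
	conn, err := grpc.Dial(
		"dns:///thanos-querier.monitoring.svc.cluster.local:10901", // hypothetical headless Service
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
	)
	if err != nil {
		log.Fatalf("dial queriers: %v", err)
	}
	defer conn.Close()

	// Any generated client built on this connection is load balanced per call.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	resp, err := grpc_health_v1.NewHealthClient(conn).Check(ctx, &grpc_health_v1.HealthCheckRequest{})
	if err != nil {
		log.Fatalf("health check: %v", err)
	}
	log.Printf("health: %s", resp.GetStatus())
}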

@pracucci
Contributor

An alternative is to move to the subscription API that Cortex uses.

Not pushing you in any direction (there are pros and cons). What we observed in Cortex is that round-robining queries across a pool of queriers doesn't end up with a fair split of the workload. The problem is that the resources a single query takes (CPU and memory) vary a lot from query to query and, when running in a heavily utilised cluster, you will end up with some idle queriers and other busy ones.
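
To make the contrast concrete, here is a toy, in-process Go sketch of the pull ("subscription") model, with invented names and none of the real gRPC plumbing: each querier takes the next query only when it is free, so a worker stuck on an expensive query stops receiving new ones.

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	// Frontend-side queue of pending queries; in a real system it would be fed
	// by incoming HTTP requests and drained by queriers over long-lived streams.
	queue := make(chan string, 16)

	var wg sync.WaitGroup
	for id := 1; id <= 3; id++ { // three "queriers" subscribed to the frontend
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for q := range queue { // pull the next query only when idle
				cost := 10 * time.Millisecond
				if q == "heavy" {
					cost = 200 * time.Millisecond // simulate very uneven per-query cost
				}
				time.Sleep(cost)
				fmt.Printf("querier %d finished %q\n", id, q)
			}
		}(id)
	}

	for _, q := range []string{"heavy", "light", "light", "light", "light", "light"} {
		queue <- q
	}
	close(queue)
	wg.Wait()
}

With round robin, every third query would still land on the worker stuck on the heavy one; with the pull model the other workers simply absorb them.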

@bwplotka
Member Author

The problem is that the resources a single query takes (CPU and memory) vary a lot from query to query and, when running in a heavily utilised cluster, you will end up with some idle queriers and other busy ones.

That makes sense, but it's still better than utilizing only one. And it looks like both client load balancing and the subscription-based approach will have a similar problem.

@pracucci
Contributor

pracucci commented Aug 12, 2020 via email

@yeya24
Contributor

yeya24 commented Aug 20, 2020

One problem I see is that:

if we have multiple queriers, do we need to require that these queriers are configured with the same stores?

If they have different store configurations, then we will get different results when querying them. In this case we might need a different cache key.
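
One possible way to handle that (a hypothetical sketch, not Thanos code) would be to fold a fingerprint of the downstream store configuration into the results-cache key, so cached responses produced by differently configured queriers can never be mixed:

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cacheKey derives a results-cache key that includes a short fingerprint of the
// querier's store configuration; all names and the key layout are invented here.
func cacheKey(storeConfig, tenant, query string, stepSeconds int64) string {
	fp := sha256.Sum256([]byte(storeConfig))
	return fmt.Sprintf("%s:%s:%d:%s", tenant, query, stepSeconds, hex.EncodeToString(fp[:8]))
}

func main() {
	fmt.Println(cacheKey("store-a:10901,store-b:10901", "team-1", "sum(rate(http_requests_total[5m]))", 60))
}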

@stale

stale bot commented Nov 23, 2020

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there is no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need to!). Alternatively, use the remind command if you wish to be reminded at some point in the future.

@stale stale bot added the stale label Nov 23, 2020
@bwplotka
Member Author

bwplotka commented Nov 23, 2020 via email

@stale stale bot removed the stale label Nov 23, 2020

@stale stale bot added the stale label Jan 23, 2021
@yeya24
Contributor

yeya24 commented Jan 23, 2021

Still needed.

@stale stale bot removed the stale label Jan 23, 2021
@clyang82
Contributor

clyang82 commented Jan 26, 2021

I hit this problem in my environment: one querier is heavily utilised while another one is almost idle.

observability-observatorium-thanos-query-746bf9b6cb-66dgf         0m           20Mi
observability-observatorium-thanos-query-746bf9b6cb-bfc95         1451m        6300Mi

This feature is very much needed. Thanks.

@heyitsmdr

We're looking for this feature as well. For now, we are relying on a kube load balancer.


@stale stale bot added the stale label Jun 3, 2021
@clyang82
Contributor

Any plan for this?

@stale stale bot removed the stale label Jun 11, 2021

@stale stale bot added the stale label Aug 13, 2021
@stale

stale bot commented Aug 28, 2021

Closing for now as promised, let us know if you need this to be reopened! 🤗

@stale stale bot closed this as completed Aug 28, 2021
@bwplotka bwplotka reopened this Aug 12, 2022
@stale stale bot removed the stale label Aug 12, 2022
@skant7

skant7 commented Aug 14, 2022

Hi @bwplotka, I'm interested in working on this as part of the LFX mentorship program. Is there any related PR or a specific part of the source code I can look at to get started?

@kc611

kc611 commented Aug 19, 2022

Hi everyone, I'm interested in this as part of LFX (a bit too late, I guess 🙂).

Reviving this issue: I think a continuation of this particular issue was discussed in #3373,
and a proposal was also put in regarding this: https://thanos.io/tip/proposals-done/202004-embedd-cortex-frontend.md/

And the current state of this project is:

  • We are not using Cortex frontend due to potential config issues for both users and Thanos itself.
  • We are implementing our own client load balancing in query-frontend, and it would be either an HTTP- or a gRPC-based API (with the general consensus leaning towards a gRPC-based one, mostly because it would be more useful in the longer term).

So my questions here would be:

  • Is that still the case (i.e. we are going with the gRPC-based API)? Or are we going for the low-hanging fruit, the HTTP-based one (a rough sketch of that option follows after this list)?
  • If we decide to go with the gRPC-based one, should we try exposing the Query APIs over gRPC first (before putting in a load-balancer implementation)? And is it even possible to do this without affecting the current functionality? (Not exactly familiar with the Querier.)
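
For reference, the "low-hanging fruit" HTTP option could in principle be as small as a round-tripper that rotates requests across a static list of downstream querier URLs. A hypothetical Go sketch with invented names; real service discovery, retries and health checking are deliberately left out:

package main

import (
	"net/http"
	"net/url"
	"sync/atomic"
)

// roundRobinTransport rewrites each outgoing request to the next downstream
// querier URL in a fixed list. This is an illustration only, not Thanos code.
type roundRobinTransport struct {
	targets []*url.URL
	next    atomic.Uint64
	base    http.RoundTripper
}

func (t *roundRobinTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	target := t.targets[t.next.Add(1)%uint64(len(t.targets))]
	clone := req.Clone(req.Context()) // never mutate the caller's request
	clone.URL.Scheme = target.Scheme
	clone.URL.Host = target.Host
	clone.Host = target.Host
	return t.base.RoundTrip(clone)
}

func main() {
	u1, _ := url.Parse("http://querier-0:9090") // hypothetical downstream queriers
	u2, _ := url.Parse("http://querier-1:9090")
	client := &http.Client{Transport: &roundRobinTransport{
		targets: []*url.URL{u1, u2},
		base:    http.DefaultTransport,
	}}
	_ = client // this client would back the frontend's downstream requests
}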

