QueryFrontend: HTTP/gRPC client load balancing #3373

Open
bwplotka opened this issue Oct 28, 2020 · 12 comments
Labels
difficulty: medium · dont-go-stale · feature request/improvement · help wanted

Comments

@bwplotka
Member

bwplotka commented Oct 28, 2020

It would be amazing to use the splitting mechanism better and allow distributing requests across multiple querier replicas.

Most of us use Kubernetes with plain K8s Services, which are TCP based, so they don't do round-robin balancing - they just pick one endpoint. Some users don't even have a good load balancer handy. I think it should be application-level logic that makes this happen.

I already wrote similar code, e.g. in Kedge, which still works in production today (https://github.com/improbable-eng/kedge/blob/772f9b2d2092a0ada972096945bee8cd49513da6/pkg/kedge/http/lbtransport/transport.go#L104). However, it might be a better idea to use gRPC instead. That is trickier, but it also gives us load balancing from Querier towards the other APIs. So we have two choices (or we implement both); a rough sketch of each follows its pros/cons below:

a) http

Pros:

  • We have the code for it ready.
  • Query endpoints already speak HTTP.
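
A minimal Go sketch of what option a) could look like, assuming a fixed, pre-resolved set of querier addresses (the kedge lbtransport linked above additionally handles re-resolution and dial-failure blacklisting); the `roundRobinTripper`/`NewRoundRobin` names are hypothetical:

```go
package lbtransport

import (
	"net/http"
	"sync/atomic"
)

// roundRobinTripper distributes requests across a static list of querier
// addresses. Hypothetical sketch only: no health checking or re-resolution.
type roundRobinTripper struct {
	counter uint64
	targets []string // querier host:port pairs, e.g. "querier-0:10902".
	next    http.RoundTripper
}

func (t *roundRobinTripper) RoundTrip(r *http.Request) (*http.Response, error) {
	// Pick the next target and clone the request; RoundTrippers must not
	// mutate the caller's request.
	i := atomic.AddUint64(&t.counter, 1)
	target := t.targets[i%uint64(len(t.targets))]

	req := r.Clone(r.Context())
	req.URL.Host = target
	req.Host = target
	return t.next.RoundTrip(req)
}

// NewRoundRobin builds the transport; base defaults to http.DefaultTransport.
func NewRoundRobin(targets []string, base http.RoundTripper) http.RoundTripper {
	if base == nil {
		base = http.DefaultTransport
	}
	return &roundRobinTripper{targets: targets, next: base}
}
```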

b) grpc

Pros:

  • LB for Query Frontend against just Prometheus is not needed.
  • We need these features for Querier -> Store/Rule/Target APIs as well anyway.
  • We can implement richer metadata passing that will allow load balancing based on saturation(!)

Cons:

  • There are some sketches of gRPC client LB code in the gRPC ecosystem, but they change every release, so work has to be done.
  • We need to expose the Query APIs over gRPC and switch to the gRPC port.
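
For contrast, a minimal Go sketch of what option b) could lean on: gRPC's built-in dns resolver plus the round_robin policy (the headless Service name and port below are placeholders; saturation-aware balancing would need a custom balancer on top):

```go
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// The dns:/// resolver returns every address behind the (placeholder)
	// headless Service, and round_robin spreads RPCs across them instead of
	// the default pick_first behaviour.
	conn, err := grpc.Dial(
		"dns:///thanos-querier-grpc.monitoring.svc:10901",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// conn can now back a gRPC Query API client; successive RPCs land on
	// different querier replicas.
}
```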

In both cases the acceptance criteria are the same:

AC:

  • Query Frontend replicas can route to multiple queriers and support different load-balancing strategies (round robin for a start).

I would vote for B (: More ambitious and more benefits. Thoughts? @pracucci @yeya24 @brancz @kakkoyun

@kakkoyun
Member

I think we should immediately go for option a just because it's low-hanging fruit, and then we can devise a plan to implement option b.
I'm with you that the ideal solution would be option b.

So let's just make it work and then we can iterate and optimize it.

@brancz
Member

brancz commented Oct 31, 2020

I agree with @kakkoyun.

@stale

stale bot commented Dec 31, 2020

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Dec 31, 2020
@kakkoyun kakkoyun removed the stale label Jan 4, 2021
@hitanshu-mehta
Contributor

I think I can work on this issue with some guidance. Could you please assign it to me if nobody else is working on it?

@hitanshu-mehta
Contributor

I have one question regarding the first part of the AC, i.e.

Query Frontend replicas can route to multiple queriers

What approach does everyone suggest? Should it be similar to Cortex? (Which, as far as I understood 😅, adds a -querier.frontend-address flag to the querier to connect it to the frontend, and the querier then pulls requests from the frontend queue.)

@roidelapluie

If you go for HTTP, would it be something reusable for prometheus/prometheus#8402?

@bwplotka
Member Author

bwplotka commented Feb 24, 2021

@roidelapluie Yes 🤗

@stale

stale bot commented Jun 3, 2021

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Jun 3, 2021
@kakkoyun
Member

kakkoyun commented Jun 3, 2021

Still valid.

@stale stale bot removed the stale label Jun 3, 2021
@stale

stale bot commented Aug 2, 2021

Hello 👋 Looks like there was no activity on this issue for the last two months.
Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗
If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

@stale stale bot added the stale label Aug 2, 2021
@stale

stale bot commented Aug 17, 2021

Closing for now as promised, let us know if you need this to be reopened! 🤗

@stale stale bot closed this as completed Aug 17, 2021
@hdost

hdost commented Aug 25, 2021

Still valid.

@bwplotka bwplotka reopened this Jun 16, 2022
@stale stale bot removed the stale label Jun 16, 2022
@bwplotka bwplotka added the dont-go-stale label Jun 16, 2022