Frontend scaling #3374
Conversation
Marking as ready for review, as I don't expect big changes at this point, even though I still plan to add more tests to the new frontend2 code.
I've made this diagram to help in the review. (Note that part of this PR was extracting existing code.) The main flow from the diagram is in the frontend2 package.
Thanks for this, also the diagram helps tie it together more easily, so thanks for that as well.
//
// Long-running loop is used to detect broken connection between frontend and scheduler. This is important for both
// parties... if connection breaks, frontend can cancel (and possibly retry on different scheduler) all pending
// requests sent to this scheduler, while scheduler can cancel queued requests from given frontend.
Under what conditions is the scheduler expected to cancel requests?
When the last connection from a frontend disconnects (i.e. the last FrontendLoop call for a given frontend address exits), the scheduler will cancel all requests from that frontend. It does so by cancelling a frontend-specific context.
Apart from that, the only other way a request from a frontend gets cancelled is when the frontend itself sends a cancellation request to the scheduler.
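A minimal sketch of that bookkeeping (hypothetical names; the PR may structure it differently):

```go
package scheduler

import (
	"context"
	"sync"
)

// connectedFrontend tracks the number of open FrontendLoop calls for a single
// frontend address, plus the frontend-specific context used to cancel all of
// its pending requests.
type connectedFrontend struct {
	connections int
	ctx         context.Context
	cancel      context.CancelFunc
}

type frontendRegistry struct {
	mtx       sync.Mutex
	frontends map[string]*connectedFrontend
}

// frontendConnected is called when a FrontendLoop call starts; it returns the
// context that enqueued requests from this frontend should be derived from.
func (r *frontendRegistry) frontendConnected(addr string) context.Context {
	r.mtx.Lock()
	defer r.mtx.Unlock()

	f := r.frontends[addr]
	if f == nil {
		ctx, cancel := context.WithCancel(context.Background())
		f = &connectedFrontend{ctx: ctx, cancel: cancel}
		r.frontends[addr] = f
	}
	f.connections++
	return f.ctx
}

// frontendDisconnected is called when a FrontendLoop call exits. When the last
// connection for the address goes away, cancelling the frontend-specific
// context cancels every request enqueued by that frontend.
func (r *frontendRegistry) frontendDisconnected(addr string) {
	r.mtx.Lock()
	defer r.mtx.Unlock()

	f := r.frontends[addr]
	if f == nil {
		return
	}
	if f.connections--; f.connections == 0 {
		delete(r.frontends, addr)
		f.cancel()
	}
}
```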
pkg/querier/frontend2/frontend.proto
// Used by INIT message. Will be put into all requests passed to querier.
string frontendAddress = 2;

// Used by ENQUEUE and CANCEL. Each enqueued query must have queryID higher than previous one.
Are queryIDs unique per frontend, per frontend/scheduler combination, or something else? Can you document expectations and requirements for where this is unique or not?
queryID is unique per frontend.
Different frontends can use the same queryIDs for different queries.
(Since each frontend starts with a random uint64 number, this is unlikely. The reason for the randomization is to avoid conflicts when a frontend restarts -- otherwise it could receive a response from a querier for the wrong query.)
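A small sketch of that scheme (names are hypothetical, not the PR's):

```go
package frontend

import (
	"math/rand"
	"sync/atomic"
)

// queryIDGenerator illustrates the scheme described above: IDs start at a
// random uint64 and increase monotonically, so they are unique per frontend
// and unlikely to collide with IDs issued before a restart.
type queryIDGenerator struct {
	lastQueryID uint64
}

func newQueryIDGenerator() *queryIDGenerator {
	return &queryIDGenerator{lastQueryID: rand.Uint64()}
}

// next returns a queryID strictly higher than the previous one.
func (g *queryIDGenerator) next() uint64 {
	return atomic.AddUint64(&g.lastQueryID, 1)
}
```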
[]string{"/cortex.Ingester/TransferChunks", "/frontend.Frontend/Process"}) | ||
// Also don't check auth for these gRPC methods, since single call is used for multiple users (or no user like health check). | ||
[]string{ | ||
"/grpc.health.v1.Health/Check", |
Looks like "/grpc.health.v1.Health/Check" was just added to this. How was this working before?
The client pool uses health checks to remove stale clients from the pool. When issuing the Check method, it sets user "0" to avoid authentication errors:
cortex/pkg/ring/client/pool.go, line 193 (commit 4b40f20):
ctx = user.InjectOrgID(ctx, "0")
I've tried to remove that setting of "0" and add this exception instead. It worked fine between querier and frontend, but not between querier and ingester, simply because I hadn't deployed an updated ingester during my testing. I then realized that it would be a bit painful to migrate if I removed that setting of the "0" user, so the first step in the migration is just adding this exception.
That said, we can remove it from this PR as it's currently not needed.
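For context, a simplified sketch of that health-check path (based on the pool.go line quoted above; the surrounding details are approximated):

```go
package client

import (
	"context"
	"fmt"
	"time"

	"github.com/weaveworks/common/user"
	"google.golang.org/grpc/health/grpc_health_v1"
)

// healthCheck sketches the pool behaviour referenced above: the fake org ID
// "0" is injected so that auth middleware on the server side doesn't reject
// the Check call.
func healthCheck(client grpc_health_v1.HealthClient, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	ctx = user.InjectOrgID(ctx, "0")

	resp, err := client.Check(ctx, &grpc_health_v1.HealthCheckRequest{})
	if err != nil {
		return err
	}
	if resp.Status != grpc_health_v1.HealthCheckResponse_SERVING {
		return fmt.Errorf("failing healthcheck status: %s", resp.Status)
	}
	return nil
}
```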
I would remove it from this PR, if unused. I would also update the comment, removing "(or no user like health check)".
t.API.RegisterQueryFrontendHandler(handler)

if frontendV1 != nil {
Is it possible to have v1 and v2 frontends both running with the same queriers?
I'm just thinking about migration strategies for a cluster going to v2.
No, that's not possible.
Both query-frontend and querier must use scheduler or neither should.
Migration from setup with query-frontend and querier but no scheduler is possible by:
- starting scheduler (or two)
- starting query-frontend and querier configured with scheduler
- redirecting queries from old query-frontend (without scheduler) to new query-frontend
- once no traffic is coming to old query-frontends, they can be stopped, together with old queriers.
It would be possible, with some extra effort, to provide a migration path at the query-frontend level, but this PR doesn't do that.
Okay, so basically the migration path would be to spin up new queriers + new query frontends + scheduler, then switch traffic away from the old query frontends and old queriers, then turn down the old queriers and old query frontends. That seems like it will work, it just requires running more components during the transition.
pkg/querier/frontend/handler.go
}
}

//// Handler creates handler for HTTP requests.
Any reason for retaining this code in the comment?
No, just an oversight :) Thanks.
pkg/querier/frontend/handler.go
}

level.Info(util.WithContext(r.Context(), f.log)).Log(logMessage...)

nit: extra newline
}

func (f *frontendSchedulerWorkers) connectToScheduler(ctx context.Context, address string) (*grpc.ClientConn, error) {
	// Because we only use single long-running method, it doesn't make sense to inect user ID, send over tracing or add metrics.
typo: inect -> inject
}

func (f *querierSchedulerWorkers) connectToScheduler(ctx context.Context, address string) (*grpc.ClientConn, error) {
	// Because we only use single long-running method, it doesn't make sense to inect user ID, send over tracing or add metrics.
"inect" instead of "inject" again
c, err := w.frontendPool.GetClientFor(frontendAddress)
if err == nil {
	// Response is empty and uninteresting.
empty? Doesn't it get here whenever the response is < w.maxMessageSize and the handler didn't return an error?
That comment refers to the next line: the response from calling QueryResult on the frontend is an empty structure.
This is "happy path", we get here if we have something to report to frontend (can be query response or error response).
Ah, I see, thanks.
// Ensure the request has not already expired.
if request.ctx.Err() == nil {
	return request, lastUserIndex, nil
is it already deleted from s.pendingRequests in this case?
Not at this point. Deletion happens later in forwardRequestToQuerier, via a deferred cancelRequest call.
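A sketch of that lifecycle, with hypothetical names: the expiry check only inspects the context, and removal from pendingRequests happens in the deferred cleanup.

```go
package scheduler

import (
	"context"
	"sync"
)

// schedulerRequest stands in for an enqueued query.
type schedulerRequest struct {
	ctx     context.Context
	queryID uint64
}

type scheduler struct {
	mtx             sync.Mutex
	pendingRequests map[uint64]*schedulerRequest
}

func (s *scheduler) forwardRequestToQuerier(req *schedulerRequest) {
	// The request is removed from pendingRequests only here, once we are
	// done with it (whether it succeeded, failed or had already expired).
	defer s.cancelRequest(req.queryID)

	if req.ctx.Err() != nil {
		return // expired before we could forward it
	}
	// ... send the request to a querier and wait for the result ...
}

func (s *scheduler) cancelRequest(queryID uint64) {
	s.mtx.Lock()
	defer s.mtx.Unlock()
	delete(s.pendingRequests, queryID)
}
```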
LGTM, modulo the things you have called out as remaining TODOs.
Wow @pstibrany, such a huge piece of work! A bit too big for my taste. Next time we should try to work through progressive PRs. It's too difficult to follow the changes, and there's a high chance of letting bugs slip in when such huge PRs come in.
That being said, solid work 👏
I left a lot of minor comments and some questions. Apart from this, it would be nice to see an integration test, at least to cover the happy path.
Ideas to consider for future PRs:
- The query-frontend is a dedicated service in our architecture. We could make it a first class citizen in our packages, moving it under `pkg/frontend`. If we'll need to keep the v1/v2 code distinction, we could move version-specific code under `pkg/frontend/v1` and `pkg/frontend/v2` respectively, while keeping common code in `pkg/frontend`.
case err := <-errCh:
	// If there was an error handling this request due to network IO,
	// then error out this upstream request _and_ stream.
	// TODO: if err is not nil, scheduler should notify frontend using the frontend address.
What about this TODO?
💯 agree. And with more time it would be possible to split this PR into multiple smaller ones, but given time constraints I opted for one big PR to get the work done asap. Thank you for taking the time to review it!
👍
There is some repeated code between v1/v2; it might be possible to refactor it out and keep a single frontend package. I plan to give it a try in follow-up PRs.
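For illustration, one way the duplicated queue code could be shared is to make the queue operate on an opaque request type (a sketch, not the PR's code; names are hypothetical):

```go
package queue

import "errors"

// Request is intentionally opaque so that both v1 and v2 frontends can reuse
// the same queue implementation with their own request structs.
type Request interface{}

// ErrTooManyRequests is returned when a per-user queue is full.
var ErrTooManyRequests = errors.New("too many outstanding requests")

// UserQueue is a bounded FIFO of opaque requests for a single user.
type UserQueue struct {
	ch chan Request
}

func NewUserQueue(maxOutstanding int) *UserQueue {
	return &UserQueue{ch: make(chan Request, maxOutstanding)}
}

// Enqueue adds a request, failing fast if the queue is already full.
func (q *UserQueue) Enqueue(r Request) error {
	select {
	case q.ch <- r:
		return nil
	default:
		return ErrTooManyRequests
	}
}

// Dequeue blocks until a request is available.
func (q *UserQueue) Dequeue() Request {
	return <-q.ch
}
```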
… to also send shutting down error to frontend immediately. Frontend worker expects OK, and exits FrontendLoop otherwise.
@pracucci Thanks a lot for your comprehensive review! I've addressed most of your feedback. In some cases the GitHub UI doesn't allow me to add replies, so I've commented via Slack (feel free to share here if GH isn't misbehaving). I can address the remaining feedback (integration test, gRPC keep-alive) early next week, or feel free to take over if it's urgent. Thanks!
Thanks a lot @pstibrany for promptly addressing feedback! I've noted down (and shared with you) the list of remaining cleanups / enhancements / docs to do, so we can split the work and get this work "done done".
What this PR does:
This PR implements a scalable query-frontend by extracting the queue (called "scheduler") into a separate component. The implementation doesn't change the existing query-frontend, but introduces a completely new one (copying the old one). A follow-up PR will refactor both implementations to reuse shared code.
The current thinking is that there will be no migration path from the old to the new model, but the old model (no scheduler) will remain available and used in single-binary mode, or in configurations without a scheduler. The scheduler will not be part of the single binary, but it can run separately, and frontend and queriers in single-binary mode can then be configured to use it.
When using query-frontend V2, the query-scheduler and the new query workers, the flow of a request is as follows.

The query-scheduler exposes two gRPC interfaces: `SchedulerForFrontend` and `SchedulerForQuerier`. Both have a single "loop" method which follows the same pattern as the original frontend does – exchanging requests/responses in lockstep.

When the scheduler finds that a frontend has completely disconnected (which it can detect thanks to the "loop" method), it will cancel all requests from that frontend. The frontend can also cancel a request that is already enqueued in the scheduler (or even running on a querier); it does so by sending a cancellation request to the scheduler. Cancelling a request that is already running on a querier means closing the "loop" method (by the scheduler) for the given querier.
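A rough sketch of that lockstep pattern from the frontend side (hypothetical names, not the actual PR code):

```go
package frontend

import "context"

// Hypothetical message types for the bidirectional "loop" stream.
type FrontendToScheduler struct{} // ENQUEUE or CANCEL payload
type SchedulerToFrontend struct{ Error string }

// schedulerLoopClient stands in for the client side of the gRPC "loop" stream.
type schedulerLoopClient interface {
	Send(*FrontendToScheduler) error
	Recv() (*SchedulerToFrontend, error)
}

// runLoop sends one message per iteration and waits for exactly one reply
// before sending the next one. Returning from this function is what lets the
// other side detect that this frontend connection is gone.
func runLoop(ctx context.Context, loop schedulerLoopClient, requests <-chan *FrontendToScheduler) error {
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case req := <-requests:
			if err := loop.Send(req); err != nil {
				return err
			}
			// Lockstep: wait for the scheduler's reply before sending more.
			if _, err := loop.Recv(); err != nil {
				return err
			}
		}
	}
}
```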
Both frontend and querier-worker use a DNS watcher to monitor which schedulers are available, and connect to all of them. Thanks to using schedulers as the middleman, we can now scale up query-frontends much more easily. The number of schedulers should be low (e.g. two, for redundancy).
This PR supersedes #3255 (while taking parts of it).
This PR also splits old `Frontend` into `Handler` (HTTP handler that uses generic RoundTripper), and `GrpcRoundTripper` + adapter (both old and new Frontends implement `GrpcRoundTripper` interface). V1 and V2 frontend selection is based on whether scheduler address is configured.

Which issue(s) this PR fixes:
Fixes #1150
Notes to reviewer: I'd start with `frontend2/frontend.proto` to make sure it makes sense.

TODO:
- `frontend_querier_queues.go` in frontend2 is just a copy with different request type. If we use `interface{}`, we don't need two copies.

Checklist
- `CHANGELOG.md` updated - the order of entries should be `[CHANGE]`, `[FEATURE]`, `[ENHANCEMENT]`, `[BUGFIX]`