Frontend scaling #3374

Merged · 9 commits · Oct 30, 2020

Conversation

@pstibrany (Contributor) commented Oct 21, 2020

What this PR does:
This PR implements a scalable query-frontend by extracting the queue (called the "scheduler") into a separate component. The implementation doesn't change the existing query-frontend but introduces a completely new one (copied from the old one). A follow-up PR will refactor both implementations to reuse shared code.

The current thinking is that there will be no migration path from the old to the new model, but the old model (no scheduler) will remain available and will be used in single-binary mode, or in configurations without a scheduler. The scheduler will not be part of the single binary, but it can run separately, and the frontend and queriers in single-binary mode can then be configured to use it.

When using query-frontend V2, the query-scheduler, and the new query workers, a request flows as follows:

  1. The query arrives at a query-frontend.
  2. The query is forwarded to a random query-scheduler.
  3. A random querier picks the query up from the query-scheduler.
  4. When the querier has finished processing the query, it sends the response directly back to the query-frontend, bypassing the scheduler.

The query-scheduler exposes two gRPC interfaces: SchedulerForFrontend and SchedulerForQuerier. Both have a single "loop" method that follows the same pattern as the original frontend: exchanging requests and responses in lockstep.
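
A rough sketch of that lockstep pattern on the scheduler side (the stream type and dispatch helper below are hypothetical stand-ins, not the actual PR code):

```go
// FrontendLoop sketches the lockstep exchange: exactly one Send per Recv,
// so both sides always know whose turn it is.
func (s *Scheduler) FrontendLoop(stream SchedulerForFrontend_FrontendLoopServer) error {
	for {
		// Receive one message from the frontend (e.g. INIT, ENQUEUE, CANCEL).
		msg, err := stream.Recv()
		if err != nil {
			// Recv failing (EOF or a transport error) is how the scheduler
			// notices that this frontend connection is gone.
			return err
		}

		resp := s.handleFrontendMessage(msg) // hypothetical dispatch helper

		// Reply before reading the next message: requests and responses
		// are exchanged strictly in lockstep.
		if err := stream.Send(resp); err != nil {
			return err
		}
	}
}
```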

When the scheduler detects that a frontend has completely disconnected (which the "loop" method makes possible), it cancels all requests from that frontend. A frontend can also cancel a request that is already enqueued in the scheduler (or even running in a querier); it does so by sending a cancellation request to the scheduler. Cancelling a request that is already running on a querier means the scheduler closes the "loop" method for that querier.

Both the frontend and the querier worker use a DNS watcher to monitor which schedulers are available, and they connect to all of them. With the schedulers acting as middlemen, query-frontends can now be scaled up much more easily. The number of schedulers should stay low (e.g. two, for redundancy).
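
A minimal sketch of that watcher pattern, assuming a callback-style notification interface (the interface and all names below are illustrative, not the actual dns_watcher.go API):

```go
import "sync"

// schedulerWorkers keeps one worker (connection plus loop) per scheduler
// address discovered via DNS.
type schedulerWorkers struct {
	mtx     sync.Mutex
	workers map[string]*schedulerWorker
}

// AddressAdded is invoked by the DNS watcher when a new scheduler shows up.
func (w *schedulerWorkers) AddressAdded(addr string) {
	w.mtx.Lock()
	defer w.mtx.Unlock()
	if _, ok := w.workers[addr]; !ok {
		w.workers[addr] = newSchedulerWorker(addr) // dial and start the loop
	}
}

// AddressRemoved is invoked when a scheduler disappears from DNS.
func (w *schedulerWorkers) AddressRemoved(addr string) {
	w.mtx.Lock()
	defer w.mtx.Unlock()
	if worker, ok := w.workers[addr]; ok {
		worker.stop()
		delete(w.workers, addr)
	}
}
```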

This PR supersedes #3255 (while taking parts of it).

This PR also splits the old Frontend into a Handler (an HTTP handler that uses a generic RoundTripper) and a GrpcRoundTripper plus adapter (both the old and new Frontends implement the GrpcRoundTripper interface). V1 vs. V2 frontend selection is based on whether a scheduler address is configured.
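
The selection amounts to something like this sketch (constructor and config field names are illustrative, not the actual PR code):

```go
// newFrontend picks the implementation based on configuration: a configured
// scheduler address selects the new (v2) model, otherwise the original (v1)
// frontend with its embedded queue is used.
func newFrontend(cfg Config) (GrpcRoundTripper, error) {
	if cfg.SchedulerAddress != "" {
		return NewFrontendV2(cfg) // scheduler-based model
	}
	return NewFrontendV1(cfg) // original model, queue inside the frontend
}
```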

Which issue(s) this PR fixes:
Fixes #1150

Notes to reviewer: I'd start with frontend2/frontend.proto to make sure it makes sense.

TODO:

  • More tests
  • Documentation
  • Refactor the frontend and frontend2 packages to reuse common code (follow-up PR):
    • move dns_watcher.go from frontend2 to frontend, and refactor the querier worker to use it
    • the querier workers are quite similar, but in v1 the querier gets queries from the frontend and sends responses back to the frontend, while in v2 the querier gets queries from the scheduler and sends responses directly to the frontend (not the scheduler). Ideally this should be refactored to reuse common code.
    • frontend_querier_queues.go in frontend2 is just a copy with a different request type. If we use interface{}, we don't need two copies.

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@pstibrany pstibrany marked this pull request as draft October 21, 2020 14:18
@pstibrany pstibrany marked this pull request as ready for review October 23, 2020 13:52
@pstibrany (Contributor Author):

Marking as ready for review, as I don't expect big changes at this point, even though I still plan to add more tests to the new frontend2 code.

@pstibrany pstibrany requested review from jtlisi and pracucci October 27, 2020 12:45
@pstibrany (Contributor Author):

I've made this diagram to help with the review.

[Diagram: frontend-scaling-diagram]

Note that part of this PR was extracting frontend.grpcRoundTripperAdapter and frontend.Handler from the existing frontend.Frontend, to make them reusable. Most changes to the original frontend in this PR are related to this, and to keeping configuration compatibility.

The main flow from the diagram is in the frontend2 package.

@ranton256 (Contributor) left a comment:

Thanks for this; the diagram also helps tie it together more easily, so thanks for that as well.

//
// Long-running loop is used to detect broken connection between frontend and scheduler. This is important for both
// parties... if connection breaks, frontend can cancel (and possibly retry on different scheduler) all pending
// requests sent to this scheduler, while scheduler can cancel queued requests from given frontend.
Contributor:

Under what conditions is the scheduler expected to cancel requests?

Contributor Author:

When the last connection from a frontend disconnects (i.e. the last FrontendLoop call for a given frontend address exits), the scheduler cancels all requests from that frontend. It does so by cancelling the frontend-specific context.

Apart from that, the only other way a request from a frontend is cancelled is when the frontend itself sends a cancellation request to the scheduler.
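
In sketch form, with hypothetical names (the real scheduler tracks more state than this):

```go
import (
	"context"
	"sync"
)

// frontendState holds a context shared by all requests enqueued by one
// frontend, plus the number of its open FrontendLoop connections.
type frontendState struct {
	ctx         context.Context
	cancel      context.CancelFunc
	connections int
}

type Scheduler struct {
	mtx       sync.Mutex
	frontends map[string]*frontendState
}

// frontendDisconnected is called when a FrontendLoop call exits. Once the
// last connection for a frontend address is gone, cancelling the shared
// context cancels every request still queued for that frontend.
func (s *Scheduler) frontendDisconnected(frontendAddress string) {
	s.mtx.Lock()
	defer s.mtx.Unlock()

	f := s.frontends[frontendAddress]
	f.connections--
	if f.connections == 0 {
		f.cancel()
		delete(s.frontends, frontendAddress)
	}
}
```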

// Used by INIT message. Will be put into all requests passed to querier.
string frontendAddress = 2;

// Used by ENQUEUE and CANCEL. Each enqueued query must have queryID higher than previous one.
Contributor:

Are queryIDs unique per frontend, per frontend/scheduler combination, or something else? Can you document the expectations and requirements for where they are unique or not?

Contributor Author:

queryID is unique per frontend.

Different frontends can use the same queryIDs for different queries.

(Since each frontend starts from a random uint64, that is unlikely anyway, but the real reason for the randomization is to avoid conflicts when a frontend restarts -- otherwise it could receive a querier's response for the wrong query.)
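
A sketch of that scheme (field and function names are illustrative):

```go
import (
	"math/rand"
	"sync/atomic"
)

// Frontend assigns queryIDs that are unique within this frontend only.
type Frontend struct {
	lastQueryID uint64
}

func NewFrontend() *Frontend {
	// Start from a random value so a restarted frontend is unlikely to
	// reuse IDs of queries still in flight from its previous incarnation.
	return &Frontend{lastQueryID: rand.Uint64()}
}

// nextQueryID returns a monotonically increasing ID, satisfying the
// requirement that each enqueued query has a higher queryID than the
// previous one.
func (f *Frontend) nextQueryID() uint64 {
	return atomic.AddUint64(&f.lastQueryID, 1)
}
```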

[]string{"/cortex.Ingester/TransferChunks", "/frontend.Frontend/Process"})
// Also don't check auth for these gRPC methods, since single call is used for multiple users (or no user like health check).
[]string{
"/grpc.health.v1.Health/Check",
Contributor:

Looks like "/grpc.health.v1.Health/Check" was just added here. How was this working before?

Contributor Author:

The client pool uses the health check to remove stale clients from the pool. When issuing the Check method, it sets user "0" to avoid authentication errors:

ctx = user.InjectOrgID(ctx, "0")

I tried to remove that setting of "0" and add this exception instead. It worked fine between the querier and the frontend, but not between the querier and the ingester, simply because I hadn't deployed an updated ingester during my testing. I then realized it would be a bit painful to migrate if I removed that setting of the "0" user, so the first step of the migration is just adding this exception.

That said, we can remove it from this PR as it's currently not needed.
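
For context, the health check with the injected user looks roughly like this sketch (using the weaveworks user package and the standard gRPC health API; the surrounding function is illustrative):

```go
import (
	"context"
	"fmt"

	"github.com/weaveworks/common/user"
	"google.golang.org/grpc/health/grpc_health_v1"
)

// healthCheck pings a pooled client. The synthetic org ID "0" is injected
// so the auth middleware doesn't reject the request, since health checks
// don't belong to any real tenant.
func healthCheck(ctx context.Context, client grpc_health_v1.HealthClient) error {
	ctx = user.InjectOrgID(ctx, "0")

	resp, err := client.Check(ctx, &grpc_health_v1.HealthCheckRequest{})
	if err != nil {
		return err
	}
	if resp.Status != grpc_health_v1.HealthCheckResponse_SERVING {
		return fmt.Errorf("health check not serving: %v", resp.Status)
	}
	return nil
}
```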

Contributor:

I would remove it from this PR if it's unused. I would also update the comment, removing "(or no user like health check)".


t.API.RegisterQueryFrontendHandler(handler)

if frontendV1 != nil {
Contributor:

Is it possible to have v1 and v2 frontends both running with the same queriers?
I'm just thinking about migration strategies for a cluster moving to v2.

@pstibrany (Contributor Author) commented Oct 27, 2020:

No, that's not possible.

Both the query-frontend and the queriers must use the scheduler, or neither should.

Migration from a setup with query-frontend and queriers but no scheduler is possible by:

  1. starting a scheduler (or two)
  2. starting query-frontends and queriers configured to use the scheduler
  3. redirecting queries from the old query-frontends (without scheduler) to the new ones
  4. once no traffic reaches the old query-frontends, stopping them together with the old queriers

With some extra effort it would be possible to provide a migration path at the query-frontend level, but this PR doesn't do that.

Contributor:

Okay, so basically the migration path would be: spin up new queriers + new query-frontends + scheduler, switch traffic away from the old query-frontends and old queriers, then turn down the old queriers and old query-frontends. That seems like it will work; it just requires running more components during the transition.

}
}

//// Handler creates handler for HTTP requests.
Contributor:

Any reason for retaining this code in the comment?

Contributor Author:

No, just an oversight :) Thanks.

}

level.Info(util.WithContext(r.Context(), f.log)).Log(logMessage...)

Contributor:

nit: extra newline

}

func (f *frontendSchedulerWorkers) connectToScheduler(ctx context.Context, address string) (*grpc.ClientConn, error) {
// Because we only use single long-running method, it doesn't make sense to inect user ID, send over tracing or add metrics.
Contributor:

typo: inect -> inject

}

func (f *querierSchedulerWorkers) connectToScheduler(ctx context.Context, address string) (*grpc.ClientConn, error) {
// Because we only use single long-running method, it doesn't make sense to inect user ID, send over tracing or add metrics.
Contributor:

"inect" instead of "inject" again


c, err := w.frontendPool.GetClientFor(frontendAddress)
if err == nil {
// Response is empty and uninteresting.
Contributor:

Empty? Doesn't it get here whenever the response is < w.maxMessageSize and the handler didn't return an error?

Contributor Author:

That comment refers to the next line. The response from calling QueryResult on the frontend is an empty structure.

Contributor Author:

This is "happy path", we get here if we have something to report to frontend (can be query response or error response).

Contributor:

Ah, I see, thanks.


// Ensure the request has not already expired.
if request.ctx.Err() == nil {
return request, lastUserIndex, nil
Contributor:

Is it already deleted from s.pendingRequests in this case?

Contributor Author:

Not at this point. Deletion happens later, in forwardRequestToQuerier, via the deferred cancelRequest call.
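
In sketch form (helper names follow the discussion above; the real code streams the request to the querier):

```go
// forwardRequestToQuerier illustrates the lifecycle discussed here: the
// request is removed from s.pendingRequests only via the deferred
// cancelRequest, not at dequeue time, so it stays cancellable while a
// querier is still processing it.
func (s *Scheduler) forwardRequestToQuerier(req *schedulerRequest) error {
	defer s.cancelRequest(req.frontendAddress, req.queryID)

	// Hand the request to the querier and relay the outcome; details elided.
	return s.sendToQuerier(req)
}
```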

@ranton256 (Contributor) left a comment:

LGTM, modulo the things you have called out as remaining TODOs.

@pracucci (Contributor) left a comment:

Wow @pstibrany, what a huge piece of work! A bit too big for my taste. Next time we should try to work through progressive PRs. It's too difficult to follow the changes, and there's a high chance of letting bugs slip in when such huge PRs come in.

That being said, solid work 👏

I left a lot of minor comments and some questions. Apart from this, it would be nice to see an integration test, at least for the happy path.

Ideas to consider for future PRs:

  • The query-frontend is a dedicated service in our architecture. We could make it a first-class citizen in our packages, moving it under pkg/frontend. If we need to keep the v1/v2 code distinction, we could move version-specific code under pkg/frontend/v1 and pkg/frontend/v2 respectively, while keeping common code in pkg/frontend.

case err := <-errCh:
// If there was an error handling this request due to network IO,
// then error out this upstream request _and_ stream.
// TODO: if err is not nil, scheduler should notify frontend using the frontend address.
Contributor:

What about this TODO?

@pstibrany (Contributor Author) commented Oct 29, 2020:

Wow @pstibrany, what a huge piece of work! A bit too big for my taste. Next time we should try to work through progressive PRs. It's too difficult to follow the changes, and there's a high chance of letting bugs slip in when such huge PRs come in.

💯 agree. With more time it would have been possible to split this PR into multiple smaller ones, but given the time constraints I opted for one big PR to get the work done ASAP.

Thank you for taking the time to review it!

I left a lot of minor comments and some questions. Apart from this, it would be nice to see an integration test, at least for the happy path.

👍

  • The query-frontend is a dedicated service in our architecture. We could make it a first-class citizen in our packages, moving it under pkg/frontend. If we need to keep the v1/v2 code distinction, we could move version-specific code under pkg/frontend/v1 and pkg/frontend/v2 respectively, while keeping common code in pkg/frontend.

There is some repeated code between v1 and v2; it might be possible to refactor it out and keep a single frontend package. I plan to give it a try in follow-up PRs.

… to also send shutting down error to frontend immediately. Frontend worker expects OK, and exits FrontendLoop otherwise.

@pstibrany (Contributor Author):

@pracucci Thanks a lot for your comprehensive review! I've addressed most of your feedback. In some cases the GitHub UI doesn't allow me to add replies, so I've commented via Slack (feel free to share here if GH isn't misbehaving). I can address the remaining feedback (integration test, gRPC keep-alive) early next week, or feel free to take over if it's urgent.

Thanks!

@pracucci (Contributor) left a comment:

Thanks a lot @pstibrany for promptly addressing the feedback! I've noted down (and shared with you) the list of remaining cleanup / enhancements / docs to do, so we can split the work and get this work "done done".

Successfully merging this pull request may close these issues: Query Frontend scalability.