-
Notifications
You must be signed in to change notification settings - Fork 820
Fix queriers shuffle-sharding blast radius containment #3901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix queriers shuffle-sharding blast radius containment #3901
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job, looking forward to see the tests. My comments are mostly some grammar suggestions... if you find these annoying, let me know and I will stop. (Some may also be wrong, so... :))
pkg/scheduler/queue/user_queues.go
Outdated
querierConnections map[string]int | ||
// How long to wait before removing a querier which has got disconnected | ||
// but hasn't notified a graceful shutdown. | ||
forgetTimeout time.Duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I'd suggest to move forgetTimeout
to user_queues.go
, and pass only threshold to forgetDisconnectQueriers
. removeQuerierConnection
is also using this field, but only to know whether to keep disconnected querier or not. We could pass that information as a bool to removeQuerierConnection
. This would simplify queues
a bit, and allow for easier testing of forgetDisconnectQueriers
. I would also suggest passing time.Now()
as parameter to removeQuerierConnection
, to avoid having dependency on current time in queues
. WDYT?
2ccc0d8
to
03cb76f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me functionally. I had a question about alternative naming of the configuration value being added, but otherwise LGTM.
2732880
to
4ec324d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this and the changes.
CHANGELOG.md
Outdated
@@ -91,6 +91,7 @@ | |||
* `cortex_bucket_store_chunk_pool_returned_bytes_total` | |||
* [ENHANCEMENT] Alertmanager: load alertmanager configurations from object storage concurrently, and only load necessary configurations, speeding configuration synchronization process and executing fewer "GET object" operations to the storage when sharding is enabled. #3898 | |||
* [ENHANCEMENT] Blocks storage: Ingester can now stream entire chunks instead of individual samples to the querier. At the moment this feature must be explicitly enabled either by using `-ingester.stream-chunks-when-using-blocks` flag or `ingester_stream_chunks_when_using_blocks` (boolean) field in runtime config file, but these configuration options are temporary and will be removed when feature is stable. #3889 | |||
* [ENHANCEMENT] Query-frontend/scheduler: added querier forget delay (`-query-frontend.querier-forget-delay` and `-query-scheduler.querier-forget-delay`) to mitigate the blast radius in the event queriers crash because of a repeatedly sent "query of death" when shuffle-sharding is enabled. #3901 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cortex release 1.8.0 is now in progress. Could you please rebase master and move the CHANGELOG entry under the master / unreleased section?
68d9eca
to
8b915a5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, great work.
querierConnections map[string]int | ||
// How long to wait before removing a querier which has got disconnected | ||
// but hasn't notified about a graceful shutdown. | ||
forgetDelay time.Duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: I would prefer to move forgetDelay
out of queues
struct, and pass forgetTime
(when to forget given querier, or zero time if immediately) to removeQuerierConnection
and only threshold
to forgetDisconnectedQueriers
. It's a small change, but helps to minimize change to the queues
struct, which is already tricky.
04220de
to
1a2582d
Compare
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
…ting a querier Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Peter Štibraný <pstibrany@gmail.com>
Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Peter Štibraný <pstibrany@gmail.com>
1a2582d
to
1ca9855
Compare
Signed-off-by: Marco Pracucci <marco@pracucci.com>
…#3901) * Added forget timeout support to queues Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added notify shutdown rpc to query-frontend and query-scheduler proto Signed-off-by: Marco Pracucci <marco@pracucci.com> * Querier worker notifies shutdown to query-frontend/scheduler Signed-off-by: Marco Pracucci <marco@pracucci.com> * Log when query-frontend/scheduler receives a shutdown notification Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added config option to configure the forget timeout Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fixed re-connect while in forget waiting period Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fixed unit tests Signed-off-by: Marco Pracucci <marco@pracucci.com> * Fixed GetNextRequestForQuerier() when a resharding happen after fogetting a querier Signed-off-by: Marco Pracucci <marco@pracucci.com> * Update pkg/frontend/v1/frontend.go Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Peter Štibraný <pstibrany@gmail.com> * Update pkg/scheduler/queue/user_queues.go Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Peter Štibraný <pstibrany@gmail.com> * Update pkg/scheduler/queue/user_queues.go Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Peter Štibraný <pstibrany@gmail.com> * Update pkg/scheduler/scheduler.go Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Peter Štibraný <pstibrany@gmail.com> * Update pkg/querier/worker/frontend_processor.go Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Peter Štibraný <pstibrany@gmail.com> * Updated comment based on review feedback Signed-off-by: Marco Pracucci <marco@pracucci.com> * Updated comment based on review feedback Signed-off-by: Marco Pracucci <marco@pracucci.com> * Updated generated doc Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added name to services Signed-off-by: Marco Pracucci <marco@pracucci.com> * Moved forgetCheckPeriod where it's used Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added queues forget timeout unit tests Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added RequestQueue unit test Signed-off-by: Marco Pracucci <marco@pracucci.com> * Renamed querier forget timeout into delay Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added timeout to the notify shutdown notification Signed-off-by: Marco Pracucci <marco@pracucci.com> * Updated doc Signed-off-by: Marco Pracucci <marco@pracucci.com> * Added CHANGELOG entry Signed-off-by: Marco Pracucci <marco@pracucci.com> * Update pkg/scheduler/scheduler.go Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Peter Štibraný <pstibrany@gmail.com> * Update pkg/frontend/v1/frontend.go Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Peter Štibraný <pstibrany@gmail.com> * Updated doc Signed-off-by: Marco Pracucci <marco@pracucci.com> Co-authored-by: Peter Štibraný <pstibrany@gmail.com>
What this PR does:
As described in the #3571, when shuffle-sharding is enabled in the query-frontend/scheduler and a querier crashes, tenants are immediately resharded across the remaining queriers. This practically invalidates the assumption that shuffle-sharding can be used to contain the blast radius in case of a "poisoned query" on the read path: if a tenant repeatedly send a poisoned query over and over it has the ability to crash all queriers, and not just its shard.
In this PR I propose a solution to mitigate it, introducing a delay ("forget delay") between when a querier disconnects because of a crash and when a tenant's shard changes because of that.
To do it, a query-frontend/scheduler needs to know when a querier disconnects because of crash. I've introduced a "graceful shutdown notification" from the querier to query-frontend/scheduler: when a querier disconnects from the query-frontend/scheduler without sending such notification it means the querier crashed / abruptly terminated.
I've done some manual testing and looks working as expected.
Which issue(s) this PR fixes:
Fixes #3571
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]