kvserver: remove below-raft throttling #57247
Comments
Hi @sumeerbhola, please add a C-ategory label to your issue. Check out the label system docs. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
cc @cockroachdb/bulk-io
@cockroachdb/repl discussed this a bit last week. We definitely need to avoid below-Raft throttling while holding the Raft mutex. Posting some preliminary thoughts below, nothing concrete yet.

We think that the store-level throttling likely happens a bit early -- it seems like the ideal throttling point would be right as the request is about to acquire latches in the concurrency manager. Requests can hang around for a long time between traversing the store and being evaluated, e.g. because they're waiting for latches. And throttling later than that basically means that we'll have to choose between head-of-line blocking (because we've acquired latches), reordering problems (because we've already evaluated the request), or request reevaluation (to handle reordering).

We also felt that a distributed throttling scheme taking into account the health of follower stores would likely be too complex (and possibly more in the purview of the allocator, which should move replicas off of hot nodes).

On followers, we can often defer command application as long as commands get committed to the log, which would allow us to do some level of local Raft throttling. But we would have to do this throttling without holding onto a scheduler goroutine, by returning early and backing off. We'll also need to keep other throttling mechanisms in mind here, e.g. admission control and the proposal quota pool.
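To make the "back off without holding a scheduler goroutine" idea concrete, here is a minimal Go sketch. All names here (`scheduler`, `EnqueueAfter`, `maybeApplyCommitted`) are hypothetical and not the real raft scheduler API: an overloaded follower re-enqueues the range for a later apply pass instead of sleeping on the worker.

```go
// Minimal sketch (hypothetical names throughout): instead of sleeping on a
// raft scheduler worker while the local store is overloaded, the worker
// returns immediately and re-enqueues the range after a backoff, so other
// ranges can keep using the goroutine.
package kvsketch

import "time"

type rangeID int64

// scheduler is a stand-in for the raft scheduler's enqueue surface.
type scheduler interface {
	// EnqueueAfter re-adds the range to the apply queue after the delay.
	EnqueueAfter(id rangeID, delay time.Duration)
}

// maybeApplyCommitted applies committed-but-unapplied entries for a range,
// unless the local store is overloaded, in which case it defers application
// by re-enqueueing the range instead of blocking the worker goroutine.
func maybeApplyCommitted(
	s scheduler, id rangeID, storeOverloaded func() bool, apply func(rangeID),
) {
	if storeOverloaded() {
		// Back off without holding the worker; committed entries stay in the
		// log and are applied on a later pass.
		s.EnqueueAfter(id, 100*time.Millisecond)
		return
	}
	apply(id)
}
```

The key property is that the deferral is visible only as apply lag on the follower; the log itself keeps advancing, and the quota pool / admission control remain the mechanisms that push back on proposers.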
Note to self: if we were to do store-level throttling based on the state of other replica stores, we should consider using the existing store gossip mechanism:
- cockroach/pkg/kv/kvserver/store.go, lines 2388 to 2389 (at 4422323)
- cockroach/pkg/roachpb/metadata.proto, lines 392 to 401 (at e1f30c6)

Remote store info is maintained in cockroach/pkg/kv/kvserver/store_pool.go, lines 397 to 398 (at 24e3c4a).
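A minimal sketch of the soft state this note envisions, using hypothetical types rather than the real gossip/StorePool plumbing: a slow-changing per-store health summary, updated as store descriptors arrive via gossip and consulted cheaply before proposing.

```go
// Minimal sketch (hypothetical types; the real store gossip and StorePool
// code is shaped differently): record a slow-changing health summary per
// remote store and consult it when deciding whether to throttle.
package kvsketch

import "sync"

type storeID int32

type storeHealth struct {
	l0Sublevels int // proxy for LSM inversion / read amplification
}

type remoteStoreHealth struct {
	mu sync.RWMutex
	m  map[storeID]storeHealth
}

// onStoreGossip would be registered as a callback on store descriptor gossip.
func (r *remoteStoreHealth) onStoreGossip(id storeID, h storeHealth) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.m == nil {
		r.m = map[storeID]storeHealth{}
	}
	r.m[id] = h
}

// healthy reports whether a remote store looks able to absorb SST ingestion.
// Stores we have no information about are assumed healthy.
func (r *remoteStoreHealth) healthy(id storeID, maxL0Sublevels int) bool {
	r.mu.RLock()
	defer r.mu.RUnlock()
	h, ok := r.m[id]
	return !ok || h.l0Sublevels <= maxL0Sublevels
}
```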
#74254 is related.
Store read amp is gossiped as of #77040.
Updating this issue to link it properly into #79215.
#75066 (comment) is also concerned with removing below-raft throttling, so closing this one.
**This is for 23.2 only.** We shouldn't artificially delay SST ingestion below raft because this exacerbates memory pressure (cockroachdb#81834). Anecdotally, I see in my "typical experiment" (restores under I/O bandwidth constraints) that `PreIngestDelay` is mostly fairly small compared to the slowness that comes from write bandwidth overload itself, so at least in those experiments removing this has little to no effect. As we are also working on replication admission control[^1] and are looking into improving the memory footprint under unchecked overload[^2], now is a good time to rip this out, as we'll be in a good place to address any detrimental fallout from doing so.

[^1]: cockroachdb#95563
[^2]: cockroachdb#98576

Touches cockroachdb#81834.
Touches cockroachdb#57247.

Epic: none
Release note: None
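For context, here is a sketch of the general kind of below-raft delay being removed. The names and thresholds are hypothetical and the real `PreIngestDelay` logic lives elsewhere and is shaped differently: a bounded sleep before ingesting an SST while L0 looks overloaded, which is exactly the pattern that ties up a raft scheduler worker during apply.

```go
// Minimal sketch (hypothetical names and thresholds, not the real
// PreIngestDelay): sleep for a bounded, overload-proportional duration before
// ingesting an SST when L0 has too many files. Doing this below raft delays
// later log entries and holds a scheduler worker, hence the removal.
package kvsketch

import (
	"context"
	"time"
)

func preIngestDelaySketch(
	ctx context.Context, l0FileCount, l0Threshold int, maxDelay time.Duration,
) {
	if l0FileCount <= l0Threshold {
		return
	}
	// Scale the delay with the amount of L0 overload, capped at maxDelay.
	d := time.Duration(10*(l0FileCount-l0Threshold)) * time.Millisecond
	if d > maxDelay {
		d = maxDelay
	}
	select {
	case <-time.After(d):
	case <-ctx.Done():
	}
}
```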
We currently throttle when adding sstables to the LSM due to background bulk operations. This throttling happens in two places: in `Store.Send`, prior to the proposal, and in `addSSTablePreApply`, when applying to the state machine at each replica. The latter is problematic in that it (a) delays applying later things in the raft log at the replica, and (b) blocks a worker on the raft scheduler, which can impact other ranges.

The `Store.Send` throttling only looks at the store health on the leaseholder, and we have seen occurrences where one node (or a few nodes) in a cluster has an unhealthy LSM store. If we maintained soft state in a node about the health of other stores in the system (this state would be slow changing), we could use it in `Store.Send` and remove the apply-time throttling.

This store health information can also be useful for throttling in the GC queue.
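As a rough illustration of that proposal, here is a sketch of pre-proposal throttling that consults the slow-changing health of all replica stores. The names (`throttleAddSSTable`, `replicaStore`, the `healthy` callback) are hypothetical and this is not the real `Store.Send` code; it only shows the shape of the check that would make apply-time throttling unnecessary.

```go
// Minimal sketch (hypothetical names): before proposing an AddSSTable, wait
// until every replica's store looks healthy according to locally maintained
// soft state, or give up at a deadline. This runs ahead of latch acquisition
// and evaluation, so blocking here does not cause head-of-line blocking.
package kvsketch

import (
	"errors"
	"time"
)

type replicaStore int32

func throttleAddSSTable(
	replicas []replicaStore, healthy func(replicaStore) bool, deadline time.Time,
) error {
	for {
		allHealthy := true
		for _, r := range replicas {
			if !healthy(r) {
				allHealthy = false
				break
			}
		}
		if allHealthy {
			return nil
		}
		if time.Now().After(deadline) {
			return errors.New("replica stores did not become healthy before deadline")
		}
		// Back off and re-check; the health state is slow changing, so a
		// coarse polling interval is sufficient.
		time.Sleep(250 * time.Millisecond)
	}
}
```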
Jira issue: CRDB-2847
Epic CRDB-15069