kvserver,changefeeds,crosscluster: set per-consumer catchup scan limit #133789
Conversation
Force-pushed from ac02413 to c4d2d02
Edit: I will use GitHub review.
Modulo our Slack exchange about multiple stores; up to you how you want to handle it.
One small request
pkg/server/node.go (Outdated)

```go
settings.SystemOnly,
"kv.rangefeed.per_consumer_catchup_scan_limit.enabled",
"if enabled, rangefeed catchup scans are limited on a per consumer basis",
true,
```
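For context, the hunk above is the argument list of a boolean cluster-setting registration. A sketch of the full call as it would read at this revision (the Go variable name is assumed; the default later changes to false per the discussion below):

```go
// Sketch of the registration the hunk above belongs to; the variable
// name is hypothetical. Default shown as true, matching this revision
// of the diff (the review below settles on false).
var perConsumerCatchupLimitEnabled = settings.RegisterBoolSetting(
	settings.SystemOnly,
	"kv.rangefeed.per_consumer_catchup_scan_limit.enabled",
	"if enabled, rangefeed catchup scans are limited on a per consumer basis",
	true,
)
```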
In DR weekly with @dt, we are of the opinion that this should merge with default = false, and keep this in our back pockets in case we need it to bail us out of a production situation. @stevendanna does that make sense to you?
Yes, thanks for flagging, I meant this to be disabled by default. I've grown very skeptical of this change over the last few days of investigations.
> I've grown very skeptical of this change over the last few days of investigations.

Could you elaborate?
> Could you elaborate?
Overall, this change doesn't feel very risky to me and may be a useful mitigation for some users, but it comes at the cost of adding another layer of confusion. There are currently 2 rate limiters and CPU pacing to consider when thinking about a backlog of catchup scans. Recently, it took multiple CRDB developers a non-negligible amount of time to fully understand the joint behaviour of the existing rate limiters.

Further, we found that since the introduction of these rate limiters, the impact of catch-up scans on foreground workloads has improved dramatically. We recently ran an experiment where we set both limits to 35k and saw only minor impacts on the foreground workload. This isn't an argument for removing the limits immediately (since there are other problems with having up to 35k iterators open), but it is perhaps just more motivation to revisit the overall structure of the catch-up scan process and the limits around it.
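To make the "joint behaviour" concrete: a catch-up scan today must pass several independent gates before it runs. Below is a minimal, illustrative sketch (not the actual CockroachDB code) of how a client-side rate limiter and a store-side concurrency semaphore stack; the third gate, elastic CPU pacing inside the scan loop itself, is omitted:

```go
package main

import (
	"context"
	"fmt"
	"sync"

	"golang.org/x/sync/semaphore"
	"golang.org/x/time/rate"
)

func main() {
	ctx := context.Background()

	// Gate 1: a client-side rate limiter caps how fast new catch-up
	// scans may start (scans per second).
	starts := rate.NewLimiter(rate.Limit(8), 1)

	// Gate 2: a store-side semaphore caps how many catch-up scans may
	// run concurrently on a given store.
	concurrent := semaphore.NewWeighted(16)

	var wg sync.WaitGroup
	for i := 0; i < 32; i++ {
		_ = starts.Wait(ctx)           // blocks until the rate limiter permits a start
		_ = concurrent.Acquire(ctx, 1) // blocks while the store is at capacity
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			defer concurrent.Release(1)
			fmt.Println("catch-up scan", i) // stand-in for the actual scan work
		}(i)
	}
	wg.Wait()
}
```

A scan is admitted only when both gates allow it, which is why reasoning about a backlog requires understanding the limiters jointly rather than one at a time.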
Force-pushed from 51ba893 to 33bddd1
@sumeerbhola Wondering if you have time for a second look here. @ajstorm Let me know if the cluster setting updates work for you.
Currently, it is easily possible for a single slow rangefeed consumer to acquire the entire catchup scan quota for a given store, preventing any other consumers from advancing.

Here, we introduce the concept of a ConsumerID into the rangefeed request. The idea of a ConsumerID is that it represents a logical rangefeed consumer such as a changefeed or LDR stream. Such consumers may make multiple MuxRangeFeed requests to a given node despite sharing the same downstream consumer.

When per-consumer catchup scan limiting is enabled, no single consumer is allowed to consume more than 75% of a given store's capacity. If no ConsumerID is specified, a random consumer ID is assigned to all rangefeeds originating from a given MuxRangeFeed call.

In the long run, we need a more sophisticated approach to solve this. This change aims to be a small improvement that solves the most egregious case: a single slow consumer consuming the entire quota.

The goal of this change is an easily backportable feature; however, it comes at the cost of adding yet another mechanism to the existing systems attempting to limit catchup scans:

1. Client-side rate limiter,
2. Store-level CatchupIter semaphore,
3. Elastic CPU rate limiting in the main CatchUpScan loop, and
4. Any limiting imposed now or in the future by virtue of including an admission header in the request.

Informs cockroachdb#132438

Epic: none
Release note: None
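The 75% cap described above can be pictured as a small bookkeeping layer on top of the store-level semaphore. A rough sketch under stated assumptions (the type and method names are invented for illustration; this is not the kvserver implementation):

```go
package catchup

import (
	"errors"
	"sync"
)

// ConsumerID identifies a logical rangefeed consumer (e.g. a changefeed
// job or an LDR stream); many MuxRangeFeed calls may share one ID.
type ConsumerID int64

var errLimited = errors.New("catch-up scan limit reached")

// limiter is a sketch of a store-level catch-up scan semaphore with a
// per-consumer cap at 75% of total capacity, as described above.
type limiter struct {
	mu       sync.Mutex
	capacity int
	total    int
	inFlight map[ConsumerID]int
}

func newLimiter(capacity int) *limiter {
	return &limiter{capacity: capacity, inFlight: make(map[ConsumerID]int)}
}

// TryAcquire admits a catch-up scan unless the store is at capacity or
// this consumer already holds 75% or more of the store's capacity.
func (l *limiter) TryAcquire(id ConsumerID) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	perConsumerCap := (l.capacity * 3) / 4 // 75% of store capacity
	if l.total >= l.capacity || l.inFlight[id] >= perConsumerCap {
		return errLimited
	}
	l.inFlight[id]++
	l.total++
	return nil
}

// Release returns a slot when the catch-up scan finishes.
func (l *limiter) Release(id ConsumerID) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.inFlight[id]--
	l.total--
}
```

The point of the design is visible in TryAcquire: one misbehaving consumer can exhaust at most three quarters of the store's quota, leaving headroom for everyone else.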
Force-pushed from 33bddd1 to 47e6b28
bors r+
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool.

error creating merge commit from 47e6b28 to blathers/backport-release-24.2-133789: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []
you may need to manually resolve merge conflicts with the backport tool.
Backport to branch 24.2.x failed. See errors above.

error creating merge commit from 47e6b28 to blathers/backport-release-24.3-133789: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []
you may need to manually resolve merge conflicts with the backport tool.
Backport to branch 24.3.x failed. See errors above.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Previously, we introduced the concept of a consumer ID to prevent a single changefeed job from over-consuming the catch-up scan quota and blocking other consumers from making progress on the server side. However, the changefeed client-side code requires the consumer ID to be passed again in the rangefeed options during rangefeedFactory.Run. This was missing in the previous PR, causing the changefeed job ID to default to zero. This patch fixes the issue by ensuring the consumer ID is correctly passed in the rangefeed options.

Related: cockroachdb#133789

Release note: None
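The shape of the fix, sketched under assumptions (the option name WithConsumerID and the Run signature are illustrative, not the verified API; the point is only that the consumer ID has to be threaded through the client-side rangefeed options as well as the server-side request):

```go
// Hypothetical sketch: thread the changefeed job ID through as the
// rangefeed consumer ID when the kvfeed starts its rangefeed, so the
// server can attribute catch-up scans to this consumer.
opts := []rangefeed.Option{
	rangefeed.WithConsumerID(int64(jobID)), // assumed option name
}
return rangefeedFactory.Run(ctx, spans, initialTimestamp, eventCh, opts...)
```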
137916: roachprod: reduce start-up lag r=tbg a=tbg

On my machine (which is in Europe), this brings `time roachprod --help` from `1.56s` down to `0.06s` under the following env vars:

```
ROACHPROD_DISABLE_UPDATE_CHECK=true
ROACHPROD_DISABLED_PROVIDERS=azure
ROACHPROD_SKIP_AWSCLI_CHECK=true
```

Under these env vars, my roachprod

- no longer invokes `aws --version` on each start (python, ~400ms)
- no longer inits azure, which is >1s for me
- doesn't list the gs bucket to check for a newer roachprod binary (~800ms; doesn't exist for OSX anyway).

A better way (but one outside of my purview) for most of these would be to add caching for each of these and so avoid the cost in the common case. Azure is an exception, as the (wall-clock) profile below shows we're spending most of our time waiting for `GetTokenFromCLIWithParams` to return. It's not clear how to optimize this. (The AWS portion of the flamegraph is `aws --version`.)

[flamegraph image]

Epic: none

138283: DEPS: upgrade grpc to v1.57.2 r=tbg a=tbg

See #136278 (comment). `grpc` has gotten a little worse at allocations, but it's overall similarly fast, perhaps even a little faster in the smaller RPCs we care most about.

<details><summary>Benchmark results</summary>
<p>

```
$ benchdiff --old lastmerge ./pkg/rpc -b -r 'BenchmarkGRPCPing' -d 1s -c 10
old: 3ce8f44 Merge #138561 #138779 #138793
new: 3708ee5 DEPS: add resolve hints and update packages

name                                        old time/op    new time/op    delta
GRPCPing/bytes=____256/rpc=UnaryUnary-24      126µs ± 3%     124µs ± 2%   -1.59%  (p=0.035 n=9+10)
GRPCPing/bytes=___8192/rpc=StreamStream-24    126µs ± 3%     124µs ± 1%   -1.32%  (p=0.011 n=10+10)
GRPCPing/bytes=______1/rpc=UnaryUnary-24      124µs ± 4%     123µs ± 3%     ~     (p=0.315 n=10+10)
GRPCPing/bytes=______1/rpc=StreamStream-24   70.3µs ± 3%    70.8µs ± 2%     ~     (p=0.393 n=10+10)
GRPCPing/bytes=____256/rpc=StreamStream-24   74.5µs ± 3%    75.1µs ± 2%     ~     (p=0.105 n=10+10)
GRPCPing/bytes=___1024/rpc=UnaryUnary-24      123µs ± 6%     120µs ± 4%     ~     (p=0.661 n=10+9)
GRPCPing/bytes=___1024/rpc=StreamStream-24   67.4µs ± 8%    67.4µs ± 6%     ~     (p=0.720 n=10+9)
GRPCPing/bytes=___2048/rpc=UnaryUnary-24      133µs ± 5%     133µs ± 4%     ~     (p=0.986 n=10+10)
GRPCPing/bytes=___2048/rpc=StreamStream-24   73.9µs ± 1%    74.6µs ± 2%     ~     (p=0.234 n=8+8)
GRPCPing/bytes=___4096/rpc=UnaryUnary-24      150µs ± 2%     151µs ± 3%     ~     (p=0.182 n=9+10)
GRPCPing/bytes=___4096/rpc=StreamStream-24   97.4µs ±10%    95.3µs ±10%     ~     (p=0.393 n=10+10)
GRPCPing/bytes=___8192/rpc=UnaryUnary-24      175µs ± 1%     176µs ± 2%     ~     (p=0.720 n=9+10)
GRPCPing/bytes=__16384/rpc=UnaryUnary-24      252µs ± 1%     253µs ± 1%     ~     (p=0.315 n=9+10)
GRPCPing/bytes=__16384/rpc=StreamStream-24    190µs ± 1%     189µs ± 2%     ~     (p=0.497 n=9+10)
GRPCPing/bytes=__32768/rpc=UnaryUnary-24      363µs ± 1%     366µs ± 1%     ~     (p=0.079 n=10+9)
GRPCPing/bytes=__32768/rpc=StreamStream-24    305µs ± 3%     305µs ± 1%     ~     (p=0.579 n=10+10)
GRPCPing/bytes=__65536/rpc=UnaryUnary-24      512µs ± 2%     515µs ± 1%     ~     (p=0.095 n=9+10)
GRPCPing/bytes=__65536/rpc=StreamStream-24    449µs ± 1%     452µs ± 1%     ~     (p=0.059 n=9+8)
GRPCPing/bytes=_262144/rpc=UnaryUnary-24     1.48ms ± 3%    1.48ms ± 2%     ~     (p=0.739 n=10+10)
GRPCPing/bytes=_262144/rpc=StreamStream-24   1.42ms ± 1%    1.41ms ± 2%     ~     (p=0.182 n=9+10)
GRPCPing/bytes=1048576/rpc=UnaryUnary-24     5.90ms ± 2%    5.86ms ± 1%     ~     (p=0.278 n=10+9)
GRPCPing/bytes=1048576/rpc=StreamStream-24   5.81ms ± 2%    5.84ms ± 3%     ~     (p=0.631 n=10+10)

name                                        old speed      new speed      delta
GRPCPing/bytes=____256/rpc=UnaryUnary-24   4.44MB/s ± 3%  4.51MB/s ± 2%   +1.58%  (p=0.033 n=9+10)
GRPCPing/bytes=___8192/rpc=StreamStream-24  130MB/s ± 3%   132MB/s ± 1%   +1.32%  (p=0.010 n=10+10)
GRPCPing/bytes=______1/rpc=UnaryUnary-24    386kB/s ± 4%   391kB/s ± 3%     ~     (p=0.378 n=10+10)
GRPCPing/bytes=______1/rpc=StreamStream-24  682kB/s ± 3%   676kB/s ± 2%     ~     (p=0.189 n=10+9)
GRPCPing/bytes=____256/rpc=StreamStream-24 7.52MB/s ± 3%  7.46MB/s ± 2%     ~     (p=0.100 n=10+10)
GRPCPing/bytes=___1024/rpc=UnaryUnary-24   17.1MB/s ± 6%  17.4MB/s ± 4%     ~     (p=0.645 n=10+9)
GRPCPing/bytes=___1024/rpc=StreamStream-24 31.1MB/s ± 8%  31.1MB/s ± 6%     ~     (p=0.720 n=10+9)
GRPCPing/bytes=___2048/rpc=UnaryUnary-24   31.1MB/s ± 5%  31.2MB/s ± 4%     ~     (p=0.986 n=10+10)
GRPCPing/bytes=___2048/rpc=StreamStream-24 56.1MB/s ± 1%  55.6MB/s ± 2%     ~     (p=0.224 n=8+8)
GRPCPing/bytes=___4096/rpc=UnaryUnary-24   55.1MB/s ± 2%  54.6MB/s ± 3%     ~     (p=0.189 n=9+10)
GRPCPing/bytes=___4096/rpc=StreamStream-24 85.1MB/s ±11%  87.0MB/s ±11%     ~     (p=0.393 n=10+10)
GRPCPing/bytes=___8192/rpc=UnaryUnary-24   93.7MB/s ± 1%  93.5MB/s ± 2%     ~     (p=0.720 n=9+10)
GRPCPing/bytes=__16384/rpc=UnaryUnary-24    130MB/s ± 1%   130MB/s ± 1%     ~     (p=0.305 n=9+10)
GRPCPing/bytes=__16384/rpc=StreamStream-24  173MB/s ± 1%   173MB/s ± 2%     ~     (p=0.497 n=9+10)
GRPCPing/bytes=__32768/rpc=UnaryUnary-24    180MB/s ± 1%   179MB/s ± 1%     ~     (p=0.079 n=10+9)
GRPCPing/bytes=__32768/rpc=StreamStream-24  215MB/s ± 2%   215MB/s ± 1%     ~     (p=0.579 n=10+10)
GRPCPing/bytes=__65536/rpc=UnaryUnary-24    256MB/s ± 2%   255MB/s ± 1%     ~     (p=0.095 n=9+10)
GRPCPing/bytes=__65536/rpc=StreamStream-24  292MB/s ± 1%   290MB/s ± 1%     ~     (p=0.059 n=9+8)
GRPCPing/bytes=_262144/rpc=UnaryUnary-24    353MB/s ± 3%   353MB/s ± 2%     ~     (p=0.447 n=10+9)
GRPCPing/bytes=_262144/rpc=StreamStream-24  369MB/s ± 1%   371MB/s ± 2%     ~     (p=0.182 n=9+10)
GRPCPing/bytes=1048576/rpc=UnaryUnary-24    355MB/s ± 2%   358MB/s ± 1%     ~     (p=0.278 n=10+9)
GRPCPing/bytes=1048576/rpc=StreamStream-24  361MB/s ± 2%   359MB/s ± 3%     ~     (p=0.631 n=10+10)

name                                        old alloc/op   new alloc/op   delta
GRPCPing/bytes=______1/rpc=UnaryUnary-24     16.9kB ± 1%    16.9kB ± 3%     ~     (p=0.579 n=10+10)
GRPCPing/bytes=____256/rpc=UnaryUnary-24     19.8kB ± 2%    19.9kB ± 2%     ~     (p=0.755 n=10+10)
GRPCPing/bytes=____256/rpc=StreamStream-24   7.35kB ± 2%    7.43kB ± 2%     ~     (p=0.052 n=10+10)
GRPCPing/bytes=___1024/rpc=UnaryUnary-24     29.8kB ± 2%    29.8kB ± 1%     ~     (p=0.853 n=10+10)
GRPCPing/bytes=___1024/rpc=StreamStream-24   17.7kB ± 1%    17.7kB ± 1%     ~     (p=0.796 n=10+10)
GRPCPing/bytes=___2048/rpc=UnaryUnary-24     43.2kB ± 1%    43.0kB ± 1%     ~     (p=0.218 n=10+10)
GRPCPing/bytes=___2048/rpc=StreamStream-24   31.0kB ± 0%    31.1kB ± 1%     ~     (p=0.278 n=9+10)
GRPCPing/bytes=___4096/rpc=UnaryUnary-24     73.0kB ± 1%    73.2kB ± 1%     ~     (p=0.393 n=10+10)
GRPCPing/bytes=___4096/rpc=StreamStream-24   61.6kB ± 1%    61.7kB ± 0%     ~     (p=0.573 n=10+8)
GRPCPing/bytes=___8192/rpc=UnaryUnary-24      127kB ± 0%     127kB ± 1%     ~     (p=0.393 n=10+10)
GRPCPing/bytes=___8192/rpc=StreamStream-24    118kB ± 1%     118kB ± 0%     ~     (p=0.796 n=10+10)
GRPCPing/bytes=__16384/rpc=UnaryUnary-24      237kB ± 1%     237kB ± 1%     ~     (p=0.579 n=10+10)
GRPCPing/bytes=__16384/rpc=StreamStream-24    227kB ± 1%     227kB ± 1%     ~     (p=0.481 n=10+10)
GRPCPing/bytes=__32768/rpc=UnaryUnary-24      500kB ± 1%     500kB ± 1%     ~     (p=0.912 n=10+10)
GRPCPing/bytes=__32768/rpc=StreamStream-24    492kB ± 0%     492kB ± 0%     ~     (p=0.968 n=9+10)
GRPCPing/bytes=__65536/rpc=UnaryUnary-24      873kB ± 0%     872kB ± 0%     ~     (p=0.780 n=9+10)
GRPCPing/bytes=__65536/rpc=StreamStream-24    868kB ± 0%     868kB ± 0%     ~     (p=1.000 n=9+9)
GRPCPing/bytes=_262144/rpc=UnaryUnary-24     3.50MB ± 0%    3.51MB ± 0%     ~     (p=0.436 n=10+10)
GRPCPing/bytes=_262144/rpc=StreamStream-24   3.49MB ± 0%    3.50MB ± 0%     ~     (p=0.436 n=10+10)
GRPCPing/bytes=1048576/rpc=UnaryUnary-24     13.5MB ± 0%    13.5MB ± 0%     ~     (p=0.515 n=8+10)
GRPCPing/bytes=1048576/rpc=StreamStream-24   13.5MB ± 0%    13.5MB ± 0%     ~     (p=0.549 n=10+9)
GRPCPing/bytes=______1/rpc=StreamStream-24   4.08kB ± 3%    4.18kB ± 3%   +2.28%  (p=0.008 n=9+10)

name                                        old allocs/op  new allocs/op  delta
GRPCPing/bytes=_262144/rpc=UnaryUnary-24        282 ± 4%       286 ± 4%     ~     (p=0.223 n=10+10)
GRPCPing/bytes=_262144/rpc=StreamStream-24      147 ± 3%       149 ± 3%     ~     (p=0.053 n=9+8)
GRPCPing/bytes=1048576/rpc=UnaryUnary-24        510 ± 2%       513 ± 3%     ~     (p=0.656 n=8+9)
GRPCPing/bytes=1048576/rpc=StreamStream-24      370 ± 6%       377 ± 3%     ~     (p=0.168 n=9+9)
GRPCPing/bytes=____256/rpc=UnaryUnary-24        183 ± 0%       184 ± 0%   +0.71%  (p=0.000 n=8+10)
GRPCPing/bytes=______1/rpc=UnaryUnary-24        183 ± 0%       184 ± 0%   +0.77%  (p=0.000 n=10+8)
GRPCPing/bytes=__32768/rpc=UnaryUnary-24        211 ± 0%       213 ± 0%   +0.95%  (p=0.000 n=10+10)
GRPCPing/bytes=__16384/rpc=UnaryUnary-24        195 ± 0%       197 ± 0%   +1.03%  (p=0.000 n=10+10)
GRPCPing/bytes=___8192/rpc=UnaryUnary-24        184 ± 0%       186 ± 0%   +1.09%  (p=0.000 n=10+10)
GRPCPing/bytes=___2048/rpc=UnaryUnary-24        183 ± 0%       185 ± 0%   +1.09%  (p=0.000 n=10+10)
GRPCPing/bytes=___4096/rpc=UnaryUnary-24        183 ± 0%       185 ± 0%   +1.09%  (p=0.000 n=10+10)
GRPCPing/bytes=___1024/rpc=UnaryUnary-24        182 ± 0%       184 ± 0%   +1.10%  (p=0.000 n=10+10)
GRPCPing/bytes=__65536/rpc=UnaryUnary-24        219 ± 0%       221 ± 0%   +1.10%  (p=0.000 n=10+8)
GRPCPing/bytes=__32768/rpc=StreamStream-24     75.0 ± 0%      77.0 ± 0%   +2.67%  (p=0.000 n=10+10)
GRPCPing/bytes=__65536/rpc=StreamStream-24     83.0 ± 0%      85.3 ± 1%   +2.77%  (p=0.000 n=9+10)
GRPCPing/bytes=__16384/rpc=StreamStream-24     57.0 ± 0%      59.0 ± 0%   +3.51%  (p=0.000 n=10+10)
GRPCPing/bytes=___8192/rpc=StreamStream-24     51.0 ± 0%      53.0 ± 0%   +3.92%  (p=0.000 n=10+10)
GRPCPing/bytes=___4096/rpc=StreamStream-24     49.0 ± 0%      51.0 ± 0%   +4.08%  (p=0.000 n=10+10)
GRPCPing/bytes=___2048/rpc=StreamStream-24     48.0 ± 0%      50.0 ± 0%   +4.17%  (p=0.000 n=10+10)
GRPCPing/bytes=______1/rpc=StreamStream-24     47.0 ± 0%      49.0 ± 0%   +4.26%  (p=0.000 n=10+10)
GRPCPing/bytes=____256/rpc=StreamStream-24     47.0 ± 0%      49.0 ± 0%   +4.26%  (p=0.000 n=10+10)
GRPCPing/bytes=___1024/rpc=StreamStream-24     47.0 ± 0%      49.0 ± 0%   +4.26%  (p=0.000 n=10+10)
```

</p>
</details>

Epic: None
Release note: None

138939: changefeedccl/kvfeed: pass consumer id correctly r=andyyang890,stevendanna a=wenyihu6

Previously, we introduced the concept of a consumer ID to prevent a single changefeed job from over-consuming the catch-up scan quota and blocking other consumers from making progress on the server side. However, the changefeed client-side code requires the consumer ID to be passed again in the rangefeed options during rangefeedFactory.Run. This was missing in the previous PR, causing the changefeed job ID to default to zero. This patch fixes the issue by ensuring the consumer ID is correctly passed in the rangefeed options.

Related: #133789

Release note: none
Epic: none

Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
Co-authored-by: Wenyi Hu <wenyi@cockroachlabs.com>