runtime: aggressive gc assist with many goroutines #56966
Thank you! AFAICT what's going on here is that the pacer is built to schedule riiiight on the edge of no assists in the steady-state, to avoid starting earlier than it absolutely has to. However, when there's some noise (I see that the amount of goroutine stacks varies a good bit, but there's likely also other noise) the pacer may constantly be either ahead of or behind where it needs to be. The amount of assists here is not super high, it's just that it hovers around 2-7%. There are two ways we can fix this. (1) is to bound the latency of assists, but that would require a bigger change. (2) is to modify the pacer to always start a little early to hedge against noise. This could make assists consistently 0% of GC CPU time, but comes at the cost of a slightly longer GC cycle. This might lead to a slightly higher frequency of GC cycles in some cases, slightly more write barriers executed, and slightly more memory used for floating garbage. However, I suspect that the amount of hedging required to resolve this issue is small enough that it won't meaningfully change performance otherwise (especially when things aren't very noisy). My main idea is to have the pacer always use a slightly higher cons/mark ratio. For example, instead of taking the average of the last 2 cycles as a prediction for the next one like it currently does, take the maximum of the two, or weight the larger cons/mark more heavily. (This might be too noisy, and maybe we want to track 3 GC cycles or something, but it's a decent starting point.)
Thanks @mknyszek! Hedging against noise to avoid GC assist makes sense to me. There's a high, direct penalty to starting late and then pushing assist work onto foreground goroutines. Meanwhile, there's a lower, indirect penalty to starting early and running GC for slightly longer. There's a tension here between reliable/predictable performance and absolute performance. Different systems will want to live at different spots on that spectrum. I'm hopeful a small degree of hedging will meaningfully move us in the direction of predictability without compromising much on absolute performance. I'm happy to test out fixes on this workload to see if they help.
Sorry for the delay! I got a bit caught up with other things, and after some more thought and a little experimentation, I think this idea is indeed too noisy.
I agree. I suspect aggregate performance metrics like throughput tend to be less sensitive to relatively small changes in pacing, so a pure win (or close to one) feels achievable here. I'm going to be out on vacation for a few days, just wanted to leave a ping here that I hadn't forgotten about it, and that it's still on my radar. :)
Thank you! I'll ping you when I have something more concrete. I have a couple more ideas I'm playing with, I'll aim to get back to you next week (I'm out for the rest of this week).
Thanks for the update @mknyszek. Enjoy your vacation!
Okay, it's been a full month and I'm back. I experimented with a whole lot of things (the test case I'm using here is uniformly random noise in the measured cons/mark ratio), and I think the best hedging mechanism is in fact a maximum, it's just that a 2 cycle window is pretty bad. A 5 cycle window is pretty good, but it's not very responsive when the application suddenly goes idle; that's 5 consecutive cycles where the pacer is going to be too conservative. Maybe that's fine though. When you're idle these things tend to not matter so much, and it's a constant factor hit. I'm going to try benchmarking this ASAP; I'll send a CL if it looks OK.
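For concreteness, here's a minimal sketch of that smoothing idea in isolation (my own illustration, not the runtime's code): keep the last few per-cycle cons/mark measurements and pace against their maximum rather than their average.

```go
package pacer

// smoothedConsMark returns the cons/mark ratio to use when pacing the next
// GC cycle: the maximum of the last several measured ratios. Taking the max
// instead of the mean hedges against noise by erring toward starting the GC
// earlier, trading a slightly longer mark phase for fewer assists.
func smoothedConsMark(recent []float64) float64 {
	best := 0.0
	for _, v := range recent {
		if v > best {
			best = v
		}
	}
	return best
}
```

With a 5 cycle window, `recent` would simply hold the last 5 measurements.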
Sorry again for the delay. I was out sick all last week, and I'm on release rotation this week. I'll try to find some time to actually run the experiment. I'm currently trying to put together a few ideas to avoid "pacing on the edge," which this is certainly a symptom of.
@mknyszek thanks for the update! Please let me know if you'd like help testing out your ideas that avoid pacing on the edge. We can pretty easily reproduce this behavior (see above) and are happy to run the same tests in our environment with patches to the runtime.
Change https://go.dev/cl/467875 mentions this issue:
I put up https://go.dev/cl/467875 if you'd like to give it a try. Sigh... such a long time for such a small diff... 😅 Though, I think I fully reasoned out the consequences, and I feel more confident that this is a good solution. I'll run some benchmarks overnight, too.
The benchmarks show that this CL moves the needle very little. This isn't necessarily a bad thing, because I don't believe any of these benchmarks are particularly noisy to the pacer. Where the needle did move, it moved roughly in the direction I expected, which is a good sign. If you can patch in that CL and try a run with it, I'd appreciate it!
Friendly ping! :)
Hi @mknyszek, I took the patch in https://go.dev/cl/467875 for a spin and it is showing a meaningful improvement to tail latency on the memory-intensive workload from above! CockroachDB has not yet upgraded to Go 1.20, so running with the patch was a little tricky. I was able to do so by cherry-picking 4179552 (an earlier GC assist-related change only present in go1.20) onto go1.19.4. I then built one version of CockroachDB with just that earlier patch and one that also contains your new patch, https://go.dev/cl/467875. Here are the results, averaged across a series of tests:
Here is a pair of gctraces from the two different variants of the test using GODEBUG=gctrace=1,gcpacertrace=1:
That's great to hear! I mean, it definitely isn't at the point of "no assists" but at least worth moving forward with for now. There's clearly more denoising to be done, but at this point it looks like most of the pacer inputs and their derivatives are quite stable from the latest GC traces you shared (thanks!). This leads me to be suspicious of noisiness during the GC itself causing the assists to be "jumpy." I realize in the execution trace you showed in your original post the assists were pretty uniformly applied to all goroutines, but now I'm wondering if post-patch (the patch mostly addresses larger-scale, aggregate behavior) the assist ratio during a GC looks more spiky over time (which should cause transient global assist spikes). I have a couple of follow-up questions in this direction (in no particular order of priority):
Also, my apologies about the ergonomics of that patch. I can prepare any future patches on top of Go 1.19 and later port them to tip.
Currently the pacer is designed to pace against the edge. Specifically, it tries to find the sweet spot at which there are zero assists, but simultaneously finishes each GC perfectly on time. This pretty much works, despite the noisiness of the measurement of the cons/mark ratio, which is central to the pacer's function. (And this noise is basically a given; the cons/mark ratio is used as a prediction under a steady-state assumption.) Typically, this means that the GC might assist a little bit more because it started the GC late, or it might execute more GC cycles because it started early. In many cases the magnitude of this variation is small.

However, we can't possibly control for all sources of noise, especially since some noise can come from the underlying system. Furthermore, there are inputs to the measurement that have effectively no restrictions on how they vary, and the pacer needs to assume that they're essentially static when they might not be in some applications (i.e. goroutine stacks). The result of high noise is that the variation in when a GC starts is much higher, leading to a significant amount of assists in some GC cycles. While the GC cycle frequency basically averages out in the steady-state in the face of this variation, starting a GC late has the significant drawback of hurting application latencies.

This CL thus biases the pacer toward avoiding assists by picking a cons/mark smoothing function that takes the maximum measured cons/mark over 5 cycles total. I picked 5 cycles because empirically this was the best trade-off between window size and smoothness for a uniformly distributed jitter in the cons/mark signal. The cost here is that if there's a significant phase change in the application that makes it less active with the GC, then we'll be using a stale cons/mark measurement for 5 cycles. I suspect this is fine precisely because this only happens when the application becomes less active, i.e. when latency matters less.

Another good reason for this particular bias is that even though the GC might start earlier and end earlier on average, resulting in more frequent GC cycles and potentially worse throughput, it also means that it uses less memory on average. As a result, there's a reasonable workaround in just turning GOGC up slightly to reduce GC cycle frequency and bringing memory (and hopefully throughput) levels back to the same baseline. Meanwhile, there should still be fewer assists than before, which is just a clear improvement to latency.

Lastly, this CL updates the GC pacer tests to capture this bias against assists and toward GC cycles starting earlier in the face of noise.

Sweet benchmarks didn't show any meaningful difference, but real production applications showed a reduction in tail latencies of up to 45%.

Updates #56966.

Change-Id: I8f03d793f9a1c6e7ef3524d18294dbc0d7de6122
Reviewed-on: https://go-review.googlesource.com/c/go/+/467875
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
As a heads-up, I landed the change you tried out earlier for Go 1.21. I moved this out of the Go 1.21 milestone for now, but I'm happy to do more here if you have time and/or are able to collect more data on what's going on during the GC cycle (as per my last comment). Thanks!
Hi @mknyszek, we've recently upgraded CockroachDB to Go 1.21 and wanted to share that we are seeing significantly better tail latency as a result. For example, here's the output of a recent test we ran before and after the upgrade with a pathological workload (many goroutines, reasonably high rate of allocation):
Thank you for your help on this improvement!

At the same time, we've also recently begun to identify just how dominant a source of tail latency the Go GC still is in CockroachDB. Specifically, how impactful gc assist can be under steady-state and when running background tasks like backups or bulk writes. We added a debug environment variable called

We also measured the effect of adjusting

This raises the question of what role gc assist plays in an application which has also configured a GOMEMLIMIT.

GC assist is a backpressure mechanism to prevent an unexpectedly fast rate of user goroutine memory allocation from exceeding memory limits and causing OOMs. Having such a mechanism feels critical, and other concurrent GCs like the JVM's CMS collector have similar concepts. On the other hand, it strikes me that gc assist in Go is treated as a fairly expected consequence of GC. The JVM's CMS collector views foreground allocation backpressure (called a "concurrent mode failure") a little more similarly to how latency-sensitive applications do — as an exceptional case that should be avoided ("These estimates are padded for safety, because concurrent mode failure can be very costly.")

I'm interested to get your thoughts on how to think about all of this. Many of the implementation choices of the GC pacing algorithm seem to be designed around some amount of assist. For example,

This seems to be optimizing for starting the GC as late as possible and running as few GCs as possible. For throughput-oriented applications, that is probably the right choice. For latency-sensitive applications, minimizing GC assist would be a more important goal. For example, a latency-sensitive application might want to start each GC cycle GBs of heap earlier than necessary just to minimize the chance of any assist work penalizing latency, with the expectation that assist would only kick in as a last resort to avoid OOMs.

Is there a missing knob here to allow applications to express their willingness to backpressure user goroutines? Or a knob to configure how much each cons/mark estimate should be padded to trade earlier GCs for less assist? Is the GC pacing algorithm written in a way that kicking off GC earlier would even help avoid assist?

Independent of the discussion here, I'm happy to provide reproductions (outside of cockroach) which perform similar heap allocation patterns to cockroach and demonstrate high levels of assist. For example, I've attached one such reproduction below, which commonly shows goroutines slow down by 40-50x (from ~250us to >10ms) while GC is running due to assist.
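As an aside on measurement: a lightweight way to quantify assist cost without a full execution trace is the runtime/metrics package (assuming a Go release that exposes the /cpu/classes metrics). This is only a sketch of the approach, not something taken from the reproduction itself.

```go
package main

import (
	"fmt"
	"runtime/metrics"
	"time"
)

func main() {
	// Cumulative CPU-time estimates since process start. The first sample is
	// the time goroutines spent doing GC work in mark assists; the second is
	// all GC CPU time.
	samples := []metrics.Sample{
		{Name: "/cpu/classes/gc/mark/assist:cpu-seconds"},
		{Name: "/cpu/classes/gc/total:cpu-seconds"},
	}
	for range time.Tick(10 * time.Second) {
		metrics.Read(samples)
		assist := samples[0].Value.Float64()
		total := samples[1].Value.Float64()
		fmt.Printf("GC assist CPU: %.2fs of %.2fs total GC CPU\n", assist, total)
	}
}
```

Watching the delta of the assist sample across GC cycles gives a rough sense of how much of the pain is assist work versus background mark work.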
I'm happy that I could help. :)
This used to be true before Go 1.18 (#44167). The goal since then has very much been to try and avoid assists as much as possible. The steady-state goal for the pacer is zero assists (previously it was a target of 5%). However, I agree there's more that can be done. I think part of the problem is that assist scheduling hasn't been touched at all in response to that change. If we can improve that and broaden the definition of "steady state," then that'll reduce assists in even more situations. More on this below.
That's an unfortunate known issue that I don't think we have a good sense about how to solve in the current design. See #42430. We could start bgScanCredit with some non-zero value, but it's hard to decide what a good value is until the GC actually starts running. We could maybe look at the last cycle's data for a hint, but that's going to be noisy. It'll probably help, but it's unclear by what amount.
Yeah, unfortunately both of these mechanisms are susceptible to noise. Conceptually assistWorkPerByte represents the slope of a line that the runtime is trying to stay under, and it's pretty strict about staying under that line. Assists begin the moment you cross it. Part of the reason for this is that we want to smooth out assists across the entire GC cycle. This is counter-intuitive, but by having the line the GC can plan ahead. If it knows it's going to have a short runway, then it'll always be under the line, but everything will be equally under the line. Consider situations like a low

On the other hand, this line is intended to smooth out spiky behavior to keep the application on track. For example, if a background worker stalls in the OS for some reason, assists can pick up the slack without any overall GC CPU impact, and the GC still hits its goal on time. I've long wondered if we could do a lot better with some hysteresis: assume that we'll right ourselves in the near future, and only if that doesn't happen do we kick up assists. However, this conflicts with the goal of smearing assist work across the GC cycle, which is also good for latency. I don't think this is unresolvable, but it will require some thought.
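To make the "slope of a line" picture concrete, here is a rough sketch of the relationship (simplified, with made-up function names; the real computation in the pacer's revise step also accounts for stack and globals scan work):

```go
package pacer

// assistWorkPerByte is (roughly) the slope of the line from where the GC is
// now (heap live, scan work done) to where it must end up (heap goal, all
// expected scan work done). Allocating a byte "costs" this much scan work.
func assistWorkPerByte(scanWorkExpected, scanWorkDone, heapGoal, heapLive int64) float64 {
	scanRemaining := scanWorkExpected - scanWorkDone
	if scanRemaining < 0 {
		scanRemaining = 0
	}
	heapRemaining := heapGoal - heapLive
	if heapRemaining <= 0 {
		heapRemaining = 1 // out of runway: assist as aggressively as possible
	}
	return float64(scanRemaining) / float64(heapRemaining)
}

// assistDebt is the scan work a goroutine owes for an allocation of the given
// size. If its local credit can't cover it, the goroutine must assist (or
// block waiting for background credit) before continuing.
func assistDebt(size int64, perByte float64) int64 {
	return int64(float64(size) * perByte)
}
```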
Assist credit being goroutine-local was also an intentional design decision. The idea is that the goroutines that allocate the most are the ones that pay the biggest assist penalty. This policy is about making performance easier to locally reason about, though the GC is already one big global component and we've considered changing this policy. Additionally, it helps keep synchronization costs down, because memory allocation only needs to do a non-atomic local update, but we can flush to a global pool often enough to bound the amount of fragmentation. It would be nice to figure out how much of a problem the fragmentation causes in practice.

TBH, I have a hunch that this goroutine-local property is a significant issue in your case. (It's not based on a lot of concrete evidence; more of a gut feeling.) The absolute worst case latency from assists is a frequently allocating goroutine repeatedly blocking in mark assist because they can't find GC work to do, but they're in debt. If some other goroutine could pay off its debt, or it could go in the red in the hope that someone else covers the debt, that would help a lot. But, it really depends on the application; we'd first have to confirm this is actually happening for CockroachDB.
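A stripped-down sketch of that local-credit-plus-shared-pool accounting, with hypothetical names and none of the real runtime's careful race handling; it is only meant to show where the goroutine-local property bites:

```go
package assist

import "sync/atomic"

// bgScanCredit is the shared pool of scan work banked by background workers.
var bgScanCredit atomic.Int64

// goroutine models just the assist-relevant piece of a G.
type goroutine struct {
	assistCredit int64 // local credit; negative means the goroutine is in debt
}

// chargeAlloc debits the local account for an allocation's assist debt and
// reports whether the goroutine may proceed without assisting. Only when the
// local account goes negative does it touch the shared pool.
func (g *goroutine) chargeAlloc(debt int64) bool {
	g.assistCredit -= debt
	if g.assistCredit >= 0 {
		return true
	}
	// In debt: try to steal background credit before doing assist work.
	need := -g.assistCredit
	if bgScanCredit.Add(-need) >= 0 {
		g.assistCredit = 0
		return true
	}
	bgScanCredit.Add(need) // not enough global credit; put it back
	return false           // caller must assist (scan memory) or block
}
```

Because the credit is per-goroutine, spreading the same allocation rate across many more goroutines leaves each one with a thinner local buffer, so many of them can dip into debt at once.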
I think the role of assists with

That being said, if the GC has a lot of runway, then assists should not be kicking in in the steady state. The pacer shouldn't be doing that, and it represents some flaw in the system. Either in scheduling, or in the pacer itself. (At this point, I'm more likely to blame scheduling, but we need more data.)
There are an endless number of knobs we could add, but unfortunately I don't think either of these is compelling enough to me. I'm afraid they would be overfitting to the current GC design. Instead, I think we could do better by rethinking how assists are scheduled in the first place to better fit the Go 1.18 "no assists in the steady state" policy while still retaining some of the existing properties, like decent smearing across the GC cycle.
Yes. In fact, that's what the pacer is fundamentally already doing. It starts the GC earlier to avoid assists, and before Go 1.21 it tried to find the sweet spot of "minimal GC length" and "no assists". (Shorter GC cycles are also good for throughput, latency, and memory use, since write barriers are disabled and there's less 'floating garbage' that inflates the live heap size.) Unfortunately, in real-world noisy environments, pacing for that sweet spot can result in bad guesses and spikes in assist when the GC is wrong.

My patch for Go 1.21 specifically changes the pacer to be more conservative in its estimate of the cons/mark ratio. It is biased against assists in favor of longer GC mark phases, though it'll still hit the sweet spot in low-noise situations.

One reason I have been fixated on assist scheduling in this reply is that the pacer, for the most part, should already be doing a good job at eliminating assist from its end. As of Go 1.21, the pacer should already be giving the rest of the runtime plenty of space to avoid assists from this perspective by starting the GC conservatively early. But fundamentally the fine-grained assist scheduling (i.e. when a goroutine actually calls into an assist, and how long it stays there) is still sensitive and has a really long worst case tail latency.

(The one exception to "the pacer should be giving the rest of the runtime plenty of space" is phase changes, which is a different problem. However, from what I recall of your previous descriptions, I don't think that's the main thing going on here. If you are aware of latency spikes from assists that spawn around the time a background task starts running, that's probably this problem. That one is less clear how to solve.)
Thanks. Would you also be willing to collect and share some execution traces? (What's produced by the runtime/trace package.) Go 1.22 will ship with a new tracer with partitioned traces, allowing us to analyze traces in a streaming manner. I don't know how difficult this would be for you to do, but taking a very long trace containing many GC cycles would help answer a lot of questions.
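For reference, a minimal way to capture such a trace from inside the process (the file name and duration are placeholders); it could equally be exposed through an HTTP endpoint like net/http/pprof's /debug/pprof/trace:

```go
package main

import (
	"log"
	"os"
	"runtime/trace"
	"time"
)

func main() {
	f, err := os.Create("gc-assist.trace") // placeholder path
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if err := trace.Start(f); err != nil {
		log.Fatal(err)
	}
	// Run the workload of interest here; a long window captures many GC cycles.
	time.Sleep(2 * time.Minute)
	trace.Stop()
}
```

Very long traces are heavyweight to analyze before Go 1.22; the partitioned format mentioned above is what makes streaming analysis of long traces practical.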
This is a little off-topic but I think it's a good time to bring it up: it would be nice to add this CockroachDB workload (and possibly others) to our benchmark suite that we run continuously against commits to tip. The suite is not very opinionated as long as the result ends up in the Go benchmark format and we can get basic diagnostics out of the program (CPU profiles, memory profiles, traces, etc.). It would be great for us to be running this continuously (https://perf.golang.org/dashboard/) on our side so we can watch for both improvements and regressions.

Do you have any recommendations on what configurations are useful and/or representative to run and the best way to run them? I don't mind doing the work of integrating into the benchmark suite, that part is pretty easy. The biggest hurdle to adding new benchmarks is generally identifying representative workloads. Also, there's support in the benchmarking suite for seeding programs with interesting data sets (even if they're really big) and running multiple instances of a program generally isn't a problem either, as long as it can all run on the same machine. (I recognize leaving out network latency for something like a distributed database makes it less representative, but usually that just serves to magnify improvements or regressions in the compiler/runtime. That's more useful for spotting trends anyway.) Let me know what you think!
Thank you @mknyszek for the thorough response, as always.
Is this because of the goal to smooth out assists across the entire GC cycle? One can imagine a different goal that trades average latency for tail latency (hopefully far into the tail). We would run with no assist-provided pacing until the heap goal is reached and then slam on the brakes with a stop-the-world pause in cases where the goal is reached before the cycle completes. In such a world, we would instead initialize

There are clear downsides to that approach as well, many of which you alluded to in #44167, but I find it to be a useful comparison point. It hints at an intermediate approach where bgScanCredit starts with some non-zero value to avoid assist at the start of a GC cycle until the ratio between the rate of GC work and the rate of allocation is more clearly established. You can formulate this as giving the background GC workers a "head start" to reduce the chance that the allocations momentarily catch up and need to be slowed down. This is similar but not identical to the hysteresis approach you described to let the allocations/scan work rate temporarily exceed the sustainable rate but then pace more aggressively if the rates don't right themselves quickly.
This would line up with the observation that the larger the pool of goroutines that a fixed amount of work is distributed across, the worse the assist penalty appears to be on latency. Two related observations I had which may be useful here (but probably aren't) are:
My questions could have been worded more clearly. In cases where the GOMEMLIMIT is the cause of the GC cycle and dictates the heap goal, I agree that gc assist pacing is integral to not exceeding the limit and avoiding OOMs. However, in cases where the heap goal is dictated by GOGC and there's plenty of room between the GC cycle's heap goal and the soft memory limit, gc assist feels extra unfortunate. That's because in these cases where the heap goal is a consequence of GOGC and is much less than GOMEMLIMIT, it's a fairly arbitrary goal to the application. GC assist is pacing user allocations so that the heap doesn't exceed a goal when the application is fine with the heap growing up to GOMEMLIMIT. Maybe the right answer here is that GOGC should just be set to
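If that answer ends up being something like GOGC=off with only a soft memory limit, that configuration is already expressible today, either via environment variables (GOGC=off GOMEMLIMIT=8GiB) or programmatically; a minimal sketch, with the 8 GiB figure as a placeholder:

```go
package main

import "runtime/debug"

func main() {
	// Disable the GOGC-based heap goal entirely and let the soft memory
	// limit drive GC frequency instead. The GC then only runs (and assists
	// only apply backpressure) as the heap approaches the limit.
	debug.SetGCPercent(-1)
	debug.SetMemoryLimit(8 << 30) // 8 GiB, placeholder for the real budget
}
```

The trade-off is that the heap then routinely grows toward the limit, leaving less headroom for spikes.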
For my own understanding, why does starting GC earlier lead to longer GC mark phases? My naive understanding is that it would lead to more frequent GC cycles, but that each cycle would take about the same time (assuming no assist in either case).
This makes sense, and I appreciate you writing everything out and building up to this conclusion. It's tempting to point at pacing as a solution to prevent assist, but if GC cycles are getting kicked off sufficiently early and yet allocations are still being scheduled to perform assist, then that's a problem with assist scheduling and we should try to understand what the cause is. I'm happy to help with that.
I'll start with the reproduction outside of cockroach that I provided above, even though I'm not certain it's the same exact behavior we see inside cockroach. I'm collecting these while running with go1.21.5 (bbab863) and with

The runtime trace shows that during each background GC, user goroutines end up spending most of their time in a

Goroutines do run, but only for short periods of time. Presumably until their next allocation, at which point they bounce off the gc
We would be very interested in this! I see that etcd is already supported. I would imagine that benchmarking cockroach end-to-end would look similar. We also have plenty of Go microbenchmarks that we could expose. One challenge to this is that CockroachDB uses Bazel for compilation, which makes switching Go versions a little tricky.
I really like this framing and I find it genuinely enlightening. It suggests to me that seeding the pool with some background credit proportional to the estimated total heap work is the right call. Say, somewhere between 1-5%. The theory is that we start with just enough credit so that the background workers perpetually keep us in the green, and assists always take from the pool without draining it. We could possibly even use that as a feedback mechanism to decide how much to seed it (within some limit).
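A tiny sketch of that seed-plus-feedback idea (hypothetical names and constants, nothing from the runtime): seed the pool with a small fraction of the expected scan work, and adjust the fraction based on whether assists still happened last cycle.

```go
package pacer

// seedCredit returns the initial background scan credit for a new GC cycle
// and the fraction to use next time. The fraction grows while assists keep
// happening and decays toward a floor once they stop, staying within 1-5%.
func seedCredit(expectedScanWork, lastCycleAssistWork int64, frac float64) (seed int64, nextFrac float64) {
	const minFrac, maxFrac = 0.01, 0.05
	switch {
	case frac == 0:
		frac = minFrac // first cycle: start at the floor
	case lastCycleAssistWork > 0:
		frac *= 1.25 // assists still drained the pool: seed a bit more
	default:
		frac *= 0.9 // no assists: decay the seed back down
	}
	if frac < minFrac {
		frac = minFrac
	}
	if frac > maxFrac {
		frac = maxFrac
	}
	return int64(float64(expectedScanWork) * frac), frac
}
```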
I think those are both very useful observations! :) I think flushing credit when a goroutine blocks is a good idea to try. I also think the assist queue should probably be satisfied in first-come, first-served order. The fairness is likely better for progress, because if you're doing a lot of partial fills in a row, you end up letting nothing run for quite a while, which seems pathological but also very bad.
Ack. And yeah, with plenty of runway the GC really shouldn't be assisting at all in this case. I fully agree with you there.
Oops, sorry, you're right. It does lead to more frequent GC cycles after you're already starting earlier than the 'sweet spot' (where the background workers end right as the application reaches the heap goal). This is because there's a floor on the GC CPU usage at 25%. If we could scale GC CPU usage down further, we could start earlier and end at the same point. I'm not really sure what I was thinking when I wrote that. Perhaps my head was still back in the pre-1.18 days, where this was more true because of the 5% assist (30% total) target. 😅
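To put a number on that floor (an illustration built from the 25% target mentioned above, not actual runtime code):

```go
package pacer

// gcBackgroundUtilization mirrors the fixed target discussed above: during
// the mark phase, dedicated plus fractional workers consume roughly a
// quarter of GOMAXPROCS no matter how early the cycle started.
const gcBackgroundUtilization = 0.25

// markWorkerCPUs returns roughly how many CPUs' worth of background mark
// work runs during a cycle, e.g. 4 at GOMAXPROCS=16. Starting a cycle
// earlier doesn't lower this floor, so earlier starts mean more frequent
// cycles rather than cheaper ones.
func markWorkerCPUs(gomaxprocs int) float64 {
	return gcBackgroundUtilization * float64(gomaxprocs)
}
```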
It would still be good to "prove" that the pacer isn't the source of extra assists at this point, since this is mostly a hunch. Where I'm coming from is that we're already starting early enough to see a slight increase in GC frequency in some cases, which suggests to me that we really are pushing past the sweet spot. But maybe we still haven't quite gotten there. We could probably rule it out by being even more conservative about the cons/mark while experimenting with assist scheduling, to keep us honest.
Oof. That's what I was concerned about. :( Well, that does support my theory a little bit. Being on the assist queue frequently and not finding work to do is definitely a point at which something is breaking down, because the goroutines allocating think they need to help, but there isn't even enough work for them to do. Two things I'm wondering about:
Great! That makes sense. We have another suite for go-gettable microbenchmarks; those are much easier to add. If you can point us at the most useful of them, they should be super easy to add.
Interesting. Is it not sufficient to just set
@mknyszek I've been thinking more about this, and am wondering if a change like the following might make sense, at least in spirit:

```diff
diff --git a/src/runtime/mgcpacer.go b/src/runtime/mgcpacer.go
index 32e19f96e1..03bfc41c6c 100644
--- a/src/runtime/mgcpacer.go
+++ b/src/runtime/mgcpacer.go
@@ -502,6 +502,14 @@ func (c *gcControllerState) revise() {
 	// heapGoal assuming the heap is in steady-state.
 	heapGoal := int64(c.heapGoal())
 
+	// If we have a soft memory limit configured, relax the goal (raise it) as
+	// it applies to GC assist to accommodate for the fact that the application
+	// is tolerant of the heap growing to that larger limit. As a result, we
+	// don't need to pace allocations using assist as aggressively.
+	if c.memoryLimit.Load() != maxInt64 {
+		heapGoal = max(heapGoal, int64(c.memoryLimitHeapGoal()))
+	}
+
 	// The expected scan work is computed as the amount of bytes scanned last
 	// GC cycle (both heap and stack), plus our estimate of globals work for this cycle.
 	scanWorkExpected := int64(c.lastHeapScan + c.lastStackScan.Load() + c.globalsScan.Load())
```

Currently, GC assist is paced to prevent a heap from exceeding the current GC's heap goal, regardless of whether the GC was triggered and that goal defined due to the GC percent or due to the soft memory limit. We can split these two possible heap goals into a GOGC-derived goal and a memory-limit-derived goal. For the purposes of GC assist and in cases where memory limits are configured, would it make sense to only consider the memory-limit-derived goal?

The effect of this kind of change would be that when running a "gc percent" GC, we would allow the heap to grow past the GOGC-derived goal, up to the memory-limit-derived goal, before assists apply backpressure. The rationale behind this kind of change is that when memory limits are set, the application has already expressed a maximum size that it can tolerate the heap growing to. With this limit defined and available to the Go runtime, pacing allocations when the heap is well below that limit and not on pace to reach that limit before a given GC cycle completes feels artificial. Pacing should only be necessary to preserve the memory limit and avoid OOMs.
@nvanbenschoten Sorry for the delay! I think that's a really interesting idea and it makes a lot of sense to me. If you have the protection of

I think there might need to be some experimentation and tests to make sure that everything still behaves appropriately, but I don't see why it wouldn't.

As for the specific implementation details, I am not really a fan of

The easiest "smooth way" of doing this that comes to mind is to always pace assists by the memory limit heap goal. I'd be a little worried about pathological cases of runaway allocation causing less predictable memory usage for applications only using GOGC. I can't help but feel that there is some simple, smooth way of expressing this relationship. That is, some function that prefers the memory limit goal the closer it gets to the
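One purely illustrative reading of such a smooth blend (hypothetical names, my own sketch rather than anything proposed concretely above): interpolate the assist goal between the two, so backpressure ramps in gradually as the live heap eats into the headroom below the limit rather than switching on exactly at the GOGC goal.

```go
package pacer

// assistGoal blends the GOGC-derived goal and the memory-limit-derived goal
// for assist pacing. With full headroom, assists are paced against the
// relaxed limit goal; as the live heap consumes that headroom, the effective
// goal slides back toward the strict GOGC goal. With this particular choice,
// assists start ramping up once the heap passes the midpoint between the
// two goals.
func assistGoal(gcPercentGoal, memLimitGoal, heapLive int64) int64 {
	if memLimitGoal <= gcPercentGoal {
		return gcPercentGoal // the limit is the tighter constraint; stay strict
	}
	headroom := float64(memLimitGoal-heapLive) / float64(memLimitGoal-gcPercentGoal)
	if headroom > 1 {
		headroom = 1
	}
	if headroom < 0 {
		headroom = 0
	}
	return gcPercentGoal + int64(headroom*float64(memLimitGoal-gcPercentGoal))
}
```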
This is an extraction of a private thread with @mknyszek.
While working with a customer that is running CockroachDB, we found a class of workloads that are severely (and reproducibly) impacted by GC assist. The situation occurs when an app opens many (O(10k)) SQL connections to CockroachDB and applies a moderately memory-intensive workload (O(1KB) reads and writes @ 30k qps). We've found that this leads to severe tail latency blips (p50=2ms, p99=70ms) and have pinpointed the effect to GC assist.
This effect is present with
go1.17.11
andgo1.19.1
. It is also present onmaster
with 4179552. However, the degraded tail latency disappears if GC assist is disabled by commenting out this line.Increasing GOGC does improve tail latency. However, the improvement comes from running fewer GCs. When GC was running, the impact on tail latency appears to be about the same.
Go execution traces show GC assist kick in across workload goroutines almost immediately (within a few ms) after the background GC process starts. It then consumes the majority of on-cpu time on these goroutines for the duration of the background GC.
Here is a collection of gctraces from different variants of the test using GODEBUG=gctrace=1,gcpacertrace=1:

go1.17_gogc_100.txt
go1.17_gogc_300.txt
go1.17_gogc_900.txt
go1.19_gogc_100.txt
go1.19_gogc_300.txt
go1.19_gogc_900.txt
An interesting note is that the investigations began when we noticed higher tail latency when the same workload (30k qps) was split across more SQL connections. CockroachDB maintains two goroutines per active connection. An early test found the following:
To reproduce

The easiest way to reproduce is using CockroachDB's internal cluster orchestration tool called roachprod. With that tool, the reproduction steps are:

If roachprod is not an option, then the steps are:

c2-standard-30 VM instances

I'm happy to help get these reproduction steps working in other environments.
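For anyone without access to either setup, here is a rough standalone sketch of the workload shape described above (many goroutines, ~1KB allocations, ~30k ops/sec). It is not the attached reproduction, and the constants are placeholders, but it shows how per-operation latency can be measured while the GC runs.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

func main() {
	const (
		workers  = 10000
		duration = 30 * time.Second
		// Each worker issues one ~1KB "request" every 333ms, ~30k ops/sec total.
		interval = 333 * time.Millisecond
	)

	var (
		mu        sync.Mutex
		latencies []time.Duration
		wg        sync.WaitGroup
	)

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Retain a small ring of buffers so there is live heap to mark,
			// not just allocation churn.
			live := make([][]byte, 16)
			deadline := time.Now().Add(duration)
			for n := 0; time.Now().Before(deadline); n++ {
				start := time.Now()
				buf := make([]byte, 1024)
				buf[0] = byte(n)
				live[n%len(live)] = buf
				elapsed := time.Since(start)

				mu.Lock()
				latencies = append(latencies, elapsed)
				mu.Unlock()
				time.Sleep(interval)
			}
		}()
	}
	wg.Wait()

	sort.Slice(latencies, func(i, j int) bool { return latencies[i] < latencies[j] })
	fmt.Println("ops:", len(latencies),
		"p50:", latencies[len(latencies)/2],
		"p99:", latencies[len(latencies)*99/100])
}
```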