net/http: Stream starvation in http2 priority write scheduler #58804
Comments
A naive fix would be to use round robin over the sibling nodes. |
(attn @neild) |
Thanks for the nice analysis. When the children of a node have varying weights, the scheduler sorts them before walking them; however, when the children all have the same weight (the common case), it skips the sort and walks them in list order, which is what produces the starvation described above. A very naive fix might be to drop the common-case optimization and always sort the list of children taking into account bytes sent. That might be expensive, however; perhaps round-robin over sibling nodes gets the same benefit with less cost. Also, I think there's another issue with rapidly created, short-lived streams. |
/cc |
I wonder whether using LIFO order within a single priority was a deliberate decision (vs. FIFO, where a served stream is moved to the end even if still active). |
@neild I think migrating the common scenario to round robin has a significant advantage in that it is a relatively simple change. I'm happy to prepare a PR if we agree on this. In the longer term, I agree that we probably also need to rewrite the other case, where rapidly-created, short-lived requests can starve longer-lived requests, but I'm afraid the greater complexity of that change may block us from backporting it. |
/cc Thanks for raising this issue! grpc-go uses round-robin, and each stream's latency is predictable because of its max-streams limit. Latency increases as the number of streams grows, but it works as expected.
Maybe we can introduce a new round-robin scheduler, since http2.Server allows the user to replace the scheduler, and most cases don't tune per-stream priorities :) |
Change https://go.dev/cl/476155 mentions this issue: |
I think in either case we need to fix the misleading behavior of PriorityWriteScheduler for equal weights, so I'm leaning towards fixing that instead of adding a new scheduler. |
I think we should drop the priority write scheduler. RFC 9113 (the most recent iteration of HTTP/2) deprecates the stream prioritization scheme:
We have evidence that the priority write scheduler is CPU hungry:
There is evidence that the priority write scheduler does (or did) provide benefits in some circumstances. However: |
QUICHE does not appear to implement HTTP/2 priorities. It contains a priority write scheduler, but so far as I can tell this scheduler simply prefers higher-priority streams over lower ones (doing round-robin within a priority bucket), and it looks like all HTTP/2 streams are given the same priority. In summary: we should switch the default back to the random write scheduler in the short term, and possibly add a round-robin scheduler. Long-term, there may be value in a SPDY-style scheduler as used by QUICHE. |
The proposed short-term plan makes sense to me. The random write scheduler definitely improves the situation in our tests, and it's available now and well tested. I think mid-term we should also switch the default to a newly added round-robin scheduler, which should provide lower p99 frame delivery latency than the random scheduler. For our use case, we have been comparing random and priority: while on priority we observed infinite starvation, random still showed quite high latency spikes (in etcd e2e tests we observed ~3m periods where a single stream wasn't receiving any data due to traffic on other streams). I was able to reproduce ~20s latency spikes using a Go-level repro: https://gist.github.com/mborsz/8f4dc894a6fa8fea8adb19a899dc16f3 It's the same repro as above, but also measuring latency between write and read on the "longrunning" stream. Raw results:
My interpretation: The max latency is significantly higher on random than on round robin, which is kind of expected. |
Thank you Maciek, for the experiment. It's interesting that the tested round-robin implementation has an ~11% tax on throughput. I assume that's a consequence of it being a minimal change within PriorityWriteSchedulerConfig rather than the cost of round-robin itself; using linked lists is less CPU-cache-friendly than a dense array or circular buffer. |
Change https://go.dev/cl/478735 mentions this issue: |
I just tested the proposal in https://go.dev/cl/478735 with the same repro as in comment #58804 (comment)
The latency looks very reasonable, and throughput seems similar to the random scheduler. |
The priority scheduler allows stream starvation (see golang/go#58804) and is CPU intensive. In addition, the RFC 7540 prioritization scheme it implements was deprecated in RFC 9113 and does not appear to have ever had significant adoption.

Add a simple round-robin scheduler and enable it by default.

For golang/go#58804

Change-Id: I5c5143aa9bc339fc0894f70d773fa7c0d7d87eef
Reviewed-on: https://go-review.googlesource.com/c/net/+/478735
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Damien Neil <dneil@google.com>
With golang/net@120fc90 merged, I think we can consider this issue fixed. |
There is also a need to update the std lib (line 4256 at 04e2472).
I think we still miss it with #64216 |
What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

What did you do?
https://gist.github.com/mborsz/c2335eff2cd164b434ee37489f23aa2b is the full code, with the output I get on my machine.
The file output-go1.20.txt presents two problems:
https://go.dev/play/p/5febB68XX0j is a slightly modified version of the repro code (totalRequest changed to 100 to fit in the timeout).
What did you expect to see?
What did you see instead?
Notes: