
net/http: Stream starvation in http2 priority write scheduler #58804

Closed
mborsz opened this issue Mar 1, 2023 · 18 comments

mborsz commented Mar 1, 2023

What version of Go are you using (go version)?

$ go version
go version go1.20 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/usr/local/google/home/maciejborsz/.cache/go-build"
GOENV="/usr/local/google/home/maciejborsz/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/usr/local/google/home/maciejborsz/.gvm/pkgsets/go1.19/global/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/usr/local/google/home/maciejborsz/.gvm/pkgsets/go1.19/global"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/google/home/maciejborsz/sdk/go1.20"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/google/home/maciejborsz/sdk/go1.20/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.20"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/usr/local/google/home/maciejborsz/repro/go.mod"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build187328419=/tmp/go-build -gno-record-gcc-switches"

What did you do?

  1. I created a single http2 connection with multiple streams inside:
  • one long-running, low-volume stream that periodically receives a tiny amount of data from the server
  • many short requests fetching a larger amount of data
  2. I logged when the long-running goroutine receives packets from the server.

https://gist.github.com/mborsz/c2335eff2cd164b434ee37489f23aa2b contains the full repro code along with the output I get on my machine.
The file output-go1.20.txt shows two problems:

  • longrunning data is not transferred at all until the last short request finishes
  • the first short requests (numbers 0-5) finish only after more recent requests finish.

https://go.dev/play/p/5febB68XX0j is a slightly modified version of the repro code (totalRequests changed to 100 to fit within the playground timeout). A condensed sketch of the repro's shape follows below.
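For readers who don't want to open the gist, here is a condensed, self-contained sketch of what the repro does (sizes, counts, and handler paths are illustrative, not the gist's exact values):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync"
	"time"
)

func main() {
	mux := http.NewServeMux()
	// Long-running, low-volume stream: a few bytes, periodically.
	mux.HandleFunc("/longrunning", func(w http.ResponseWriter, r *http.Request) {
		for i := 0; i < 50; i++ {
			fmt.Fprintf(w, "tick %d\n", i)
			w.(http.Flusher).Flush()
			time.Sleep(100 * time.Millisecond)
		}
	})
	// Short requests with larger response bodies.
	big := make([]byte, 1<<20)
	mux.HandleFunc("/get", func(w http.ResponseWriter, r *http.Request) {
		w.Write(big)
	})

	ts := httptest.NewUnstartedServer(mux)
	ts.EnableHTTP2 = true // all requests below share one HTTP/2 connection
	ts.StartTLS()
	defer ts.Close()
	c := ts.Client()

	// Log when the long-running stream actually receives data.
	go func() {
		resp, err := c.Get(ts.URL + "/longrunning")
		if err != nil {
			return
		}
		buf := make([]byte, 64)
		for {
			n, err := resp.Body.Read(buf)
			fmt.Printf("%s longrunning got %d bytes\n",
				time.Now().Format("15:04:05.000"), n)
			if err != nil {
				return
			}
		}
	}()

	// Many concurrent short requests competing on the same connection.
	var wg sync.WaitGroup
	sem := make(chan struct{}, 100) // cap on in-flight short requests
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		sem <- struct{}{}
		go func() {
			defer wg.Done()
			defer func() { <-sem }()
			resp, err := c.Get(ts.URL + "/get")
			if err != nil {
				return
			}
			io.Copy(io.Discard, resp.Body)
			resp.Body.Close()
		}()
	}
	wg.Wait()
}
```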

What did you expect to see?

  • The long-running stream receives data continuously, even while the short requests are in flight.

What did you see instead?

  • While the short requests are running, traffic on the longrunning stream is blocked completely. By increasing totalRequests I can prolong the period of no data arbitrarily; with an infinite stream of incoming requests, the long-running stream is starved forever (reproducible by changing totalRequests to e.g. 50000000 and increasing the response size in handlerGet to e.g. 10M).

Notes:

  • I think the observed behavior matches the code of PriorityWriteScheduler for the case when all weights are equal (as in our case).
  • Changing the WriteScheduler to "random" fixes the problem on my machine (by uncommenting the code in the snippet). On the playground it still fails; I'm not sure why, but I suspect some playground limitation, or that the test is too short on the playground to trigger the problem.
  • Since d872374 (Go 1.19+), the default http server uses the priority (affected) write scheduler.

mborsz commented Mar 1, 2023

A naive fix would be to use round robin over n.kids (i.e. start each iteration where the previous one finished, treating the list as circular) in the equal-weights case around https://github.com/golang/net/blob/1d46ed8b487a394ccf6d67de0a95b76ebc0b8aba/http2/writesched_priority.go#L183-L189; see the sketch below.
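A minimal sketch of that idea, with hypothetical names (the real fix would live inside the priority tree's sibling list, not a slice):

```go
package main

import "fmt"

// rrPicker resumes iteration where the previous pick left off, treating
// the stream list as circular, so no ready stream can be skipped forever.
type rrPicker struct {
	pos int // index just past the stream served on the previous pick
}

func (p *rrPicker) pick(streams []int, ready func(id int) bool) (int, bool) {
	n := len(streams)
	for i := 0; i < n; i++ {
		idx := (p.pos + i) % n
		if ready(streams[idx]) {
			p.pos = idx + 1 // next pick starts after the stream just served
			return streams[idx], true
		}
	}
	return 0, false
}

func main() {
	p := rrPicker{}
	streams := []int{1, 3, 5, 7}
	for i := 0; i < 6; i++ {
		id, _ := p.pick(streams, func(int) bool { return true })
		fmt.Print(id, " ") // prints: 1 3 5 7 1 3 (every stream gets a turn)
	}
	fmt.Println()
}
```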

bcmills added the NeedsInvestigation label Mar 1, 2023
bcmills added this to the Backlog milestone Mar 1, 2023

bcmills commented Mar 1, 2023

(attn @neild)

neild self-assigned this Mar 1, 2023

neild commented Mar 1, 2023

Thanks for the nice analysis.

When the children of a node have varying weights, walkReadyInOrder sorts the children of a node by the ratio of weight to bytes sent. This prevents starvation: a low-priority stream will get a smaller fraction of the available resources, but it will still get a fraction.

However, when the children all have the same weight (the common case), walkReadyInOrder just picks the first available stream without consideration for how many bytes have been sent, permitting starvation.

A very naive fix might be to drop the common-case optimization and always sort the list of children taking into account bytes sent. That might be expensive, however; perhaps round-robin over sibling nodes gets the same benefit with less cost.

Also, I think there's another issue with walkReadyInOrder when handling varying weights: it sorts nodes by weight/bytes_sent, but doesn't take recency into account. A long-running stream that has sent many bytes will be starved by shorter-lived streams. Moreover, a stream that has sent no bytes is always prioritized over one that has sent any, so the condition you describe here can still occur: rapidly created, short-lived requests will starve a longer-lived request. The simplified comparator sketch below illustrates this.
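A simplified model of that ordering (illustrative only; modeled on the sort in walkReadyInOrder, not copied from it):

```go
package main

import "fmt"

// prefer reports whether stream i should be served before stream k, using
// the weight-to-bytes-sent ordering described above. A stream that has
// sent zero bytes always sorts ahead of one that has sent any.
func prefer(weightI, bytesI, weightK, bytesK float64) bool {
	if bytesI == 0 && bytesK == 0 {
		return weightI >= weightK // neither has sent data: higher weight wins
	}
	if bytesK == 0 {
		return false // k has sent nothing, so k always goes first
	}
	return bytesI/bytesK <= weightI/weightK // fewer bytes per unit weight wins
}

func main() {
	// A fresh stream (0 bytes sent) beats a long-running one (10 MiB sent),
	// so a steady supply of new short streams starves the old stream.
	fmt.Println(prefer(16, 0, 16, 10<<20)) // true: the fresh stream goes first
	fmt.Println(prefer(16, 10<<20, 16, 0)) // false: the old stream never does
}
```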


wojtek-t commented Mar 2, 2023

/cc


ptabor commented Mar 3, 2023

I wonder whether using LIFO order within a single priority level was a deliberate decision (vs. FIFO, where a served stream is moved to the end even if it is still active).
For reference: the grpc http2 implementation uses round robin over active streams, in 16KB chunks.


mborsz commented Mar 8, 2023

@neild I think migrating the common scenario to round robin has a significant advantage: it is a relatively simple change. I'm happy to prepare a PR if we agree on this.
Can we backport such a fix to older Go versions?

In the longer term, I agree that we probably also need to rewrite the other case, where rapidly created, short-lived requests can starve longer-lived ones, but I'm afraid the greater complexity of that change may prevent backporting.


fuweid commented Mar 9, 2023

/cc

Thanks for raising this issue! grpc-go uses round robin, and each stream's latency is predictable because of the cap on concurrent streams. Latency increases as the number of streams grows, but it works as expected.

I think migrating the common scenario to round robin has a significant advantage: it is a relatively simple change. I'm happy to prepare a PR if we agree on this.

Maybe we can introduce a new round-robin scheduler, since http2.Server allows the user to swap the scheduler, and most cases don't tune per-stream priority :) A sketch of how a server swaps the scheduler is below.
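For reference, swapping the scheduler is already possible today; a minimal sketch using the existing x/net API (certificate paths are placeholders):

```go
package main

import (
	"log"
	"net/http"

	"golang.org/x/net/http2"
)

func main() {
	srv := &http.Server{Addr: ":8443", Handler: http.DefaultServeMux}
	// NewWriteScheduler lets a server opt into a different frame scheduler;
	// here we choose the random scheduler over the default priority one.
	h2 := &http2.Server{
		NewWriteScheduler: func() http2.WriteScheduler {
			return http2.NewRandomWriteScheduler()
		},
	}
	if err := http2.ConfigureServer(srv, h2); err != nil {
		log.Fatal(err)
	}
	log.Fatal(srv.ListenAndServeTLS("cert.pem", "key.pem"))
}
```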

@gopherbot

Change https://go.dev/cl/476155 mentions this issue: http2: PriorityWriteScheduler use round robin for equal weight case.


mborsz commented Mar 14, 2023

Maybe we can introduce a new round-robin scheduler, since http2.Server allows the user to swap the scheduler, and most cases don't tune per-stream priority :)

I think in either case we need to fix the misleading behavior of PriorityWriteScheduler for equal weights, so I'm leaning towards fixing that rather than adding a new scheduler.


neild commented Mar 15, 2023

I think we should drop the priority write scheduler.

RFC 9113 (the most recent iteration of HTTP/2) deprecates the stream prioritization scheme:
https://www.rfc-editor.org/rfc/rfc9113.html#section-5.3

RFC 7540 defined a rich system for signaling priority of requests. However, this system proved to be complex, and it was not uniformly implemented.

The flexible scheme meant that it was possible for clients to express priorities in very different ways, with little consistency in the approaches that were adopted. For servers, implementing generic support for the scheme was complex. Implementation of priorities was uneven in both clients and servers. Many server deployments ignored client signals when prioritizing their handling of requests.

In short, the prioritization signaling in RFC 7540 [RFC7540] was not successful.

We have evidence that the priority write scheduler is CPU hungry:

Turns out, O(n) is actually the worst case. Some discussion in the thread below. There is some optimization opportunity to reduce the constants, but a faithful implementation cannot avoid the O(n) worst-case.
#18318 (comment)

FWIW, at Tailscale we recently switched from the priority scheduler to the random scheduler because the priority one used so much more CPU for its negligible (at least to us) benefits.
https://go-review.git.corp.google.com/c/net/+/382015/4#message-6ea63b0e7a2e6872192fc16ff3c352e8817eb0fa

There is evidence that the priority write scheduler does (or did) provide benefits in some circumstances:
https://docs.google.com/document/d/1oLhNg1skaWD4_DtaoCxdSRN5erEXrH-KnLrMwEpOtFY/edit

However:

  • That analysis is about six years old at this point, and I have low confidence that browser behavior has not changed in the intervening time, given the deprecation of the RFC 7540 priority scheme.
  • A 5% or better performance improvement was detected in only 25% of test cases.
  • The test was designed to measure the best-case for prioritization (see the methodology section), and so we should expect the real-world benefits to be smaller.
  • It is not clear to me that this test scenario matches common usage of the Go HTTP server well.
  • In these tests, simpler SPDY-style prioritization (which is what the QUICHE priority scheduler appears to implement) beat out RFC 7540 prioritization.

QUICHE does not appear to implement HTTP/2 priorities. It contains a priority write scheduler, but so far as I can tell this scheduler simply prefers higher-priority streams over lower ones (doing round-robin within a priority bucket) and it looks like all HTTP/2 streams are given the same priority.

In summary: The x/net/http2 priority write scheduler implements an obsolete prioritization scheme, is CPU expensive, and permits stream starvation in the common case.

We should switch the default back to the random write scheduler in the short term, and possibly add a round-robin scheduler. Long-term, there may be value in a SPDY-style scheduler as used by QUICHE.


mborsz commented Mar 16, 2023

The proposed short-term plan makes sense to me. In our tests the random write scheduler definitely improves the situation, and it is available now and well tested.

I think mid-term we should also switch the default to a newly added round-robin scheduler, which should provide lower p99 frame delivery latency than the random scheduler.

For our use case we have been comparing random and priority: while with priority we observed infinite starvation, random still showed quite high latency spikes (in etcd e2e tests we observed ~3m periods where a single stream wasn't receiving any data due to traffic on other streams).

I was able to reproduce ~20s latency spikes using golang-level repro: https://gist.github.com/mborsz/8f4dc894a6fa8fea8adb19a899dc16f3

It's the same repro as above, but it also measures the latency between a write and the corresponding read on the "longrunning" stream.

Raw results:

Inflight = 1000, totalRequests = 20000, frameSize = 16 * 1024, random
2023/03/16 09:04:21 finished url https://127.0.0.1:35941/longrunning, took 2m22.749233857s, maxLatency=22.541549601s
2023/03/16 09:16:08 finished url https://127.0.0.1:45509/longrunning, took 2m22.914708939s, maxLatency=18.330607707s
2023/03/16 09:20:35 finished url https://127.0.0.1:34635/longrunning, took 2m22.420850383s, maxLatency=16.384647093s

Inflight = 1000, totalRequests = 20000, frameSize = 16 * 1024, round robin
2023/03/16 09:24:15 finished url https://127.0.0.1:40443/longrunning, took 2m43.572479808s, maxLatency=1.695812761s
2023/03/16 09:28:08 finished url https://127.0.0.1:43495/longrunning, took 2m42.647369718s, maxLatency=736.103012ms
2023/03/16 09:31:49 finished url https://127.0.0.1:37509/longrunning, took 2m50.088901237s, maxLatency=1.515095342s

My interpretation: max latency is significantly higher with random than with round robin, which is expected.
While on average random should pick each stream once every #streams picks, there is no mechanism preventing arbitrarily long periods without picking a specific stream, and these naive experiments show that this happens in practice. The small simulation below illustrates the effect.
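A quick illustrative simulation of that point (not the actual scheduler): with 1000 equally likely streams, round robin bounds the gap between serving a given stream at 999 picks, while random selection routinely produces much longer gaps.

```go
package main

import (
	"fmt"
	"math/rand"
)

func main() {
	const streams, picks = 1000, 1_000_000
	r := rand.New(rand.NewSource(1)) // fixed seed for reproducibility
	gap, maxGap := 0, 0
	for i := 0; i < picks; i++ {
		if r.Intn(streams) == 0 { // stream 0 was picked
			gap = 0
			continue
		}
		gap++
		if gap > maxGap {
			maxGap = gap
		}
	}
	// Round robin would bound this at streams-1 = 999; random selection
	// typically leaves stream 0 unserved for several thousand picks in a row.
	fmt.Println("longest run without serving stream 0:", maxGap)
}
```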


ptabor commented Mar 16, 2023

Thank you, Maciek, for the experiment. It's interesting that the tested round-robin implementation carries a ~11% throughput tax. I assume this is a consequence of it being a minimal change within PriorityWriteSchedulerConfig rather than the cost of round robin itself; linked lists are less CPU-cache-friendly than a dense array or circular buffer.

@gopherbot

Change https://go.dev/cl/478735 mentions this issue: http2: change default frame scheduler to round robin


mborsz commented Mar 23, 2023

I just tested the proposal from https://go.dev/cl/478735 with the same repro as in #58804 (comment):

2023/03/23 07:53:43 finished url https://127.0.0.1:45395/longrunning, took 2m21.819200793s, maxLatency=2.158699781s
2023/03/23 07:56:41 finished url https://127.0.0.1:37295/longrunning, took 2m26.319909959s, maxLatency=2.20333328s
2023/03/23 07:59:25 finished url https://127.0.0.1:46495/longrunning, took 2m25.444875987s, maxLatency=1.117314575s

The latency metric looks very reasonable, and throughput seems similar to the random scheduler.

gopherbot pushed a commit to golang/net that referenced this issue May 15, 2023
The priority scheduler allows stream starvation (see golang/go#58804)
and is CPU intensive. In addition, the RFC 7540 prioritization
scheme it implements was deprecated in RFC 9113 and does not appear
to have ever had significant adoption.

Add a simple round-robin scheduler and enable it by default.

For golang/go#58804

Change-Id: I5c5143aa9bc339fc0894f70d773fa7c0d7d87eef
Reviewed-on: https://go-review.googlesource.com/c/net/+/478735
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Damien Neil <dneil@google.com>
@serathius

With golang/net@120fc90 merged, I think we can consider this issue fixed.


aojea commented May 16, 2023

With golang/net@120fc90 merged, I think we can consider this issue fixed.

There is also a need to update the standard library:

sc.writeSched = http2NewPriorityWriteScheduler(nil)


mborsz commented Jun 1, 2023

After 260a9b0, the default in the standard library has been changed to round robin:

sc.writeSched = http2newRoundRobinWriteScheduler()

@linxiulei

With golang/net@120fc90 merged, I think we can consider this issue fixed.

There is also a need to update the standard library:

sc.writeSched = http2NewPriorityWriteScheduler(nil)

I think we still miss it; see #64216.
