gateway: fix exec process lifecycle ordering#6531
Closed
tonistiigi wants to merge 2 commits into moby:master
Send Started before any async Exit/Done paths to preserve protocol order. Close all tracked processIO pipe endpoints during Close so pio.done can always drain and avoid hangs in gateway exec teardown.

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>
processIO.Close() was removing processWriters too early. That could make us send Done before the stdout/stderr EOF messages. This change keeps processWriters tracked until EOF is sent, so the message order is correct: Started -> output/EOF -> Done.

Signed-off-by: CrazyMax <1951866+crazy-max@users.noreply.github.com>
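The ordering described in this commit message can be illustrated with a minimal, self-contained sketch. It is not the code from frontend/gateway/gateway.go: the `tracker` type, its `events` slice and the `addOutput`/`close`/`run` helpers are hypothetical names invented for the illustration, and `io.Pipe` stands in for whatever pipe endpoints the gateway actually uses. The idea shown is only that Close closes the writers while each output goroutine untracks its own writer after recording EOF, and Done is recorded after all of those goroutines have finished.

```go
package main

import (
	"fmt"
	"io"
	"sync"
)

// tracker is a hypothetical stand-in for the per-process I/O bookkeeping
// discussed above; it is not BuildKit's actual processIO type.
type tracker struct {
	mu      sync.Mutex
	events  []string               // messages in the order they were "sent"
	writers map[int]io.WriteCloser // tracked output writers, keyed by fd
	wg      sync.WaitGroup         // one count per output goroutine
}

func (t *tracker) record(ev string) {
	t.mu.Lock()
	t.events = append(t.events, ev)
	t.mu.Unlock()
}

// addOutput tracks a writer for fd and starts a goroutine that forwards its
// data. The goroutine, not close, removes the writer once EOF is recorded.
func (t *tracker) addOutput(fd int) io.WriteCloser {
	r, w := io.Pipe()
	t.mu.Lock()
	t.writers[fd] = w
	t.mu.Unlock()
	t.wg.Add(1)
	go func() {
		defer t.wg.Done()
		buf := make([]byte, 32)
		for {
			n, err := r.Read(buf)
			if n > 0 {
				t.record(fmt.Sprintf("File fd=%d %d bytes", fd, n))
			}
			if err != nil {
				t.record(fmt.Sprintf("File fd=%d EOF", fd))
				t.mu.Lock()
				delete(t.writers, fd)
				t.mu.Unlock()
				return
			}
		}
	}()
	return w
}

// close closes every tracked writer but leaves it in the map, waits for the
// output goroutines to record EOF, and only then records Done.
func (t *tracker) close() {
	t.mu.Lock()
	ws := make([]io.WriteCloser, 0, len(t.writers))
	for _, w := range t.writers {
		ws = append(ws, w)
	}
	t.mu.Unlock()
	for _, w := range ws {
		w.Close()
	}
	t.wg.Wait()
	t.record("Done")
}

// run simulates one process that writes to stdout and stderr and then exits.
func run() []string {
	t := &tracker{writers: map[int]io.WriteCloser{}}
	t.record("Started")
	stdout := t.addOutput(1)
	stderr := t.addOutput(2)
	stdout.Write([]byte("hi\n"))
	stderr.Write([]byte("oops\n"))
	t.close()
	return t.events
}

func main() {
	for _, ev := range run() {
		fmt.Println(ev)
	}
}
```

In this sketch the relative order of the stdout and stderr messages is not fixed, but Started is always first and Done is always last, which is the property the commit message describes.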
c8d6e53 to 2affe68
crazy-max reviewed Feb 24, 2026
frontend/gateway/gateway.go (outdated)
		}
	}
	for fd, w := range pio.processWriters {
		delete(pio.processWriters, fd)
https://github.com/moby/buildkit/actions/runs/22342178945/job/64648494627#step:8:1223
sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|<--- NewContainer 5csf58g7gk7jgcoi687dpr6k2" spanID=3af04cd4fa4357d3 traceID=5f061102d7b4ceebbd34d62ee276bcb5
sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|<--- Init Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="Starting new container for 5csf58g7gk7jgcoi687dpr6k2 with args: [\"sh\"]"
sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="returning network namespace it86wk5ub34v0a8u7gefehkyd from pool"
sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> Started Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> File Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd, fd=1, 2 bytes" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|<--- Init Message 5csf58g7gk7jgcoi687dpr6k2:j1n4pirkltsmzt7frqmipuoo6" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> Started Message 5csf58g7gk7jgcoi687dpr6k2:j1n4pirkltsmzt7frqmipuoo6" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="Execing into container 5csf58g7gk7jgcoi687dpr6k2 with args: [\"cat\" \"/data\"]"
sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> File Message 5csf58g7gk7jgcoi687dpr6k2:j1n4pirkltsmzt7frqmipuoo6, fd=1, 26 bytes" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> Exit Message 5csf58g7gk7jgcoi687dpr6k2:j1n4pirkltsmzt7frqmipuoo6, code=0, error=%!s(<nil>)" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> Done Message 5csf58g7gk7jgcoi687dpr6k2:j1n4pirkltsmzt7frqmipuoo6" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> File Message 5csf58g7gk7jgcoi687dpr6k2:j1n4pirkltsmzt7frqmipuoo6, fd=1, EOF" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|<--- File Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd, fd=0, 7 bytes" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> File Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd, fd=1, 8 bytes" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> Exit Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd, code=0, error=%!s(<nil>)" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> Done Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> File Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd, fd=2, EOF" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> File Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd, fd=1, EOF" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|<--- ReleaseContainer 5csf58g7gk7jgcoi687dpr6k2" spanID=8b5a1eae515c69eb traceID=865855f6e5868780bffbcc583d63ed69
sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|<--- File Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd, fd=0, EOF" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="returning network namespace it86wk5ub34v0a8u7gefehkyd from pool" span="sh -c echo cyo5arhgx70su9rgnxqtv28vb > /data && echo cyo5arhgx70su9rgnxqtv28vb > /rw/data && fail" spanID=99e8311a5093af6f traceID=0cfd195b1cbac41cc54c8addd9374e68
Judging from the logs above, this seems related to Done being emitted before all per-fd EOF messages for a process. The next Init arrives, then it hangs.
Pushed an extra commit so that writers are closed but stay tracked until the output goroutines send EOF and remove them.
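The ordering problem visible in the log can be checked mechanically. The sketch below is a small, hypothetical helper (the `event` type and `firstViolation` function are invented here and are not part of BuildKit or its tests). It takes the outbound (`|--->`) messages for each process, with a file EOF represented as kind "EOF", and reports the first message that arrives after Done or before Started. The sample input in `main` is the outbound sequence for the `cat /data` exec as it appears in the log above.

```go
package main

import "fmt"

// event is a hypothetical, simplified view of one outbound log line.
type event struct {
	pid  string // process ID as it appears in the log
	kind string // "Started", "File", "EOF", "Exit" or "Done"
}

// firstViolation returns a description of the first event that breaks the
// expected Started -> output/EOF -> Done order, or "" if there is none.
func firstViolation(events []event) string {
	started := map[string]bool{}
	done := map[string]bool{}
	for i, e := range events {
		switch {
		case done[e.pid]:
			// Nothing may be sent for a process once Done has gone out.
			return fmt.Sprintf("event %d: %s for %s after Done", i, e.kind, e.pid)
		case e.kind == "Started":
			started[e.pid] = true
		case !started[e.pid]:
			return fmt.Sprintf("event %d: %s for %s before Started", i, e.kind, e.pid)
		case e.kind == "Done":
			done[e.pid] = true
		}
	}
	return ""
}

func main() {
	const pid = "j1n4pirkltsmzt7frqmipuoo6"
	// Outbound messages for the exec, in the order logged above.
	fromLog := []event{
		{pid, "Started"},
		{pid, "File"},
		{pid, "Exit"},
		{pid, "Done"},
		{pid, "EOF"},
	}
	fmt.Println(firstViolation(fromLog))
}
```

Only outbound messages are meant to be fed in; the inbound `fd=0` messages in the log come from the client and are not constrained by this ordering.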
Send Started before any async Exit/Done paths to preserve protocol order. Close all tracked processIO pipe endpoints during Close so pio.done can always drain and avoid hangs in gateway exec teardown.
Hope this fixes some flakiness/hangs we sometimes see in exec tests in CI.