gateway: fix exec process lifecycle ordering#6531

Closed
tonistiigi wants to merge 2 commits into moby:master from tonistiigi:exec-order-fix


@tonistiigi
Member

Send the Started message before starting any asynchronous path that can emit Exit or Done, so that protocol order is preserved. Close all tracked processIO pipe endpoints during Close so that pio.done can always drain, which avoids hangs in gateway exec teardown.

Hope this fixes some flakiness/hangs we sometimes see in exec tests in CI.
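
To illustrate the first half of the description, here is a minimal sketch of the ordering idea, not the actual gateway code. The names recorder and runExec are hypothetical stand-ins: Started is sent synchronously, and only then is the goroutine launched that may later emit Exit and Done.

```go
package main

import (
	"fmt"
	"sync"
)

// recorder is a hypothetical stand-in for the gateway's message stream.
// It only records the order in which messages were sent.
type recorder struct {
	mu   sync.Mutex
	msgs []string
}

func (r *recorder) Send(m string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.msgs = append(r.msgs, m)
}

// runExec sends Started on the caller's goroutine before starting the
// asynchronous path, so Exit and Done cannot be emitted ahead of it.
// wait is an assumed callback that blocks until the process exits.
func runExec(r *recorder, wait func() int) <-chan struct{} {
	r.Send("Started") // synchronous: always the first message

	done := make(chan struct{})
	go func() {
		defer close(done)
		code := wait()
		r.Send(fmt.Sprintf("Exit(%d)", code))
		r.Send("Done")
	}()
	return done
}

func main() {
	r := &recorder{}
	<-runExec(r, func() int { return 0 })
	fmt.Println(r.msgs) // [Started Exit(0) Done]
}
```

If Started were instead sent from a second goroutine, a process that exits immediately could race it and produce Exit before Started.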

Send Started before any async Exit/Done paths to preserve protocol order.
Close all tracked processIO pipe endpoints during Close so pio.done can
always drain and avoid hangs in gateway exec teardown.

Signed-off-by: Tonis Tiigi <tonistiigi@gmail.com>

processIO.Close() was removing processWriters too early.
That could make us send Done before the stdout/stderr EOF messages.

This change keeps processWriters tracked until EOF is sent, so message order is correct:
Started -> output/EOF -> Done
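
A rough sketch of that idea follows, under stated assumptions: procIO, addOutput and closeAndDone are hypothetical names, and io.Pipe stands in for whatever pipes the real processIO tracks. Close only closes the writers; each output goroutine sends its EOF and then untracks its own fd, and Done waits for all of them.

```go
package main

import (
	"fmt"
	"io"
	"sync"
)

// procIO is a hypothetical stand-in for processIO, not BuildKit's type.
type procIO struct {
	mu      sync.Mutex
	msgs    []string
	writers map[int]*io.PipeWriter
	wg      sync.WaitGroup // one count per output stream that still owes an EOF
}

func newProcIO() *procIO {
	return &procIO{writers: map[int]*io.PipeWriter{}}
}

func (p *procIO) send(m string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.msgs = append(p.msgs, m)
}

// addOutput tracks a writer for fd and starts the goroutine that forwards
// its data. The goroutine, not Close, removes the writer after sending EOF.
func (p *procIO) addOutput(fd int) *io.PipeWriter {
	r, w := io.Pipe()
	p.mu.Lock()
	p.writers[fd] = w
	p.mu.Unlock()

	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		buf := make([]byte, 32)
		for {
			n, err := r.Read(buf)
			if n > 0 {
				p.send(fmt.Sprintf("File fd=%d, %d bytes", fd, n))
			}
			if err != nil {
				p.send(fmt.Sprintf("File fd=%d, EOF", fd))
				p.mu.Lock()
				delete(p.writers, fd) // untrack only after EOF was sent
				p.mu.Unlock()
				return
			}
		}
	}()
	return w
}

// closeAndDone closes every tracked writer so each reader observes EOF,
// waits for all EOF messages, and only then sends Done.
func (p *procIO) closeAndDone() {
	p.mu.Lock()
	for _, w := range p.writers {
		w.Close() // close, but leave the entry tracked
	}
	p.mu.Unlock()

	p.wg.Wait()
	p.send("Done")
}

func main() {
	p := newProcIO()
	p.send("Started")
	stdout := p.addOutput(1)
	p.addOutput(2)
	_, _ = stdout.Write([]byte("hi\n"))
	p.closeAndDone()
	for _, m := range p.msgs {
		fmt.Println(m)
	}
}
```

The relative order of the fd=1 and fd=2 messages can vary between runs, but Started is always first and Done is always last.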

Signed-off-by: CrazyMax <1951866+crazy-max@users.noreply.github.com>

        }
    }
    for fd, w := range pio.processWriters {
        delete(pio.processWriters, fd)
Member

https://github.com/moby/buildkit/actions/runs/22342178945/job/64648494627#step:8:1223

    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|<--- NewContainer 5csf58g7gk7jgcoi687dpr6k2" spanID=3af04cd4fa4357d3 traceID=5f061102d7b4ceebbd34d62ee276bcb5
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|<--- Init Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="Starting new container for 5csf58g7gk7jgcoi687dpr6k2 with args: [\"sh\"]"
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="returning network namespace it86wk5ub34v0a8u7gefehkyd from pool"
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> Started Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> File Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd, fd=1, 2 bytes" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|<--- Init Message 5csf58g7gk7jgcoi687dpr6k2:j1n4pirkltsmzt7frqmipuoo6" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> Started Message 5csf58g7gk7jgcoi687dpr6k2:j1n4pirkltsmzt7frqmipuoo6" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="Execing into container 5csf58g7gk7jgcoi687dpr6k2 with args: [\"cat\" \"/data\"]"
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> File Message 5csf58g7gk7jgcoi687dpr6k2:j1n4pirkltsmzt7frqmipuoo6, fd=1, 26 bytes" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> Exit Message 5csf58g7gk7jgcoi687dpr6k2:j1n4pirkltsmzt7frqmipuoo6, code=0, error=%!s(<nil>)" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> Done Message 5csf58g7gk7jgcoi687dpr6k2:j1n4pirkltsmzt7frqmipuoo6" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> File Message 5csf58g7gk7jgcoi687dpr6k2:j1n4pirkltsmzt7frqmipuoo6, fd=1, EOF" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|<--- File Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd, fd=0, 7 bytes" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> File Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd, fd=1, 8 bytes" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> Exit Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd, code=0, error=%!s(<nil>)" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> Done Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> File Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd, fd=2, EOF" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|---> File Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd, fd=1, EOF" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|<--- ReleaseContainer 5csf58g7gk7jgcoi687dpr6k2" spanID=8b5a1eae515c69eb traceID=865855f6e5868780bffbcc583d63ed69
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="|<--- File Message 5csf58g7gk7jgcoi687dpr6k2:nfngt1u9833tq4yk4fuen8xhd, fd=0, EOF" spanID=67be47f640a62501 traceID=67bc755b8f76225c7044896b1054f0fb
    sandbox.go:205: time="2026-02-24T08:22:44Z" level=debug msg="returning network namespace it86wk5ub34v0a8u7gefehkyd from pool" span="sh -c echo cyo5arhgx70su9rgnxqtv28vb > /data && echo cyo5arhgx70su9rgnxqtv28vb > /rw/data && fail" spanID=99e8311a5093af6f traceID=0cfd195b1cbac41cc54c8addd9374e68

From the logs above, this seems related to Done being emitted before all of a process's per-fd EOF messages. Then the next Init arrives, followed by the hang.

Pushed an extra commit so that writers are closed but stay tracked until the output goroutines send EOF and remove them.
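
The ordering problem in the log can also be checked mechanically. The sketch below is only an illustration: the event type and checkOrder are hypothetical helpers, and the rule it enforces (nothing for a process before Started or after Done) is an assumption taken from the intended order Started -> output/EOF -> Done. The sample events mirror the outgoing messages for the exec process in the log, with the process ID abbreviated to j1n4.

```go
package main

import "fmt"

// event is a simplified, hypothetical view of one outgoing gateway message.
type event struct {
	pid  string // process ID, abbreviated for readability
	kind string // "Started", "File", "EOF", "Exit" or "Done"
	fd   int    // only meaningful for File and EOF
}

// checkOrder returns the first event that falls outside the
// Started ... Done window of its process, or nil if there is none.
func checkOrder(events []event) error {
	started := map[string]bool{}
	done := map[string]bool{}
	for i, e := range events {
		switch {
		case done[e.pid]:
			return fmt.Errorf("event %d: %s for %s after Done", i, e.kind, e.pid)
		case e.kind == "Started":
			started[e.pid] = true
		case !started[e.pid]:
			return fmt.Errorf("event %d: %s for %s before Started", i, e.kind, e.pid)
		case e.kind == "Done":
			done[e.pid] = true
		}
	}
	return nil
}

func main() {
	// Outgoing messages for the exec process, in the order they were logged.
	events := []event{
		{pid: "j1n4", kind: "Started"},
		{pid: "j1n4", kind: "File", fd: 1},
		{pid: "j1n4", kind: "Exit"},
		{pid: "j1n4", kind: "Done"},
		{pid: "j1n4", kind: "EOF", fd: 1},
	}
	fmt.Println(checkOrder(events)) // event 4: EOF for j1n4 after Done
}
```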

Member

Still deadlocks.

@tonistiigi closed this on Feb 24, 2026