-
Notifications
You must be signed in to change notification settings - Fork 1.4k
solver: run image and cache exports in parallel #6451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
9868aa9 to
acf5409
Compare
8e544cd to
acf5409
Compare
tonistiigi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot one other case that affects this. In ExportTo based on Mode, when cache export checks if result is to be exported, it checks if it already exists or not. Eg. if image exporter(or previous cache source) already generated the layer then it is exported in CacheExportModeMin but otherwise only metadata is exported and no new layer tarball is created by the exporter. I wonder that maybe we need to instead break the export phase into two passes so that the push can run in parallel, but creating the objects is still sequential.
|
Windows CI doesn't seem to like this as well @rzlink |
@tonistiigi would a simplification be to treat |
@tonistiigi If we go with the 2-phase approach, we will need to change the interfaces to add the 2 export phases (maybe |
I think it could still be different even with
In buildkit repo we don't care about backward compatibility of Go API. Only on-wire APIs. Either splitting in two or maybe |
@tonistiigi i took a stab at it, let me know what you think |
tonistiigi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the approach is ok, check the comments and CI.
This doesn't need to be in the first PR, but if we add Finalize to exporter then all the exporters should use it, not just the image exporter.
a4a707c to
caf2afb
Compare
Sure, I can addressed the other exporters in a follow up PR |
e8844d2 to
4be8a00
Compare
Image export and cache export currently run sequentially. This adds unnecessary latency when both are configured, especially when exporting to different destinations. Tested with vLLM CI builds. A full build-from-cache takes about 7 minutes, with export dominating: image export runs for ~2min, then cache export for another ~2min. Running them in parallel cuts export time in half, saving ~2 minutes per build (~30% of total time). The one exception is inline cache: it works by running the cache export first to generate metadata, which is then embedded into the image config before export. This means image export cannot start until inline cache completes, so we preserve sequential execution in that case. Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
Split export into two phases: artifact creation and push. Export now returns a FinalizeFunc that performs the registry push. Both image and cache export finalize callbacks run concurrently via errgroup, allowing image push and cache push to overlap. Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
Transfer lease ownership to descref when push is deferred to finalize. Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
65907a1 to
8eeaa5c
Compare
The descriptor reference (descref) is used by the solver to record the build result in the history. For store-only exports (no push), we were returning nil for descref, which meant the descriptor was not recorded in build history. This caused content to be GC'd prematurely when the containerd image was deleted. Fix by always creating descref regardless of whether there's a push. The done function will be a no-op when the solver's lease is active, so there's no change in lease behavior. Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
8eeaa5c to
59c5372
Compare
|
@tonistiigi any chance this can make it to the v0.27.0 release ? 😇 |
tonistiigi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any chance this can make it to the v0.27.0 release ?
Too late for that.
solver/llbsolver/solver.go
Outdated
| } | ||
|
|
||
| cacheExporterResponse, err := runCacheExporters(ctx, cacheExporters, j, cached, inp) | ||
| cacheFinalizers, err := prepareCacheExporters(ctx, cacheExporters, j, cached, inp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the reason for splitting the cache exporters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that cache ExportTo can see the layers image export created and skip recreating them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tonistiigi any other concerns on the PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I understand why part of the image export needs to run before. But why did you need to split the cache export into two passes as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to keep a consistent two-phase pattern across all exporters, but you're right that cache export doesn't need to be split. This keeps the changes more simple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tonistiigi can I get a final review?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tonistiigi gentle ping on this please. Thanks!
- Remove NoOpFinalize, use nil instead for exporters without finalize - Simplify Export documentation per review feedback - Add nil check when calling finalize callbacks Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
8026d2d to
c4f6f9a
Compare
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
c4f6f9a to
156699f
Compare
Image export and cache export currently run sequentially. This adds unnecessary latency when both are configured, especially when exporting to different destinations.
Tested with vLLM CI builds. A full build-from-cache takes about 7 minutes, with export dominating: image export runs for ~2min, then cache export for another ~2min. Running them in parallel cuts export time in half, saving ~2 minutes per build (~30% of total time).
The one exception is inline cache: it works by running the cache export first to generate metadata, which is then embedded into the image config before export. This means image export cannot start until inline cache completes, so we preserve sequential execution in that case.