Skip to content

Conversation

@amrmahdi
Copy link
Contributor

@amrmahdi amrmahdi commented Jan 11, 2026

Image export and cache export currently run sequentially. This adds unnecessary latency when both are configured, especially when exporting to different destinations.

Tested with vLLM CI builds. A full build-from-cache takes about 7 minutes, with export dominating: image export runs for ~2min, then cache export for another ~2min. Running them in parallel cuts export time in half, saving ~2 minutes per build (~30% of total time).

The one exception is inline cache: it works by running the cache export first to generate metadata, which is then embedded into the image config before export. This means image export cannot start until inline cache completes, so we preserve sequential execution in that case.

Copy link
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot one other case that affects this. In ExportTo based on Mode, when cache export checks if result is to be exported, it checks if it already exists or not. Eg. if image exporter(or previous cache source) already generated the layer then it is exported in CacheExportModeMin but otherwise only metadata is exported and no new layer tarball is created by the exporter. I wonder that maybe we need to instead break the export phase into two passes so that the push can run in parallel, but creating the objects is still sequential.

@tonistiigi
Copy link
Member

Windows CI doesn't seem to like this as well @rzlink

failed to commit ngtnv4s8gc8tp0beynhncp87s to oexb7oycg3u2l5y777xguo7u6 during finalize: failed to reimport snapshot: hcsshim::ActivateLayer failed in Win32: The process cannot access the file because it is being used by another process. (0x20)

@amrmahdi
Copy link
Contributor Author

I forgot one other case that affects this. In ExportTo based on Mode, when cache export checks if result is to be exported, it checks if it already exists or not. Eg. if image exporter(or previous cache source) already generated the layer then it is exported in CacheExportModeMin but otherwise only metadata is exported and no new layer tarball is created by the exporter. I wonder that maybe we need to instead break the export phase into two passes so that the push can run in parallel, but creating the objects is still sequential.

@tonistiigi would a simplification be to treat CacheExportModeMin as sequential like the inline cache?

@amrmahdi
Copy link
Contributor Author

I wonder that maybe we need to instead break the export phase into two passes so that the push can run in parallel, but creating the objects is still sequential.

@tonistiigi If we go with the 2-phase approach, we will need to change the interfaces to add the 2 export phases (maybe Prepare and Finalize), and we could keep Export as is for backward compatibility (internally it just calls Prepare + Finalize). If I understand correctly, Finalize will mostly be a no-op for most exporters. Does this align with your thoughts?

@tonistiigi
Copy link
Member

would a simplification be to treat CacheExportModeMin as sequential like the inline cache?

I think it could still be different even with max. The blobs creation depends on how the compression algorithm and level was configured for exporter if the blobs are shared.

and we could keep Export as is for backward compatibility

In buildkit repo we don't care about backward compatibility of Go API. Only on-wire APIs. Either splitting in two or maybe Export can optionally return a "finalize" callback that needs to be called (but can be called in parallel with other tasks).

@amrmahdi
Copy link
Contributor Author

I wonder that maybe we need to instead break the export phase into two passes so that the push can run in parallel, but creating the objects is still sequential.

@tonistiigi i took a stab at it, let me know what you think

@amrmahdi amrmahdi requested a review from tonistiigi January 14, 2026 21:26
Copy link
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the approach is ok, check the comments and CI.

This doesn't need to be in the first PR, but if we add Finalize to exporter then all the exporters should use it, not just the image exporter.

@amrmahdi amrmahdi force-pushed the amrmahdi/parallel-export branch from a4a707c to caf2afb Compare January 16, 2026 06:15
@amrmahdi
Copy link
Contributor Author

I think the approach is ok, check the comments and CI.

This doesn't need to be in the first PR, but if we add Finalize to exporter then all the exporters should use it, not just the image exporter.

Sure, I can addressed the other exporters in a follow up PR

@amrmahdi amrmahdi force-pushed the amrmahdi/parallel-export branch 2 times, most recently from e8844d2 to 4be8a00 Compare January 16, 2026 06:36
Image export and cache export currently run sequentially. This adds
unnecessary latency when both are configured, especially when exporting
to different destinations.

Tested with vLLM CI builds. A full build-from-cache takes about 7 minutes,
with export dominating: image export runs for ~2min, then cache export for
another ~2min. Running them in parallel cuts export time in half, saving
~2 minutes per build (~30% of total time).

The one exception is inline cache: it works by running the cache export
first to generate metadata, which is then embedded into the image config
before export. This means image export cannot start until inline cache
completes, so we preserve sequential execution in that case.

Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
Split export into two phases: artifact creation and push. Export now
returns a FinalizeFunc that performs the registry push. Both image
and cache export finalize callbacks run concurrently via errgroup,
allowing image push and cache push to overlap.

Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
Transfer lease ownership to descref when push is deferred to finalize.

Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
@amrmahdi amrmahdi force-pushed the amrmahdi/parallel-export branch 5 times, most recently from 65907a1 to 8eeaa5c Compare January 16, 2026 08:18
The descriptor reference (descref) is used by the solver to record
the build result in the history. For store-only exports (no push),
we were returning nil for descref, which meant the descriptor was
not recorded in build history. This caused content to be GC'd
prematurely when the containerd image was deleted.

Fix by always creating descref regardless of whether there's a push.
The done function will be a no-op when the solver's lease is active,
so there's no change in lease behavior.

Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
@amrmahdi amrmahdi force-pushed the amrmahdi/parallel-export branch from 8eeaa5c to 59c5372 Compare January 16, 2026 08:47
@amrmahdi amrmahdi changed the title solver: add option to run image and cache exports in parallel solver: run image and cache exports in parallel Jan 16, 2026
@amrmahdi amrmahdi requested a review from tonistiigi January 16, 2026 16:31
@amrmahdi
Copy link
Contributor Author

@tonistiigi any chance this can make it to the v0.27.0 release ? 😇

Copy link
Member

@tonistiigi tonistiigi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any chance this can make it to the v0.27.0 release ?

Too late for that.

}

cacheExporterResponse, err := runCacheExporters(ctx, cacheExporters, j, cached, inp)
cacheFinalizers, err := prepareCacheExporters(ctx, cacheExporters, j, cached, inp)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the reason for splitting the cache exporters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that cache ExportTo can see the layers image export created and skip recreating them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tonistiigi any other concerns on the PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I understand why part of the image export needs to run before. But why did you need to split the cache export into two passes as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to keep a consistent two-phase pattern across all exporters, but you're right that cache export doesn't need to be split. This keeps the changes more simple.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tonistiigi can I get a final review?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tonistiigi gentle ping on this please. Thanks!

@amrmahdi amrmahdi requested a review from tonistiigi January 21, 2026 07:13
- Remove NoOpFinalize, use nil instead for exporters without finalize
- Simplify Export documentation per review feedback
- Add nil check when calling finalize callbacks

Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
@amrmahdi amrmahdi force-pushed the amrmahdi/parallel-export branch 3 times, most recently from 8026d2d to c4f6f9a Compare January 22, 2026 02:51
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
@amrmahdi amrmahdi force-pushed the amrmahdi/parallel-export branch from c4f6f9a to 156699f Compare January 22, 2026 03:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants