Remove blocking call from timestep embedding of Chroma #8255

Merged
comfyanonymous merged 1 commit into Comfy-Org:master from drhead:patch-4
May 23, 2025

Conversation

@drhead (Contributor) commented May 23, 2025

This creates a tensor directly on the device to avoid an unnecessary sync in the Chroma model forward pass. It is part of a series of PRs I'm creating. The others are #7152, another PR to make a CPU copy of the sampler sigmas for use in control flow (since control flow needs these as Python scalars to work), and optionally code to compute latent previews and transfer them to the CPU on a separate CUDA stream, as I did for webui forge (this may be more complicated, since ComfyUI doesn't seem to process previews in a separate thread from the main one). With all of these combined, I am seeing a roughly 7% overall speed boost.
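
For readers unfamiliar with the pattern, here is a minimal sketch of the two ideas described above, assuming PyTorch. It is not the code from this PR: `sinusoidal_embedding`, the index count `n`, and the `sigmas` schedule are hypothetical stand-ins chosen for illustration. The first part contrasts building an index tensor on the CPU and copying it over with allocating it on the target device from the start. The second part shows reading a sigma schedule back to the host once, so that later control flow uses plain Python floats rather than device tensors.

```python
import math
import torch


def sinusoidal_embedding(t: torch.Tensor, dim: int, max_period: float = 10000.0) -> torch.Tensor:
    """Hypothetical stand-in for a timestep embedding; not ComfyUI's implementation."""
    half = dim // 2
    # Build the frequency table on the same device as the input, so no copy is needed.
    freqs = torch.exp(
        -math.log(max_period)
        * torch.arange(half, dtype=torch.float32, device=t.device)
        / half
    )
    args = t[:, None].float() * freqs[None]
    return torch.cat([torch.cos(args), torch.sin(args)], dim=-1)


# Falls back to the CPU so the sketch runs without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
n = 8  # illustrative index count (assumption)

# Pattern that can stall: build on the CPU, then copy to the device.
# A host-to-device copy from ordinary (pageable) memory can make the host wait.
idx_copied = torch.arange(0, n).to(device)

# Pattern without the copy: allocate directly on the target device.
idx_on_device = torch.arange(0, n, device=device)

emb_copied = sinusoidal_embedding(idx_copied, 32)
emb_on_device = sinusoidal_embedding(idx_on_device, 32)

# Second idea: keep a host-side copy of the sigmas for control flow.
sigmas = torch.linspace(1.0, 0.0, 5, device=device)  # illustrative schedule (assumption)
sigmas_cpu = sigmas.tolist()  # one device-to-host transfer, done up front

for i in range(len(sigmas_cpu) - 1):
    # Comparing Python floats does not touch the device. Writing
    # `if sigmas[i + 1] == 0:` on a CUDA tensor would sync on every step.
    is_last_step = sigmas_cpu[i + 1] == 0.0
```

Both embedding results are the same; the difference is only in whether the host has to wait on a transfer before it can queue further work.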

@drhead requested a review from comfyanonymous as a code owner May 23, 2025 15:59
@comfyanonymous merged commit 30b2eb8 into Comfy-Org:master May 23, 2025
@drhead deleted the patch-4 branch May 23, 2025 20:16

2 participants