
modular diffusers #3278

Open
sayakpaul wants to merge 9 commits into `main` from `modular-diffusers`

Conversation

@sayakpaul
Member

@sayakpaul sayakpaul commented Feb 13, 2026

Adds the announcement post on Modular Diffusers!

Previewed the content on https://huggingface.co/new-blog and it looks great!


@yiyixuxu yiyixuxu left a comment


thanks! looking great!

Member

@LysandreJik LysandreJik left a comment


The blog reads well! Quite a fan of modular diffusers in general, just a bit curious about the modular repositories that reference other repos rather than duplicating model weights.


When you call `ModularPipeline.from_pretrained`, it works with any existing Diffusers repo out of the box. But Modular Diffusers also introduces Modular Repositories.

A modular repository doesn't duplicate any model weights. Instead, it references components directly from their original model repos. For example, [diffusers/flux2-bnb-4bit-modular](https://huggingface.co/diffusers/flux2-bnb-4bit-modular) contains no model weights at all — it loads a quantized transformer from one repo and the remaining components from another.
Member


Do you expect this to be a big feature/usage for modular diffusers?

A few thoughts here:

  • This is potentially a security issue if used incorrectly:
    • reference a repo (optionally with a revision set to force it)
    • referenced repo gets updated with potential malware
    • original repo didn't change, setting a revision did not protect the users either
  • I feel like model repos should contain model weights as well, they're not really code repos. Given XET, would it make sense to just duplicate the weights in the repos anyway?

Member


With XET I think we'll generally really push for weight duplication (which is nearly free) and for each repo to be isolated. We've been bitten a bit by this in transformers and it has led to a flurry of issues.


With XET I think we'll generally really push for weight duplication (which is nearly free)

Do you mean free in terms of Hub storage cost?

Our main concern/motivation is that duplicated weights cause a lot of inefficiency on the user side — which is more of an issue for the diffusers use case since pipelines are composed of many shared frozen components (text encoders, VAEs, etc.) that are identical across pipelines. See this discussion for context: huggingface/diffusers#10413

Other than the wasted disk space and bandwidth, the main thing we want to solve is giving developers this piece of information to do things more efficiently. For example, if you're switching from pipeline A to pipeline B and they share most components except one (which is very typical in diffusers), if we provide a way to programmatically flag the duplication, users can choose to only unload/load the delta. My knowledge on the Hub side is not the most up to date though — is XET already doing something to help on the user side as well?
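The delta-loading idea above can be sketched in plain Python (this is an illustration, not the diffusers API; all names here are hypothetical). Given two pipelines' component specs, we compute which shared components can stay loaded and which single component actually needs to be swapped:

```python
# Hypothetical sketch: component specs map a component name to its source
# (repo, subfolder, revision). Two pipelines sharing frozen components
# (text encoder, VAE) differ only in their transformer.

def component_delta(current: dict, target: dict):
    """Return (reusable, to_unload, to_load) component-name sets."""
    reusable = {name for name, src in target.items()
                if current.get(name) == src}
    to_unload = set(current) - reusable
    to_load = set(target) - reusable
    return reusable, to_unload, to_load

pipeline_a = {
    "text_encoder": ("org/shared-te", "text_encoder", None),
    "vae": ("org/shared-vae", "vae", None),
    "transformer": ("org/model-a", "transformer", None),
}
pipeline_b = {
    "text_encoder": ("org/shared-te", "text_encoder", None),
    "vae": ("org/shared-vae", "vae", None),
    "transformer": ("org/model-b", "transformer", None),
}

reusable, to_unload, to_load = component_delta(pipeline_a, pipeline_b)
print(sorted(reusable))  # shared frozen components that stay in memory
print(sorted(to_load))   # only the differing transformer needs loading
```

If the repo metadata programmatically flags which components are shared (rather than duplicating their weights), a switch from pipeline A to B only touches the delta.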

Let me know if you have any suggestions.

Member


IMO things have changed for the better since the issue you linked.

With XET, you get auto-deduplication of "xorbs" (you can think of these as atomic storage units). If you download one big file (made of xorbs) from a repo, keep it locally, and then download another big file (also made of xorbs) from another repo, XET will deduplicate the xorbs directly.

If the two files are the same (made up of the same xorbs), then the download should be instantaneous.

Some caveats:

  • This depends on the size of the XET cache: it needs to be sufficiently large to contain the xorbs
  • In practice, the XET cache is often disabled as it can slow down some cases. To activate it, you need to set `HF_XET_CHUNK_CACHE_SIZE_BYTES`
  • You can read more here
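The deduplication mechanism described above can be illustrated with a toy content-addressed store (this is a simplification for intuition only: it uses fixed-size chunks and an in-memory dict, whereas real XET chunking and storage work differently):

```python
# Toy illustration of chunk-level deduplication in the spirit of XET's
# "xorbs": files are split into chunks, each chunk is stored once under its
# content hash, and a duplicate file made of the same chunks costs nothing.
import hashlib

CHUNK_SIZE = 4
store: dict[str, bytes] = {}  # content-addressed chunk store

def upload(data: bytes) -> list[str]:
    """Store data as chunks; return the list of chunk hashes (the recipe)."""
    hashes = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        h = hashlib.sha256(chunk).hexdigest()
        store.setdefault(h, chunk)  # identical chunks are stored only once
        hashes.append(h)
    return hashes

def download(recipe: list[str]) -> bytes:
    """Reassemble a file from its chunk recipe."""
    return b"".join(store[h] for h in recipe)

file_a = upload(b"shared-weights-shared-weights")
before = len(store)
file_b = upload(b"shared-weights-shared-weights")  # duplicated weights
assert len(store) == before  # second upload added zero new chunks
assert download(file_b) == b"shared-weights-shared-weights"
```

Under this model, "duplicating" identical weight files across repos adds no new chunks to storage, and a client that already holds the chunks locally has nothing left to download.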

IMO, it would make a lot more sense to build what we're trying to build here on top of XET rather than what we have: it builds directly on one of the Hub's main technologies, it will continue to be optimized as time goes on, and it stays true to the main idea of model repos, which is that they should store weights.

Member


The problem with having repos like https://huggingface.co/diffusers/flux2-bnb-4bit-modular is that the Hub is really not built for such repos, and nothing will work properly: there will be no download count, no reference to the linked files, no viz on the number of parameters or the safetensors making it up, etc.

The Hub's model repos are really made for repos that host model weights.

Contributor


Just one question though about the duplicated weights @LysandreJik. If downloads are being counted based on downloads to the config, what are we gaining on the Hub side by having the weights hosted in the modular repo?

Contributor


Thinking about it a bit more, I think pointers to other repos are a bit confusing if weights already exist in a repo 🤔 and the point about security issues is very valid; code that just starts downloading other code because of cascading permissions is a real risk.

Member Author


What happens if the model weights being referred to come through a gated repository? Would we run into any legal concerns?


What happens if the model weights being referred to come through a gated repository? Would we run into any legal concerns?

The weights are downloaded directly from the original gated repo, so no issue there. Duplicating weights without carrying over the same license would raise a legal issue, but that's no different from what's expected of users currently.


@yiyixuxu yiyixuxu Feb 25, 2026


@DN6

code that just starts downloading other code because of cascading permissions is a real risk.

I think we should not pass `trust_remote_code` to `load()` here unless it is a local folder
https://github.com/huggingface/diffusers/blob/main/src/diffusers/modular_pipelines/modular_pipeline.py#L2230

we could store the pipeline repo inside its `__init__`:

self._pretrained_model_name_or_path = pretrained_model_name_or_path

and compare it against each component's `pretrained_model_name_or_path`

also thinking maybe we could modify `modular_model_index` to use `"."` to indicate the local folder:

{
  "_blocks_class_name": "Flux2AutoBlocks",
  "_class_name": "Flux2ModularPipeline",
  "_diffusers_version": "0.37.0.dev0",

  "transformer": [
    "diffusers",
    "Flux2Transformer2DModel",
    {
      "pretrained_model_name_or_path": ".",
      "revision": null,
      "subfolder": "transformer",
      "type_hint": [
        "diffusers",
        "Flux2Transformer2DModel"
      ],
      "variant": null
    }
  ]

}
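A minimal sketch of the check proposed above (a hypothetical helper, not actual diffusers code): `trust_remote_code` is only forwarded when a component's `pretrained_model_name_or_path` points at the pipeline's own repo (`"."` in the index) or an existing local folder, so remote components never inherit the permission:

```python
# Hypothetical helper: gate trust_remote_code on the component source being
# local. "." means "this pipeline's own repo/folder" per the proposed
# modular_model_index convention above.
import os

def resolve_trust_remote_code(component_path: str, trust_remote_code: bool) -> bool:
    """Only keep trust_remote_code for local component sources."""
    is_local = component_path == "." or os.path.isdir(component_path)
    return trust_remote_code and is_local

# A remote repo id never gets the cascaded permission:
assert resolve_trust_remote_code("some-org/remote-repo", True) is False
# The pipeline's own folder keeps whatever the user passed:
assert resolve_trust_remote_code(".", True) is True
```

This would address the cascading-permissions concern raised earlier: a user trusting one modular repo would not implicitly trust code pulled in from every referenced repo.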

let me know what you think - I can open a new PR to quickly address this

Member

@pcuenca pcuenca left a comment


Super powerful work!

The content is very detailed, but I think we may be jumping into details too early. It could be helpful to have a short motivation section first: here's how we used to solve this problem, and here's how we can do it much better with modular diffusers now.


The community has already started building complete pipelines with Modular Diffusers and publishing them on the Hub, with model weights and ready-to-run code.

- [**Krea Realtime Video**](https://huggingface.co/krea/krea-realtime-video) — A 14B parameter real-time video generation model distilled from Wan 2.1, achieving 11fps on a single B200 GPU. It supports text-to-video, video-to-video, and streaming video-to-video — all built as modular blocks. Users can modify prompts mid-generation, restyle videos on-the-fly, and see first frames within 1 second.
Member


Maybe link to the JSON, or explain that this reuses the same components as Wan-AI/Wan2.1-T2V-14B-Diffusers but replaces the transformer with a different version?

Member

@stevhliu stevhliu left a comment


very nice!

I think we can make this post even stronger by not burying what makes it really shine, and instead leading with it.

Contributor

@ariG23498 ariG23498 left a comment


This is going to be so cool! 😎

sayakpaul and others added 4 commits February 20, 2026 11:01
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: apolinário <joaopaulo.passos@gmail.com>
Co-authored-by: Linoy Tsaban <57615435+linoytsaban@users.noreply.github.com>
Co-authored-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com>
Co-authored-by: YiYi Xu <yixu310@gmail.com>

9 participants