[docs] Parallel loading of shards #12135
Conversation
```python
pipeline = DiffusionPipeline.from_pretrained(
    "Wan-AI/Wan2.2-I2V-A14B-Diffusers",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
```
Do we want to talk a bit about `device_map`? The motivation for passing `device_map` essentially comes from this PR: #11904. I have also tried to provide some motivation for adding it at the pipeline level here: #12122 (comment).

@stevhliu it comes to fruition when we initialize the model directly on the accelerator device through `device_map`.
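For illustration only, a rough sketch of what the model-level version of this could look like; the use of `AutoModel`, the `transformer` subfolder, and model-level `device_map="cuda"` support are assumptions based on #11904 rather than something shown in this PR:

```python
import torch
from diffusers import AutoModel

# Sketch: load a single component and place its weights directly on the
# accelerator device, instead of materializing them on CPU first.
transformer = AutoModel.from_pretrained(
    "Wan-AI/Wan2.2-I2V-A14B-Diffusers",  # repo from the snippet above
    subfolder="transformer",             # assumed component name
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
```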
Docs to go along with #12028.
I didn't mention #11904 since it doesn't appear to require any action on the user's part; it just works in the background. It's a cool design/implementation detail though, and I think it'd be pretty interesting to do a blog post about optimizations like this.
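As a rough sketch of how the documented usage might read, assuming the parallel shard loading from #12028 is opt-in through an environment variable named `HF_ENABLE_PARALLEL_LOADING` (the name is borrowed from transformers and is an assumption here, not confirmed by this thread):

```python
import os

# Assumption: parallel shard loading is toggled via an environment variable,
# mirroring transformers' HF_ENABLE_PARALLEL_LOADING; see #12028 for the
# actual switch. It must be set before the pipeline is loaded.
os.environ["HF_ENABLE_PARALLEL_LOADING"] = "yes"

import torch
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "Wan-AI/Wan2.2-I2V-A14B-Diffusers",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)
```

The idea is that flipping the variable before loading is the only user-visible change; the shard reads themselves would then happen in parallel behind the scenes.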