Describe the bug
Pipelines passed to from_pipe() are converted to float32 unless torch_dtype is specified explicitly. The shared components are converted in place, so the original pipeline's dtype changes as well, roughly doubling memory usage and slowing inference (see the logs below).
Reproduction
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline
pipe = StableDiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
print(f"Before: {pipe.dtype} - {torch.cuda.memory_allocated() // 1048576} MB")
i2i = StableDiffusionImg2ImgPipeline.from_pipe(pipe)
print(f"After: {pipe.dtype} - {torch.cuda.memory_allocated() // 1048576} MB")Logs
Loading pipeline components...: 0%| | 0/7 [00:00<?, ?it/s]`torch_dtype` is deprecated! Use `dtype` instead!
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 7/7 [00:08<00:00, 1.17s/it]
Before: torch.float16 - 2637 MB
After: torch.float32 - 5258 MB
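As a workaround (a minimal sketch based on the behavior described above, since the conversion is skipped when torch_dtype is given), re-specifying the dtype on the from_pipe() call keeps the new pipeline in half precision:

i2i = StableDiffusionImg2ImgPipeline.from_pipe(pipe, torch_dtype=torch.float16)
print(i2i.dtype)  # expected: torch.float16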
System Info
- 🤗 Diffusers version: 0.35.2
- Platform: Windows-10-10.0.19045-SP0
- Running on Google Colab?: No
- Python version: 3.10.11
- PyTorch version (GPU?): 2.9.1+cu126 (True)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Huggingface_hub version: 0.36.0
- Transformers version: 4.57.3
- Accelerate version: 1.12.0
- PEFT version: not installed
- Bitsandbytes version: not installed
- Safetensors version: 0.7.0
- xFormers version: not installed
- Accelerator: NVIDIA GeForce GTX 1080, 8192 MiB
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
Who can help?
No response