
Improve load_ip_adapter RAM Usage #10948


Merged: 10 commits into huggingface:main on Mar 4, 2025

Conversation

@CyberVy (Contributor) commented on Mar 3, 2025

Loading a model with torch_dtype set to None via transformers.modeling_utils.PreTrainedModel results in additional RAM usage because of data type conversion.
There is no conversion when the data type of the model weights already matches torch_dtype, and different conversions have different impacts on RAM usage, as discussed in #10679.
When torch_dtype is None, PreTrainedModel treats it as torch.float32, which causes far more RAM usage.

This PR addresses that issue.

@asomoza
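
For context, here is a minimal sketch of the idea behind the fix (an illustration only; load_image_encoder is a hypothetical helper, not the actual diffusers code, and the real change lives in this PR's diff):

import torch
from transformers import CLIPVisionModelWithProjection

def load_image_encoder(repo_id, subfolder, torch_dtype=None):
    # Forwarding torch_dtype lets transformers load the checkpoint directly in the
    # target precision, instead of materializing float32 weights and converting them
    # afterwards, which is where the extra peak RAM comes from.
    return CLIPVisionModelWithProjection.from_pretrained(repo_id, subfolder=subfolder, torch_dtype=torch_dtype)

encoder = load_image_encoder("eramth/ip-adapter", "sdxl_models/image_encoder", torch.float16)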

@asomoza (Member) commented on Mar 3, 2025

Nice! Did you measure the RAM savings?


@hlky (Contributor) commented on Mar 3, 2025

@bot /style

@github-actions (bot) commented on Mar 3, 2025

Style fixes have been applied.

@CyberVy (Contributor, Author) commented on Mar 3, 2025

Nice! Did you measure the RAM savings?

Yep! The peak CPU RAM usage decreases from 8.9 GB to 2.1 GB when I load an FP16 SDXL IP Adapter onto the GPU.
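
For reference, one quick way to check the peak CPU RAM of the current process on Linux (a sketch, not part of the PR; on Linux, ru_maxrss is reported in KiB):

from resource import getrusage, RUSAGE_SELF

# Peak resident set size of this process since it started (KiB on Linux).
peak_gib = getrusage(RUSAGE_SELF).ru_maxrss / 1024 ** 2
print(f"Peak RSS: {peak_gib:.2f} GiB")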

@CyberVy (Contributor, Author) commented on Mar 3, 2025

It's very easy to measure RAM savings in Colab.

Here are two minimal reproductions.

Now

from transformers import CLIPVisionModelWithProjection
import torch

model = CLIPVisionModelWithProjection.from_pretrained("eramth/ip-adapter", subfolder="sdxl_models/image_encoder", torch_dtype=torch.float16)
# The CPU RAM usage is about 2.3 GB.
model = model.to("cuda")
# The CPU RAM usage is still about 2.3 GB, and the VRAM usage is about 3.8 GB.

# The code below returns the released CPU RAM to the system (glibc only), so we can observe the CPU RAM usage easily.
import ctypes
ctypes.CDLL("libc.so.6").malloc_trim(0)
# The CPU RAM usage is about 2.1 GB.

Before

from transformers import CLIPVisionModelWithProjection
import torch

model = CLIPVisionModelWithProjection.from_pretrained("eramth/ip-adapter", subfolder="sdxl_models/image_encoder")
# torch_dtype is None here, so the weights are materialized in float32 first.
# The CPU RAM usage is about 8.9 GB.
model = model.to("cuda", dtype=torch.float16)
# The CPU RAM usage is still about 6.4 GB, and the VRAM usage is about 3.8 GB.

# The code below returns the released CPU RAM to the system (glibc only), so we can observe the CPU RAM usage easily.
import ctypes
ctypes.CDLL("libc.so.6").malloc_trim(0)
# The CPU RAM usage is about 2.2 GB.

You can also get the same memory usage results by loading an IP Adapter directly in diffusers.
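
For example, an end-to-end check along these lines (a sketch assuming the common h94/IP-Adapter checkpoint layout and that, with this PR, load_ip_adapter forwards the pipeline's dtype to the image encoder):

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
# The image encoder should now be loaded in the pipeline's dtype,
# avoiding the float32 round trip and its CPU RAM spike.
pipe.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name="ip-adapter_sdxl.safetensors",
)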

@asomoza (Member) commented on Mar 3, 2025

Thanks a lot. I don't have a low-RAM system, so this is hard to catch for me. There are some errors in the tests which I believe aren't related to this PR. @hlky, can you review this too, please? There are a lot of failed tests, so just to be sure.

@asomoza requested a review from hlky on March 3, 2025, 23:30
@hlky (Contributor) left a comment


Thanks @CyberVy. Nice improvement for low-RAM users.

Failing tests are unrelated.

@hlky merged commit 30cef6b into huggingface:main on Mar 4, 2025. 11 of 12 checks passed.
@CyberVy deleted the ip-adapter-dtype branch on March 4, 2025, 10:19.