
Improve load_ip_adapter RAM Usage #10948


Merged: 10 commits into huggingface:main on Mar 4, 2025

Conversation

@CyberVy (Contributor) commented on Mar 3, 2025

Loading a model with torch_dtype set to None via transformers.modeling_utils.PreTrainedModel results in additional RAM usage because of data type conversion.
There is no conversion when the data type of the model weights already matches torch_dtype, and different conversions have different impacts on RAM usage, as discussed in #10679.
When torch_dtype is None, PreTrainedModel treats it as torch.float32, which causes far more RAM usage.

This PR addresses that issue.

@asomoza
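
For context, here is a minimal sketch of the idea behind the fix (an illustration only; load_image_encoder is a hypothetical helper, not the actual diffusers code, and the real change lives in this PR's diff):

import torch
from transformers import CLIPVisionModelWithProjection

def load_image_encoder(repo_id, subfolder, torch_dtype=None):
    # Forwarding torch_dtype lets transformers load the checkpoint directly in the
    # target precision, instead of materializing float32 weights and converting them
    # afterwards, which is where the extra peak RAM comes from.
    return CLIPVisionModelWithProjection.from_pretrained(repo_id, subfolder=subfolder, torch_dtype=torch_dtype)

encoder = load_image_encoder("eramth/ip-adapter", "sdxl_models/image_encoder", torch.float16)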

@asomoza (Member) commented on Mar 3, 2025

Nice! Did you measure the RAM savings?


@hlky (Contributor) commented on Mar 3, 2025

@bot /style

@github-actions (bot) commented on Mar 3, 2025

Style fixes have been applied.

@CyberVy (Contributor, Author) commented on Mar 3, 2025

Nice! Did you measure the RAM savings?

Yep! The peak CPU RAM usage decreases from 8.9 GB to 2.1 GB when I load an FP16 SDXL IP Adapter onto the GPU.
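
For reference, one quick way to check the peak CPU RAM of the current process on Linux (a sketch, not part of the PR; on Linux, ru_maxrss is reported in KiB):

from resource import getrusage, RUSAGE_SELF

# Peak resident set size of this process since it started (KiB on Linux).
peak_gib = getrusage(RUSAGE_SELF).ru_maxrss / 1024 ** 2
print(f"Peak RSS: {peak_gib:.2f} GiB")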

@CyberVy (Contributor, Author) commented on Mar 3, 2025

It's very easy to measure RAM savings in Colab.

Here are two minimal reproductions.

Now

from transformers import CLIPVisionModelWithProjection
import torch

model = CLIPVisionModelWithProjection.from_pretrained("eramth/ip-adapter", subfolder="sdxl_models/image_encoder", torch_dtype=torch.float16)
# The CPU RAM usage is about 2.3 GB.
model = model.to("cuda")
# The CPU RAM usage is still about 2.3 GB, and the VRAM usage is about 3.8 GB.

# The code below returns the released CPU RAM to the system (glibc only), so we can observe the CPU RAM usage easily.
import ctypes
ctypes.CDLL("libc.so.6").malloc_trim(0)
# The CPU RAM usage is about 2.1 GB.

Before

from transformers import CLIPVisionModelWithProjection
import torch

model = CLIPVisionModelWithProjection.from_pretrained("eramth/ip-adapter", subfolder="sdxl_models/image_encoder")
# torch_dtype is None here, so the weights are materialized in float32 first.
# The CPU RAM usage is about 8.9 GB.
model = model.to("cuda", dtype=torch.float16)
# The CPU RAM usage is still about 6.4 GB, and the VRAM usage is about 3.8 GB.

# The code below returns the released CPU RAM to the system (glibc only), so we can observe the CPU RAM usage easily.
import ctypes
ctypes.CDLL("libc.so.6").malloc_trim(0)
# The CPU RAM usage is about 2.2 GB.

You can also get the same memory usage results by loading an IP Adapter directly in diffusers.
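
For example, an end-to-end check along these lines (a sketch assuming the common h94/IP-Adapter checkpoint layout and that, with this PR, load_ip_adapter forwards the pipeline's dtype to the image encoder):

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
# The image encoder should now be loaded in the pipeline's dtype,
# avoiding the float32 round trip and its CPU RAM spike.
pipe.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name="ip-adapter_sdxl.safetensors",
)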

@asomoza (Member) commented on Mar 3, 2025

Thanks a lot. I don't have a low-RAM system, so this is hard to catch for me. There are some errors in the tests which I believe aren't related to this PR. @hlky, can you review this too, please? There are a lot of failed tests, so just to be sure.

@asomoza requested a review from hlky on March 3, 2025, 23:30
@hlky (Contributor) left a comment


Thanks @CyberVy. Nice improvement for low-RAM users.

Failing tests are unrelated.

@hlky merged commit 30cef6b into huggingface:main on Mar 4, 2025. 11 of 12 checks passed.
@CyberVy deleted the ip-adapter-dtype branch on March 4, 2025, 10:19.