Improve load_ip_adapter RAM Usage #10948

Conversation
Nice! Did you measure the RAM savings?
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@bot /style

Style fixes have been applied. View the workflow run here.
Yep! The peak CPU RAM usage decreases from 8.9G to 2.1G when I load an FP16 SDXL IP Adapter to the GPU.
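As a side note for anyone reproducing this: a minimal way to check peak CPU RAM on Linux is `resource.getrusage`. This is just a measurement sketch, not part of the PR:

```python
import resource

# On Linux, ru_maxrss reports the peak resident set size of the
# current process in kilobytes.
peak_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print(f"Peak CPU RAM: {peak_kib / 1024**2:.1f} GiB")
```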
It's very easy to measure the RAM savings in Colab. Here are two minimal reproductions.

Now:

```python
from transformers import CLIPVisionModelWithProjection
import torch

model = CLIPVisionModelWithProjection.from_pretrained(
    "eramth/ip-adapter", subfolder="sdxl_models/image_encoder", torch_dtype=torch.float16
)
# The CPU RAM usage is about 2.3G.

model = model.to("cuda")
# The CPU RAM usage is still about 2.3G, and the VRAM usage is about 3.8G.

# The code below returns the released CPU RAM to the system,
# so that we can observe the CPU RAM usage easily.
import ctypes
ctypes.CDLL("libc.so.6").malloc_trim(0)
# The CPU RAM usage is about 2.1G.
```

Before:

```python
from transformers import CLIPVisionModelWithProjection
import torch

model = CLIPVisionModelWithProjection.from_pretrained(
    "eramth/ip-adapter", subfolder="sdxl_models/image_encoder"
)
# The CPU RAM usage is about 8.9G.

model = model.to("cuda", dtype=torch.float16)
# The CPU RAM usage is still about 6.4G, and the VRAM usage is about 3.8G.

# The code below returns the released CPU RAM to the system,
# so that we can observe the CPU RAM usage easily.
import ctypes
ctypes.CDLL("libc.so.6").malloc_trim(0)
# The CPU RAM usage is about 2.2G.
```

You can also get the same memory usage results by directly loading an IP Adapter in diffusers.
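For reference, the diffusers entry point this PR targets is `load_ip_adapter`. A typical call looks like the sketch below; the `h94/IP-Adapter` repo and weight name are the commonly documented ones, used here for illustration:

```python
from diffusers import StableDiffusionXLPipeline
import torch

# Load the SDXL pipeline in fp16 so the IP Adapter's image encoder
# can be materialized in the pipeline's dtype as well.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
pipe.load_ip_adapter(
    "h94/IP-Adapter",
    subfolder="sdxl_models",
    weight_name="ip-adapter_sdxl.safetensors",
)
pipe.to("cuda")
```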
Thanks a lot. I don't have a low-RAM system, so this is hard to catch for me at least. We have some errors in the tests which I believe aren't related to this PR. @hlky, can you review this too, please? There are a lot of failed tests, so just to be sure.
Thanks @CyberVy. Nice improvement for low-RAM users.

Failing tests are unrelated.

Loading a model with `torch_dtype` as `None` using `transformers.modeling_utils.PreTrainedModel` will result in additional RAM usage because of data type conversion. There is no conversion if the data type of the model weights is the same as `torch_dtype`, and different conversions have different impacts on RAM usage, as mentioned in #10679.

When `torch_dtype` is `None`, `PreTrainedModel` regards it as `torch.float32`, which causes a lot more RAM usage. This PR improves that.
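A minimal sketch of the idea (the helper name is illustrative, not the actual diff): pass the target dtype straight into `from_pretrained` rather than loading in fp32 and converting afterwards.

```python
from transformers import CLIPVisionModelWithProjection
import torch

def load_image_encoder(repo_id: str, subfolder: str, dtype: torch.dtype = torch.float16):
    # Passing torch_dtype up front lets transformers materialize the weights
    # in the target dtype, skipping the fp32 default and the extra conversion copy.
    return CLIPVisionModelWithProjection.from_pretrained(
        repo_id, subfolder=subfolder, torch_dtype=dtype
    )

# Hypothetical usage mirroring the reproduction above:
# encoder = load_image_encoder("eramth/ip-adapter", "sdxl_models/image_encoder")
```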
@asomoza