When using a Hugging Face pretrained model with multi-GPU, model parameters are duplicated for every GPU in RAM #17043
Comments
If you have multiple GPUs, the CPU memory cost = 2 * model_size * num_gpus
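To illustrate with assumed numbers (not from the original report): by that formula, a 10 GB model on 8 GPUs would peak at roughly 2 × 10 GB × 8 = 160 GB of CPU RAM during initialization.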
Hey @linyubupa, this is expected with the way you are initializing the model. I can see from the code snippet that you create the model in

```python
def configure_sharded_model(self):
    self.model = AutoModelForCausalLM.from_pretrained(...)
```

Here is the documentation for working with DeepSpeed models (and also the documentation for `configure_sharded_model`): https://pytorch-lightning.readthedocs.io/en/stable/advanced/model_parallel.html#shard-model-instantly-to-reduce-initialization-time-memory
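For reference, here is a minimal sketch of the hook usage the linked docs describe; the class name, model id, and optimizer settings are illustrative, not from this thread:

```python
import torch
import pytorch_lightning as pl
from transformers import AutoModelForCausalLM

class CausalLMModule(pl.LightningModule):
    def configure_sharded_model(self):
        # With the DeepSpeed strategy, this hook runs after sharding is set up,
        # so layers created here can be partitioned as they are instantiated.
        self.model = AutoModelForCausalLM.from_pretrained("gpt2")  # assumed model id

    def training_step(self, batch, batch_idx):
        loss = self.model(**batch).loss
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=1e-5)
```

Note that, as later comments in this thread report, `from_pretrained` still materializes the full weights in CPU RAM on each rank, which is the crux of this issue.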
Please let me know if that helps :)
I had the same problem, but this method didn't solve it.
Yeah, I had the same issue, and the above does not solve it.
Sorry for the late reply. I build the model in `configure_sharded_model`, but the CPU memory consumption is still huge.
Any solution for this? I really need help.
I am facing a similar issue.
I think one possible solution is to convert the pretrained model weights to the DeepSpeed ZeRO-3 sharded model format, but I haven't tried it yet.
Is there code to try it out?
I tried something like this: `def configure_sharded_model(self): ...`

The problem is that when I use `from_pretrained("~~")` in the LightningModule's `configure_sharded_model`, the Lightning DeepSpeed stage 3 strategy disturbs `from_pretrained`'s weight assignment. So instead of using `from_pretrained`, I tried manual assignment, copying parameter tensors from the sharded checkpoint file into my model's variables. I haven't firmly figured all of this out, but I experimented with the things below.

So I think loading the pretrained parameter file manually and overwriting my randomly initialized model parameters inside `deepspeed.zero.GatheredParameters` is a suitable approach.
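A hedged sketch of that approach; the helper name and checkpoint path are hypothetical, not from the thread:

```python
import torch
import deepspeed

def overwrite_sharded_params(model, checkpoint_path):
    """Hypothetical helper: copy pretrained weights into a ZeRO-3 sharded model."""
    state_dict = torch.load(checkpoint_path, map_location="cpu")
    for name, param in model.named_parameters():
        if name not in state_dict:
            continue
        # Gather the full parameter on rank 0, overwrite it there, and let
        # DeepSpeed re-shard it when the context manager exits.
        with deepspeed.zero.GatheredParameters(param, modifier_rank=0):
            if torch.distributed.get_rank() == 0:
                param.data.copy_(state_dict[name])
```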
I solved this by using DeepSpeed init with the Transformers Trainer: https://huggingface.co/docs/transformers/main_classes/deepspeed
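A minimal sketch of that route, assuming a ZeRO-3 JSON config (`ds_config_zero3.json` is an assumed filename). Per the linked docs, the `TrainingArguments` must be constructed before `from_pretrained` so that Transformers can load the weights directly into their sharded form:

```python
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

# Creating TrainingArguments first lets Transformers detect the ZeRO-3 config
# and load the weights via deepspeed.zero.Init, sharding them at load time.
args = TrainingArguments(
    output_dir="out",
    deepspeed="ds_config_zero3.json",  # assumed ZeRO-3 config file
    per_device_train_batch_size=1,
)
model = AutoModelForCausalLM.from_pretrained("gpt2")  # illustrative model id
trainer = Trainer(model=model, args=args, train_dataset=train_dataset)  # train_dataset assumed defined
trainer.train()
```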
Is there any update on this issue?
Update?
Same...
Any updates?
Bug description
When using a Hugging Face pretrained model with multi-GPU, model parameters are duplicated for every GPU in RAM.
How to reproduce the bug
Error messages and logs
Environment
Current environment
More info
No response
cc @awaelchli