mistal-chat cuda out of memory error #14

Open
yuvalshachaf opened this issue May 26, 2024 · 7 comments

@yuvalshachaf

I am getting a memory error using the model I have trained from the tutorial, any idea? I'm using an A10G GPU.
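For anyone debugging this, a quick plain-PyTorch check of how much of the A10G's ~24 GB is actually free before and after loading can help narrow things down (nothing repo-specific, just a sketch):

```python
# Quick VRAM check with plain PyTorch. An A10G has ~24 GB; a 7B model in bf16
# already needs ~13.5 GiB for weights, leaving limited headroom for activations
# and the KV cache.
import torch

props = torch.cuda.get_device_properties(0)
print(f"{props.name}: total {props.total_memory / 2**30:.1f} GiB, "
      f"allocated {torch.cuda.memory_allocated(0) / 2**30:.1f} GiB, "
      f"reserved {torch.cuda.memory_reserved(0) / 2**30:.1f} GiB")
```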

@danielhanchen

Not a Mistral official product, and shameless promotion, but if you're having OOM issues, have a go with Unsloth :) You get 70%+ memory reduction, 2x faster training, and no accuracy degradation! https://github.com/unslothai/unsloth Mistral v0.3 7B via a free Colab: https://colab.research.google.com/drive/1_yNCks4BTD5zOnjozppphh5GzMFaMKq_?usp=sharing
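For anyone trying that route, a minimal sketch of a 4-bit Unsloth load is below; the checkpoint name and max_seq_length are assumptions on my part, and the linked Colab is the maintained reference.

```python
# Minimal sketch: load Mistral 7B v0.3 in 4-bit with Unsloth to cut VRAM use.
# The checkpoint name and max_seq_length are illustrative; adjust to your setup.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-v0.3-bnb-4bit",  # assumed pre-quantized checkpoint
    max_seq_length=4096,                            # keep modest on a 24 GB A10G
    load_in_4bit=True,                              # 4-bit weights: large VRAM saving
)
FastLanguageModel.for_inference(model)              # enable the faster inference path
```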

@mosh98

mosh98 commented May 27, 2024

Reducing the sequence length to under 8000 helped me.
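If your stack uses a Hugging Face-style tokenizer, a hedged sketch of capping prompt length follows; the model name and the 8000-token cap are illustrative, and with mistral_common you could slice the encoded token list to the same effect.

```python
# Illustrative only: cap input length at ~8000 tokens before it reaches the model.
# Assumes a Hugging Face-style tokenizer; with mistral_common, slice the encoded
# token list instead.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")
long_prompt = "..."          # placeholder for your (possibly very long) prompt
inputs = tokenizer(
    long_prompt,
    truncation=True,
    max_length=8000,         # the cap suggested above; lower it further if still OOM
    return_tensors="pt",
)
```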

@yuvalshachaf
Author

yuvalshachaf commented May 27, 2024 via email

@yuvalshachaf changed the title from "mistal-chat cude out of memory error" to "mistal-chat cuda out of memory error" on May 27, 2024
@NathanMayPro

LoRA is not the cause of your error.

During inference:
  • empty model (13.5 GB) + self-attention (depends on the size of your input) + activations (depend on the number of parameters)

During training:
  • same as above + gradients (depend on the number of parameters) + optimizer state

With LoRA during training:
  • the gradients and optimizer state no longer depend on your original model size, only on the LoRA size: (A, rank), (rank, B)
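For intuition, here is a rough back-of-the-envelope version of that breakdown for a 7B model in bf16; the parameter count, LoRA rank, and Adam state sizes are assumptions for illustration, not numbers taken from this repo.

```python
# Back-of-the-envelope memory estimate for the breakdown above (7B model, bf16).
# All counts are illustrative assumptions, not measurements from this repo.
params = 7.25e9                     # ~7B parameters
bytes_bf16 = 2

weights = params * bytes_bf16       # ~13.5 GiB, matching the figure above

# Full fine-tuning: bf16 gradients + Adam optimizer state (two fp32 moments per param)
full_ft = weights + params * bytes_bf16 + params * 4 * 2

# LoRA: only the adapter gets gradients and optimizer state.
# Assume rank 16, ~7,000 combined in/out dims per adapted matrix, ~100 matrices.
lora_params = 16 * 7_000 * 100      # ~11M trainable parameters
lora_ft = weights + lora_params * (bytes_bf16 + 4 * 2)

for name, size in [("weights only", weights),
                   ("full fine-tune", full_ft),
                   ("LoRA fine-tune", lora_ft)]:
    print(f"{name}: ~{size / 2**30:.1f} GiB (+ activations / KV cache on top)")
```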

@halilergul1

halilergul1 commented Jun 9, 2024

> Reducing the sequence length to under 8000 helped me.

Hello @mosh98, do you mean reducing the sequence length while training? For sure it helps during training.

Any other updates on this issue? I get the same CUDA error while trying to run inference with the fine-tuned model. I think the problem arises from the load_lora method, because just like @yuvalshachaf said, the LoRA model becomes 10 GB when loading!

@BlahBlah314

Hello,

I have the same problem. Inference with Nemo-Instruct is OK on an A100, but when fine-tuned, using the inference method described in the tutorial with model.load_lora, I get an OOM on the same A100. How did you solve it, if you did solve it? Maybe merging the adapter with PEFT before inference is a solution?
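If anyone wants to try that, here is a minimal sketch of merging a LoRA adapter into the base model with PEFT before inference; it assumes the adapter has been saved or converted to PEFT format, and the model name and paths are placeholders.

```python
# Minimal sketch: fold the LoRA adapter into the base weights with PEFT, then
# run plain inference on the merged model instead of calling load_lora.
# Assumes the adapter is in PEFT format; model name and paths are placeholders.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-Nemo-Instruct-2407",   # base model used for fine-tuning
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "path/to/lora_adapter")
model = model.merge_and_unload()              # folds the low-rank update into the base weights
model.save_pretrained("path/to/merged_model") # reload this folder for plain inference
```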

@C3po-D2rd2

Same issue here, did anyone find a way?
