mistal-chat cuda out of memory error #14

Open
yuvalshachaf opened this issue May 26, 2024 · 7 comments

@yuvalshachaf

I am getting a memory error using the model I have trained from the tutorial, any idea? I'm using an A10G GPU.
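For anyone debugging this, a quick plain-PyTorch check of how much of the A10G's ~24 GB is actually free before and after loading can help narrow things down (nothing repo-specific, just a sketch):

```python
# Quick VRAM check with plain PyTorch. An A10G has ~24 GB; a 7B model in bf16
# already needs ~13.5 GiB for weights, leaving limited headroom for activations
# and the KV cache.
import torch

props = torch.cuda.get_device_properties(0)
print(f"{props.name}: total {props.total_memory / 2**30:.1f} GiB, "
      f"allocated {torch.cuda.memory_allocated(0) / 2**30:.1f} GiB, "
      f"reserved {torch.cuda.memory_reserved(0) / 2**30:.1f} GiB")
```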

@danielhanchen

Not a Mistral official product, and shameless promotion, but if you're having OOM issues, have a go with Unsloth :) You get 70%+ memory reduction, 2x faster training, and no accuracy degradation! https://github.com/unslothai/unsloth Mistral v0.3 7B via a free Colab: https://colab.research.google.com/drive/1_yNCks4BTD5zOnjozppphh5GzMFaMKq_?usp=sharing
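For anyone trying that route, a minimal sketch of a 4-bit Unsloth load is below; the checkpoint name and max_seq_length are assumptions on my part, and the linked Colab is the maintained reference.

```python
# Minimal sketch: load Mistral 7B v0.3 in 4-bit with Unsloth to cut VRAM use.
# The checkpoint name and max_seq_length are illustrative; adjust to your setup.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-v0.3-bnb-4bit",  # assumed pre-quantized checkpoint
    max_seq_length=4096,                            # keep modest on a 24 GB A10G
    load_in_4bit=True,                              # 4-bit weights: large VRAM saving
)
FastLanguageModel.for_inference(model)              # enable the faster inference path
```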

@mosh98

mosh98 commented May 27, 2024

Reducing the sequence length to under 8000 helped me.
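If your stack uses a Hugging Face-style tokenizer, a hedged sketch of capping prompt length follows; the model name and the 8000-token cap are illustrative, and with mistral_common you could slice the encoded token list to the same effect.

```python
# Illustrative only: cap input length at ~8000 tokens before it reaches the model.
# Assumes a Hugging Face-style tokenizer; with mistral_common, slice the encoded
# token list instead.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")
long_prompt = "..."          # placeholder for your (possibly very long) prompt
inputs = tokenizer(
    long_prompt,
    truncation=True,
    max_length=8000,         # the cap suggested above; lower it further if still OOM
    return_tensors="pt",
)
```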

@yuvalshachaf
Author

yuvalshachaf commented May 27, 2024 via email

@yuvalshachaf changed the title from "mistal-chat cude out of memory error" to "mistal-chat cuda out of memory error" on May 27, 2024
@NathanMayPro

LoRA is not the cause of your error.

During inference:
  • empty model (13.5 GB) + self-attention (depends on the size of your input) + activations (depend on the number of parameters)

During training:
  • same as above + gradients (depend on the number of parameters) + optimizer state

With LoRA during training:
  • the gradients and optimizer state no longer depend on your original model size, only on the LoRA size: (A, rank), (rank, B)
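For intuition, here is a rough back-of-the-envelope version of that breakdown for a 7B model in bf16; the parameter count, LoRA rank, and Adam state sizes are assumptions for illustration, not numbers taken from this repo.

```python
# Back-of-the-envelope memory estimate for the breakdown above (7B model, bf16).
# All counts are illustrative assumptions, not measurements from this repo.
params = 7.25e9                     # ~7B parameters
bytes_bf16 = 2

weights = params * bytes_bf16       # ~13.5 GiB, matching the figure above

# Full fine-tuning: bf16 gradients + Adam optimizer state (two fp32 moments per param)
full_ft = weights + params * bytes_bf16 + params * 4 * 2

# LoRA: only the adapter gets gradients and optimizer state.
# Assume rank 16, ~7,000 combined in/out dims per adapted matrix, ~100 matrices.
lora_params = 16 * 7_000 * 100      # ~11M trainable parameters
lora_ft = weights + lora_params * (bytes_bf16 + 4 * 2)

for name, size in [("weights only", weights),
                   ("full fine-tune", full_ft),
                   ("LoRA fine-tune", lora_ft)]:
    print(f"{name}: ~{size / 2**30:.1f} GiB (+ activations / KV cache on top)")
```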

@halilergul1

halilergul1 commented Jun 9, 2024

> Reducing the sequence length to under 8000 helped me.

Hello @mosh98, do you mean reducing the sequence length while training? For sure it helps during training.

Any other updates on this issue? I get the same CUDA error while trying to run inference with the fine-tuned model. I think the problem arises from the load_lora method, because just like @yuvalshachaf said, the LoRA model becomes 10 GB when loading!

@BlahBlah314

Hello,

I have the same problem. Inference with Nemo-Instruct is OK on an A100, but when fine-tuned, using the inference method described in the tutorial with model.load_lora, I get an OOM on the same A100. How did you solve it, if you did solve it? Maybe merging the adapter with PEFT before inference is a solution?
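If anyone wants to try that, here is a minimal sketch of merging a LoRA adapter into the base model with PEFT before inference; it assumes the adapter has been saved or converted to PEFT format, and the model name and paths are placeholders.

```python
# Minimal sketch: fold the LoRA adapter into the base weights with PEFT, then
# run plain inference on the merged model instead of calling load_lora.
# Assumes the adapter is in PEFT format; model name and paths are placeholders.
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-Nemo-Instruct-2407",   # base model used for fine-tuning
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "path/to/lora_adapter")
model = model.merge_and_unload()              # folds the low-rank update into the base weights
model.save_pretrained("path/to/merged_model") # reload this folder for plain inference
```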

@C3po-D2rd2

Same issue here, did anyone find a way?
