Add LoRA and QLoRA example #1889
Conversation
Force-pushed from d8c76d8 to 38be789.
Looking good -- thanks for the PR!
import os

os.environ["KERAS_BACKEND"] = "tensorflow"
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
Why is this necessary?
`TF_CPP_MIN_LOG_LEVEL=3` is used to suppress verbose logging from TF (see the sketch below).
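For reference, here is how the levels behave (a quick annotated sketch; note the variable must be set before TensorFlow is imported to take effect):

```python
import os

# TF_CPP_MIN_LOG_LEVEL controls the verbosity of TensorFlow's C++ logger.
#   "0" - show all messages (default)
#   "1" - filter out INFO messages
#   "2" - also filter out WARNING messages
#   "3" - also filter out ERROR messages (quietest)
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

import tensorflow as tf  # import only after the env var is set
```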
(Outdated review thread on examples/keras_recipes/parameter_efficient_finetuning_of_gemma_with_lora_and_qlora.py, resolved.)
Still need to read over the guide, but a general note... We are working on adding pre-quantized versions of Gemma 2. When we do, I believe we will be able to show LoRA fine-tuning of the 9B model on a free-tier Colab GPU. 9B weights * 1 byte per weight + a relatively small number of trainable parameters = ~9GB of vRAM. Does that track with your understanding @james77777778? Once we have those up, we should definitely show that too. That's when things get really exciting: fine-tuning a model that normally wouldn't even fit on an accelerator.
Yes, that number should be correct for loading the model. However, the vRAM requirement might be (much) larger than 9GB if we want to do backpropagation. With my 10GB vRAM rig, I can barely run this example (Gemma 2B) using JAX after this patch: keras-team/keras#19954.
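A rough back-of-the-envelope version of that estimate (my arithmetic, with an assumed adapter size; not measured numbers):

```python
# Ballpark vRAM for LoRA fine-tuning an int8-quantized 9B model.
frozen_params = 9e9            # ~9B frozen backbone weights
lora_params = 20e6             # assumed LoRA adapter size; depends on rank

base = frozen_params * 1             # int8: 1 byte per weight -> ~9.0 GB
adapters = lora_params * 4           # fp32 LoRA weights       -> ~0.08 GB
grads = lora_params * 4              # gradients for LoRA params only
adam_states = lora_params * 4 * 2    # Adam's m and v slots

total_gb = (base + adapters + grads + adam_states) / 1e9
print(f"~{total_gb:.1f} GB, plus intermediate activations")  # ~9.2 GB
```

The "plus intermediate activations" term is where the backpropagation overhead shows up, which is why the practical requirement can exceed the 9GB load-time figure.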
Should we wait for those additions before merging?
@fchollet I think we can merge with what we have and extend later. No need to block!
@james77777778 I thought that with a QLoRA-like flow, the backpropagation requirements should actually be about the same as inference. There's definitely some overhead for intermediate activations and gradients (using a short sequence length will help here), but we only need to keep around the gradients for the LoRA trainable parameters, which will add up to just a few MB. Overall, the dominant memory requirement should just be 1 byte per frozen quantized parameter. That's the principle that allows a bitsandbytes Colab like this, training a 20B-parameter model on free-tier resources... For us, we only go down to 8 bits per weight (that Colab shows 4 bits per weight), so we won't be able to tune a 20B-parameter model on free hardware. But I think we could tune a pre-quantized model of up to 10B on a T4.
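For concreteness, the shape of that flow in Keras looks roughly like this (a sketch; the preset name and rank are illustrative, built on the `quantize`/`enable_lora` APIs this PR's example uses):

```python
import keras_nlp

# Load Gemma, quantize the frozen backbone to int8, then attach small
# trainable LoRA adapters. Gradients exist only for the adapters.
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_1.1_instruct_2b_en")
gemma_lm.quantize("int8")              # ~1 byte per frozen weight
gemma_lm.backbone.enable_lora(rank=4)  # rank chosen arbitrarily here

gemma_lm.summary()  # trainable params should be a tiny fraction of the total
```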
I'm unsure how JAX, TF, and Torch calculate the gradients, but it is true that we only need a small amount of memory for the QLoRA-like technique. However, we might also need a quantized optimizer to achieve this optimal reduction in memory usage.

I can try a 7B/8B model on a Colab T4 once the quantized models are uploaded: keras-team/keras-hub#1720
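To put a number on the optimizer point (a rough sketch; parameter counts are assumptions): Adam keeps two fp32 state slots per trainable parameter, so its footprint depends entirely on what is trainable.

```python
# Adam holds two fp32 slots (m and v) per trainable parameter.
def adam_state_gb(trainable_params, slots=2, bytes_per_slot=4):
    return trainable_params * slots * bytes_per_slot / 1e9

print(adam_state_gb(2.5e9))  # full fine-tuning of a ~2.5B model: ~20 GB
print(adam_state_gb(20e6))   # LoRA adapters only (assumed ~20M): ~0.16 GB
```

With LoRA the optimizer state is already small in absolute terms, though a quantized optimizer would shrink it further.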
Thanks for the pointer! I'll have a look.
@fchollet @mattdangerw Note that I can only run this example on a CPU using a Jupyter notebook, due to the GPU OOM issue.
@james77777778 ready I think!
Force-pushed from 75609d0 to 958c38e.
This PR should be ready now.
LGTM, thank you for the contribution!
This example borrows a lot of description from https://keras.io/examples/nlp/parameter_efficient_finetuning_of_gpt2_with_lora/
Therefore, I have added the original authors as this example's authors.
This example demonstrates fine-tuning Gemma with LoRA and QLoRA on a French-to-English translation task; a condensed sketch of the flow follows below.
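Roughly, the flow the example follows (a condensed sketch; the preset name, prompt format, and hyperparameters here are illustrative, and the data pipeline is simplified to a single hard-coded pair):

```python
import keras
import keras_nlp

# LoRA variant; for QLoRA, call gemma_lm.quantize("int8") before enable_lora.
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_1.1_instruct_2b_en")
gemma_lm.backbone.enable_lora(rank=4)

gemma_lm.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.AdamW(learning_rate=5e-5),
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
)

# A French-to-English pair formatted as an instruction-style prompt.
train_data = [
    "Translate French into English:\nBonjour\nResponse:\nHello",
]
gemma_lm.fit(train_data, epochs=1, batch_size=1)
```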
Please let me know if the `.py` is ready. Then, I will submit the `.md` and `.ipynb`.

Note that we need the latest code from KerasNLP since the quantization support hasn't been released yet.
cc @fchollet