Add FusedLinearCrossEntropy to Gemma #111

Luke-Chesley · 2024-08-26T21:25:29Z

Summary

This PR adds FusedLinearCrossEntropy support for Gemma, resolving issue #101.
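FusedLinearCrossEntropy combines the final projection (hidden states times the `lm_head` weight) with the cross-entropy loss, processing the hidden states in chunks so the full logits matrix never has to exist at once. The pure-Python sketch below illustrates only the forward-pass idea under that assumption. The real Liger kernel is written in Triton and also handles gradients, and every function name here is hypothetical. It checks that computing the loss chunk by chunk gives the same mean loss as materializing all logits first.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def linear(rows: Sequence[Sequence[float]], weight: Matrix) -> Matrix:
    """Project each hidden-state row onto the vocab: logits[i][v] = sum_d h[i][d] * W[v][d]."""
    return [[sum(h * w for h, w in zip(row, w_row)) for w_row in weight] for row in rows]


def cross_entropy_sum(logits: Matrix, targets: Sequence[int]) -> float:
    """Summed negative log-likelihood, using log-sum-exp for numerical stability."""
    total = 0.0
    for row, t in zip(logits, targets):
        m = max(row)
        lse = m + math.log(sum(math.exp(x - m) for x in row))
        total += lse - row[t]
    return total


def naive_linear_ce(hidden: Matrix, weight: Matrix, targets: Sequence[int]) -> float:
    # Materializes the full (num_tokens x vocab) logits matrix at once.
    return cross_entropy_sum(linear(hidden, weight), targets) / len(targets)


def chunked_linear_ce(hidden: Matrix, weight: Matrix, targets: Sequence[int], chunk_size: int = 2) -> float:
    # Only a (chunk_size x vocab) block of logits exists at any one time.
    total = 0.0
    for start in range(0, len(hidden), chunk_size):
        block = linear(hidden[start:start + chunk_size], weight)
        total += cross_entropy_sum(block, targets[start:start + chunk_size])
    return total / len(targets)


hidden = [[0.1, -0.2, 0.3], [0.5, 0.4, -0.1], [-0.3, 0.2, 0.2], [0.0, 0.1, -0.4], [0.2, 0.2, 0.2]]
weight = [[0.3, 0.1, -0.2], [-0.1, 0.4, 0.2], [0.2, -0.3, 0.1], [0.05, 0.05, 0.5]]  # vocab of 4
targets = [0, 3, 1, 2, 3]
print(abs(naive_linear_ce(hidden, weight, targets) - chunked_linear_ce(hidden, weight, targets)) < 1e-12)  # True
```

The memory saving comes from the chunk size: peak logits storage is `chunk_size x vocab` instead of `num_tokens x vocab`, which matters for Gemma's large vocabulary.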

Details

The code in this PR is based on #93, which does the same thing for Mistral.

Because the fused_linear_cross_entropy and cross_entropy parameters cannot both be True, fused_linear_cross_entropy has to be explicitly set to False in test_mini_models.py; otherwise both would be True and the convergence test would fail.
In test_mini_models_no_logits.py, I added two extra Gemma tests so that the FusedLinearCrossEntropy path in Gemma is covered.
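The no-logits tests matter because, on the fused path, the patched forward computes the loss directly from the final hidden states and does not build logits. The sketch below illustrates that control flow, assuming the Gemma forward mirrors the Mistral change from #93 (fused loss only when training with labels, regular logits otherwise). `toy_forward`, `fused_loss` and the other helpers are hypothetical stand-ins, not Liger-Kernel's actual forward.

```python
import math
from typing import List, Optional, Sequence, Tuple

Matrix = List[List[float]]


def project(hidden: Matrix, weight: Matrix) -> Matrix:
    # logits[i][v] = dot(hidden[i], weight[v]), i.e. an lm_head without bias
    return [[sum(h * w for h, w in zip(row, w_row)) for w_row in weight] for row in hidden]


def mean_nll(logits: Matrix, targets: Sequence[int]) -> float:
    # Mean cross-entropy over rows, using log-sum-exp for stability.
    total = 0.0
    for row, t in zip(logits, targets):
        m = max(row)
        total += m + math.log(sum(math.exp(x - m) for x in row)) - row[t]
    return total / len(targets)


def fused_loss(hidden: Matrix, weight: Matrix, targets: Sequence[int]) -> float:
    # Hypothetical stand-in for a fused kernel: one row of logits at a time, never all of them.
    return sum(mean_nll(project([h], weight), [t]) for h, t in zip(hidden, targets)) / len(targets)


def toy_forward(
    hidden: Matrix, weight: Matrix, labels: Optional[List[int]], training: bool
) -> Tuple[Optional[float], Optional[Matrix]]:
    if training and labels is not None:
        # Next-token shift: position i predicts labels[i + 1].
        loss = fused_loss(hidden[:-1], weight, labels[1:])
        return loss, None  # no logits are returned on the fused path
    logits = project(hidden, weight)
    loss = mean_nll(logits[:-1], labels[1:]) if labels is not None else None
    return loss, logits


hidden = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.1]]
weight = [[0.1, 0.2], [-0.3, 0.4], [0.5, -0.2]]
labels = [0, 2, 1]
train_loss, train_logits = toy_forward(hidden, weight, labels, training=True)
eval_loss, eval_logits = toy_forward(hidden, weight, labels, training=False)
print(train_logits is None, eval_logits is not None, abs(train_loss - eval_loss) < 1e-12)  # True True True
```

A test that only inspects logits would never exercise the fused branch, which is why a separate no-logits convergence test compares the losses instead.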

Testing Done

Hardware Type: NVIDIA A100-SXM4-80GB
Ran make test to ensure correctness
Ran make checkstyle to ensure code style
Ran make test-convergence to ensure convergence

Fixing ~/Liger-Kernel/test/convergence/test_mini_models_no_logits.py
Skipped 1 files
reformatted /home/luke/Desktop/Liger-Kernel/src/liger_kernel/transformers/model/gemma.py
reformatted /home/luke/Desktop/Liger-Kernel/src/liger_kernel/transformers/monkey_patch.py
reformatted /home/luke/Desktop/Liger-Kernel/test/convergence/test_mini_models_no_logits.py

All done! ✨ 🍰 ✨
3 files reformatted, 51 files left unchanged.
make: *** [Makefile:14: checkstyle] Error 1
luke@luke:~/Desktop/Liger-Kernel$ 


root@C.12106734:~/Liger-Kernel$ make test
python -m pytest --disable-warnings test/ --ignore=test/convergence
======================================================= test session starts =======================================================
platform linux -- Python 3.11.9, pytest-8.3.2, pluggy-1.5.0
rootdir: /root/Liger-Kernel
plugins: hypothesis-6.108.4, anyio-4.4.0
collected 130 items                                                                                                               

test/transformers/test_cross_entropy.py ..........................................................                          [ 44%]
test/transformers/test_fused_linear_cross_entropy.py ......                                                                 [ 49%]
test/transformers/test_geglu.py ........                                                                                    [ 55%]
test/transformers/test_rms_norm.py ................................                                                         [ 80%]
test/transformers/test_rope.py ............                                                                                 [ 89%]
test/transformers/test_swiglu.py ........                                                                                   [ 95%]
test/transformers/test_trainer_integration.py ...                                                                           [ 97%]
test/transformers/test_transformers_monkey_patch.py .                                                                       [ 98%]
test/triton/test_triton_monkey_patch.py ..                                                                                  [100%]

====================================================== 130 passed in 44.12s =======================================================



root@C.12106734:~/Liger-Kernel$ make test-convergence
HF_DATASETS_OFFLINE=1 python -m pytest --disable-warnings test/convergence
======================================================= test session starts =======================================================
platform linux -- Python 3.11.9, pytest-8.3.2, pluggy-1.5.0
rootdir: /root/Liger-Kernel
plugins: hypothesis-6.108.4, anyio-4.4.0
collected 14 items                                                                                                                

test/convergence/test_mini_models.py ........                                                                               [ 57%]
test/convergence/test_mini_models_no_logits.py ......                                                                       [100%]

================================================= 14 passed in 125.51s (0:02:05) ==================================================

modified: src/liger_kernel/transformers/monkey_patch.py modified: test/convergence/test_mini_models.py modified: test/convergence/test_mini_models_no_logits.py

tyler-romero · 2024-08-27T00:02:16Z

src/liger_kernel/transformers/monkey_patch.py

    rms_norm: bool = True,
    geglu: bool = True,
+    swiglu: bool = True,


nit: unused arg

modified: src/liger_kernel/transformers/monkey_patch.py modified: test/convergence/test_mini_models_no_logits.py

Luke-Chesley · 2024-08-27T12:33:08Z

Changed how the kwargs are passed in test_mini_models_no_logits.py to match test_mini_models.py, which removes the unused arg. All tests still pass.

yundai424 · 2024-08-27T15:11:39Z

src/liger_kernel/transformers/monkey_patch.py

@@ -143,6 +145,8 @@ def apply_liger_kernel_to_gemma(
        modeling_gemma.CrossEntropyLoss = LigerCrossEntropyLoss
    if geglu:
        modeling_gemma.GemmaMLP = LigerGEGLUMLP
+    if fused_linear_cross_entropy:


Could you add an assertion here to make sure fused_linear_cross_entropy and cross_entropy are not both True, as in the Llama monkey patch function? (It looks like we dropped this in Qwen2 as well; it would be helpful if you could add it there too!)
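A minimal sketch of the requested guard, assuming the Gemma patch function has the same shape as the Llama one: a plain `assert` rejects the conflicting flag combination before anything is patched. `patch_gemma_like`, the `SimpleNamespace` standing in for `modeling_gemma`, and the placeholder classes and forwards are all hypothetical; only the two flag names come from this PR.

```python
from types import SimpleNamespace


class OriginalCE: ...


class LigerCE: ...  # hypothetical placeholder for LigerCrossEntropyLoss


def original_forward(*args, **kwargs):
    return "original"


def fused_forward(*args, **kwargs):  # hypothetical placeholder for the fused-loss forward
    return "fused"


def make_fake_module() -> SimpleNamespace:
    # Hypothetical stand-in for the transformers Gemma modeling module.
    return SimpleNamespace(
        CrossEntropyLoss=OriginalCE,
        GemmaForCausalLM=SimpleNamespace(forward=original_forward),
    )


def patch_gemma_like(module, cross_entropy: bool = False, fused_linear_cross_entropy: bool = True) -> None:
    # Both flags replace the same loss computation, so enabling both is ambiguous.
    # The check runs first, so a rejected call leaves the module untouched.
    assert not (cross_entropy and fused_linear_cross_entropy), (
        "cross_entropy and fused_linear_cross_entropy cannot both be True."
    )
    if cross_entropy:
        module.CrossEntropyLoss = LigerCE
    if fused_linear_cross_entropy:
        module.GemmaForCausalLM.forward = fused_forward


fake = make_fake_module()
patch_gemma_like(fake)
print(fake.GemmaForCausalLM.forward())  # fused

try:
    patch_gemma_like(make_fake_module(), cross_entropy=True, fused_linear_cross_entropy=True)
except AssertionError as err:
    print(err)
```

This is also why the convergence tests that pass `cross_entropy=True` must set `fused_linear_cross_entropy=False` explicitly. Note that `assert` statements are skipped when Python runs with `-O`.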

modified: src/liger_kernel/transformers/monkey_patch.py

yundai424

lgtm, thanks!

qingquansong

LGTM!

DocShotgun · 2024-08-27T17:45:56Z

Just had a quick question for clarification: is this for the original Gemma, for Gemma2, or are the changes functional for both? I noticed the Liger Kernel README refers to the API for patching Gemma 2 as liger_kernel.transformers.apply_liger_kernel_to_gemma.

EDIT: I was informed that this code doesn't seem to include the softcapping used in Gemma2's forward:
https://github.com/huggingface/transformers/blob/main/src/transformers/models/gemma2/modeling_gemma2.py#L1054-L1057
So perhaps it's only fully functional for the original Gemma?

new file: src/liger_kernel/transformers/model/gemma.py

b9a475d

modified: src/liger_kernel/transformers/monkey_patch.py modified: test/convergence/test_mini_models.py modified: test/convergence/test_mini_models_no_logits.py

tyler-romero reviewed Aug 27, 2024


fix unused arg

674236e

modified: src/liger_kernel/transformers/monkey_patch.py modified: test/convergence/test_mini_models_no_logits.py

yundai424 reviewed Aug 27, 2024


add fused_linear_cross_entropy/cross_entropy assertion

1e62194

modified: src/liger_kernel/transformers/monkey_patch.py

yundai424 approved these changes Aug 27, 2024


qingquansong approved these changes Aug 27, 2024


ByronHsu merged commit a49e421 into linkedin:main Aug 27, 2024

qingquansong mentioned this pull request Aug 27, 2024

Add gemma lightning example for single L40 GPU #120

Merged


This was referenced Aug 27, 2024

[WIP] Fix confusion on Gemma #121

Merged

Update supported models for Liger Kernel axolotl-ai-cloud/axolotl#1875

Merged
