Fix Gemma RMSNorm #85
Conversation
Force-pushed from 49c8823 to 52d317a.
Awesome! Our first kernel-related PR :-) Left a comment.
```python
# Hugging Face's Gemma reference implementation (the code under review):
def _norm(self, x):
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

def forward(self, x):
    output = self._norm(x.float())
```
@yundai424 I am curious whether we need to do the upcasting in our kernel too, or whether we can skip it because the output gets cast back to x.dtype anyway. Our old implementation doesn't have it, but the results are still consistent.
> @yundai424 I am curious whether we need to do the upcasting in our kernel too, or whether we can skip it because the output gets cast back to x.dtype anyway. Our old implementation doesn't have it, but the results are still consistent.
The test results seem to be consistent with the reference too. I think it's a tradeoff between complexity (because some models have slight inconsistencies) and exact reproduction. Let me know what you think and I can modify it (or even in another PR, to keep things clean).
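For illustration, here is a minimal sketch (not the actual Triton kernel) of the two casting strategies being discussed; the shapes and eps value are arbitrary:

```python
import torch

def rmsnorm_upcast(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Hugging Face reference behavior: compute the norm in fp32, cast back at the end.
    x_fp32 = x.float()
    out = x_fp32 * torch.rsqrt(x_fp32.pow(2).mean(-1, keepdim=True) + eps)
    return out.to(x.dtype)

def rmsnorm_native(x: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Compute entirely in the input dtype (what the old implementation effectively did).
    return x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps)

x = torch.randn(2, 8, dtype=torch.bfloat16)
# The difference is typically tiny per call, but can accumulate over training.
print((rmsnorm_upcast(x) - rmsnorm_native(x)).abs().max())
```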
LGTM! Waiting for @yundai424 to do a final check.
Can we add Gemma to the convergence test as well?
> Can we add Gemma to the convergence test as well?

Adding it.
> Can we add Gemma to the convergence test as well?
Done. As mentioned in the Discord channel, I made the tolerance for the bfloat16 convergence test a bit larger, since the casting difference seems to affect it slightly (for the regular tests it's fine, but it seems to add up in the end-to-end training situation). If we end up deciding to match it one-to-one in the future, we can make it stricter. Let me know what you think.
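Purely for illustration, a dtype-dependent tolerance in a parametrized convergence test might look like the sketch below; the tolerance values and test name are placeholders, not the ones used in this PR:

```python
import pytest
import torch

@pytest.mark.parametrize(
    "dtype, atol, rtol",
    [
        (torch.float32, 1e-5, 1e-5),
        # Looser bounds for bf16: the casting difference adds up end to end.
        (torch.bfloat16, 1e-2, 1e-2),
    ],
)
def test_gemma_convergence(dtype, atol, rtol):
    # Training loop elided; the losses from the Liger model and the Hugging Face
    # reference would be compared with torch.allclose(..., atol=atol, rtol=rtol).
    ...
```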
My read is yes, we need to cast x to fp32. Regardless of whether it's Llama or Gemma, both do the norm part in fp32; the only difference is that Gemma does the scaling in fp32 too, while Llama does it in mixed precision. But of course we can do it in a separate PR 🤔
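To make the distinction concrete, a sketch mirroring the Hugging Face modeling code (function names here are illustrative, and this is not the Triton kernel):

```python
import torch

def llama_style(x, weight, eps=1e-6):
    # Norm in fp32, but cast back to x.dtype BEFORE applying the weight
    # (mixed-precision scaling).
    x32 = x.float()
    normed = x32 * torch.rsqrt(x32.pow(2).mean(-1, keepdim=True) + eps)
    return weight * normed.to(x.dtype)

def gemma_style(x, weight, eps=1e-6):
    # Both the norm and the (1 + weight) scaling happen in fp32,
    # with a single cast back at the very end.
    x32 = x.float()
    normed = x32 * torch.rsqrt(x32.pow(2).mean(-1, keepdim=True) + eps)
    return (normed * (1.0 + weight.float())).to(x.dtype)
```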
Would you join our Discord and say hi? The PR looks very solid and we would love more contributions from you: https://discord.gg/CX2YmNmn
Also, can you add your hardware type to the PR description? I just updated the PR template.
LGTM, thanks for the contribution! Let's create a follow-up issue for the dtype thing
Summary
Fixes #74. Supports Gemma's RMSNorm by adding a generic offset parameter to RMSNorm.
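As a sketch of the offset idea (an illustrative module, not the Triton kernel): Llama scales the normalized activations by `weight` while Gemma scales by `1 + weight`, so a single `offset` parameter covers both variants:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    def __init__(self, hidden_size: int, eps: float = 1e-6, offset: float = 0.0):
        super().__init__()
        # Gemma initializes weight to zeros (effective scale 1 + 0 = 1),
        # Llama to ones; offset = 0.0 gives Llama behavior, 1.0 gives Gemma's.
        init = torch.zeros(hidden_size) if offset else torch.ones(hidden_size)
        self.weight = nn.Parameter(init)
        self.eps = eps
        self.offset = offset

    def forward(self, x):
        normed = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)
        return normed * (self.offset + self.weight)
```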
Testing Done
Parametrized the RMSNorm tests to cover both Llama's and Gemma's versions (following Hugging Face's transformers implementations); a sketch of the parametrization follows below.
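A sketch of what that parametrization could look like (test body and names are hypothetical):

```python
import pytest
import torch

@pytest.mark.parametrize("offset", [0.0, 1.0])  # 0.0 ~ Llama, 1.0 ~ Gemma
@pytest.mark.parametrize("dtype", [torch.float32, torch.bfloat16])
def test_rmsnorm_matches_reference(offset, dtype):
    # Compare the kernel's forward/backward against the matching
    # Hugging Face reference for the given variant; body elided.
    ...
```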
- `make test` to ensure correctness
- `make checkstyle` to ensure code style
- `make test-convergence` to ensure convergence

Test outputs