Skip to content

ggml : add ggml_gelu_erf() #13667

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
May 21, 2025
Merged

ggml : add ggml_gelu_erf() #13667

merged 5 commits into from
May 21, 2025

Conversation

ngxson
Copy link
Collaborator

@ngxson ngxson commented May 20, 2025

I couldn't add it as a param because ggml_gelu is an unary op and I'm not quite sure how to add a new param to unary op (Please tell me if I should still add it as a param instead)

Despite the name "not approximated", I'm actually using an approximation (more accurate than tanh) in Metal impl because there is no erf() function build-in on Metal. The built-in GNU impl also seems to base on a complicated approx, so I think all systems are now using some short of approx, it's just more complicated than tanh so it gives a better result.

So, maybe ggml_gelu_na is not the best name.

I haven't tested this with ultravox, will report back the result a bit later.

@ngxson ngxson requested a review from ggerganov May 20, 2025 16:48
@github-actions github-actions bot added ggml changes relating to the ggml tensor library for machine learning Apple Metal https://en.wikipedia.org/wiki/Metal_(API) labels May 20, 2025
@ggerganov
Copy link
Member

So, maybe ggml_gelu_na is not the best name.

ggml_gelu_erf()?

@ngxson
Copy link
Collaborator Author

ngxson commented May 20, 2025

I retried the ultravox and it is now giving the correct result on both metal and cpu

Also just to note that, the original whisper impl from openai seem to use this type of gelu (with approximation = none by default), so probably this change will further increase the precision of whisper.cpp. We just need to change all the gelu to gelu_erf (including the one in conv block), cc @danbev if you are interested in doing a test

@ngxson
Copy link
Collaborator Author

ngxson commented May 20, 2025

Here is the GGML output (conv + transformer + post layer norm):

after output norm.shape = [1280, 512]
after output norm.data: [
     [
      [ -0.1812,  -0.4413,  -0.0398, ...,  -0.3310,   0.0529,  -0.4211],
      [ -0.2426,  -0.7433,   0.0328, ...,  -0.1948,  -0.2125,  -0.3387],
      [ -0.1757,  -0.6223,  -0.1927, ...,  -0.3855,  -0.0607,  -0.3789],
      ..., 
      [ -0.2057,   0.0134,  -0.1241, ...,   0.0104,   0.3419,  -0.1225],
      [ -0.1576,  -0.3597,  -0.0085, ...,   0.0309,  -0.0150,  -0.2440],
      [ -0.1471,  -0.1170,  -0.1523, ...,   0.1099,   0.1881,  -0.2656],
     ],
    ]
after output norm sum.shape = [1]
after output norm sum.data: [
     [
      [-478.1041],
     ],
    ]

And python output:

tensor([[[-0.1812, -0.4412, -0.0399,  ..., -0.3313,  0.0527, -0.4211],
         [-0.2425, -0.7431,  0.0330,  ..., -0.1949, -0.2127, -0.3389],
         [-0.1757, -0.6225, -0.1928,  ..., -0.3854, -0.0605, -0.3790],
         ...,
         [-0.2057,  0.0133, -0.1241,  ...,  0.0102,  0.3418, -0.1227],
         [-0.1576, -0.3597, -0.0084,  ...,  0.0308, -0.0150, -0.2442],
         [-0.1471, -0.1172, -0.1523,  ...,  0.1097,  0.1881, -0.2657]]]) tensor(-477.8432)

@ngxson ngxson changed the title ggml : add ggml_gelu_na (not approximated) ggml : add ggml_gelu_erf() May 20, 2025
@ngxson ngxson merged commit cf4cb59 into ggml-org:master May 21, 2025
46 checks passed
infil00p pushed a commit to baseweight/llama.cpp that referenced this pull request May 22, 2025
* ggml : add ggml_gelu_na (not approximated)

* fix naming order

* rename na --> erf

* apply review suggesions

* revert naming order
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Apple Metal https://en.wikipedia.org/wiki/Metal_(API) ggml changes relating to the ggml tensor library for machine learning
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants