Testing: Compare CPU backend with GPU backend #1692

Closed
wants to merge 5 commits into from

bobqianic
Collaborator

Problem: With the CUDA backend, transcription tends to produce hallucinations. See #1688 for details.

Audio for error reproduction: 01-03(Easy to Learn Chinese + Second Edition + Textbook 2).zip

How to Reproduce the Error:

  1. Check out the latest code from the master branch.
  2. Use a machine with an NVIDIA graphics card.
  3. Build with CMake, passing the -DWHISPER_CUBLAS=1 option.
  4. Run the main program on the sample audio above with the whisper-large-v2 model and the -l zh flag.
  5. The output will contain hallucinations.
  6. Run main again exactly as in step 4, but add the -ng (no GPU) flag. The hallucinations no longer occur.

This PR introduces code that compares the outputs of different backends with the CPU backend for each tensor operation. It executes the encoder once and does not produce any transcriptions.
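
For readers unfamiliar with the metric reported in the logs below, here is a minimal sketch of what a per-operation comparison could compute. It assumes NMSE means the summed squared difference between the two backends' outputs divided by the summed squared CPU (reference) output. The function name `nmse` and the sample values are hypothetical, and the PR's actual implementation is not shown on this page.

```python
def nmse(reference, candidate):
    """Normalized mean squared error of `candidate` relative to `reference`.

    Assumption: error energy divided by reference energy. This is an
    illustrative definition, not code taken from the PR.
    """
    if len(reference) != len(candidate):
        raise ValueError("tensors must have the same number of elements")
    err = sum((r - c) ** 2 for r, c in zip(reference, candidate))
    energy = sum(r * r for r in reference)
    if energy == 0.0:
        # All-zero reference: report 0 only if the candidate matches exactly.
        return 0.0 if err == 0.0 else float("inf")
    return err / energy


# Hypothetical flattened outputs of one tensor operation on each backend.
cpu_out = [1.0, -2.0, 3.0, 0.5]
gpu_out = [1.001, -1.999, 2.998, 0.5]
print(f"[EXAMPLE_OP] NMSE = {nmse(cpu_out, gpu_out):.6f}")
```

Dividing by the reference energy makes the value scale-independent, so operations with large activations can be compared against operations with small ones.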

@bobqianic
Collaborator Author

#1691 doesn't work for me.

@bobqianic
Collaborator Author

bobqianic commented Dec 27, 2023

CPU_backend vs CUDA_backend

Tiny

Tiny (whisper_build_graph_conv)

[MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000002 [IM2COL] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 FAIL

Tiny (whisper_build_graph_encoder)

[MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000005 [MUL] NMSE = 0.000007 [ADD] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000009 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000008 [MUL] NMSE = 0.000012 [ADD] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000015 [ADD] NMSE = 0.000015 [CPY] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000005 [SCALE] NMSE = 0.000005 [SOFT_MAX] NMSE = 0.000026 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [ADD] NMSE = 0.000003 [NORM] NMSE = 0.000008 [MUL] NMSE = 0.000011 [ADD] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000002 [GELU] NMSE = 0.000024 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000010 [ADD] NMSE = 0.000004 [NORM] NMSE = 0.000008 [MUL] NMSE = 0.000007 [ADD] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000019 [ADD] NMSE = 0.000019 [CPY] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000005 [SCALE] NMSE = 0.000005 [SOFT_MAX] NMSE = 0.000043 [MUL_MAT] NMSE = 0.000009 [CPY] NMSE = 0.000009
 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000004 [ADD] NMSE = 0.000005 [NORM] NMSE = 0.000010 [MUL] NMSE = 0.000011 [ADD] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000002 [GELU] NMSE = 0.000024 [MUL_MAT] NMSE = 0.000021 [ADD] NMSE = 0.000019 [ADD] NMSE = 0.000017 [NORM] NMSE = 0.000011 [MUL] NMSE = 0.000010 [ADD] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000019 [ADD] NMSE = 0.000019 [CPY] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000012 [ADD] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000011 [SCALE] NMSE = 0.000011 [SOFT_MAX] NMSE = 0.000057 [MUL_MAT] NMSE = 0.000016 [CPY] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000022 [ADD] NMSE = 0.000014 [ADD] NMSE = 0.000018 [NORM] NMSE = 0.000022 [MUL] NMSE = 0.000022 [ADD] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000004 [GELU] NMSE = 0.000286 [MUL_MAT] NMSE = 0.000211 [ADD] NMSE = 0.000225 [ADD] NMSE = 0.000078 [NORM] NMSE = 0.000019 [MUL] NMSE = 0.000014 [ADD] NMSE = 0.000019 FAIL

Tiny (whisper_build_graph_cross)

OK
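
The dumps in this comment are long, so a small script can help locate where the error grows. The following is a rough sketch, assuming every entry has the form `[OP] NMSE = value` as in the logs here. The helper name `worst_ops` is hypothetical, and the sample string is copied from the tail of the Tiny encoder log above.

```python
import re

# Matches entries such as "[MUL_MAT] NMSE = 0.000211". The \s* after "="
# also tolerates a line break between "NMSE =" and the number.
PATTERN = re.compile(r"\[([A-Z0-9_]+)\]\s+NMSE\s*=\s*([0-9.]+)")


def worst_ops(log_text, top=3):
    """Return the `top` entries with the largest NMSE as (index, op, value)."""
    entries = [
        (index, op, float(value))
        for index, (op, value) in enumerate(PATTERN.findall(log_text))
    ]
    return sorted(entries, key=lambda entry: entry[2], reverse=True)[:top]


sample = (
    "[MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000004 [GELU] NMSE = 0.000286 "
    "[MUL_MAT] NMSE = 0.000211 [ADD] NMSE = 0.000225"
)
for index, op, value in worst_ops(sample):
    print(f"op #{index} {op}: NMSE = {value:.6f}")
```

The index is the position of the operation within the pasted text, which makes it easier to map a spike back to a particular layer.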

Base

Base (whisper_build_graph_conv)

[MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [IM2COL] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000002 FAIL

Base (whisper_build_graph_encoder)

[MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000002 [MUL] NMSE = 0.000004 [ADD] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000007 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000004 [MUL] NMSE = 0.000008 [ADD] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000028 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000003 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000005 [MUL] NMSE = 0.000011 [ADD] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000023 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000013 [ADD] NMSE = 0.000003 [NORM] NMSE = 0.000006 [MUL] NMSE = 0.000011 [ADD] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000017 [ADD] NMSE = 0.000017 [CPY] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000005 [SCALE] NMSE = 0.000005 [SOFT_MAX] NMSE = 0.000026 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006
 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000006 [ADD] NMSE = 0.000004 [NORM] NMSE = 0.000008 [MUL] NMSE = 0.000013 [ADD] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000002 [GELU] NMSE = 0.000028 [MUL_MAT] NMSE = 0.000024 [ADD] NMSE = 0.000027 [ADD] NMSE = 0.000009 [NORM] NMSE = 0.000011 [MUL] NMSE = 0.000011 [ADD] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000021 [ADD] NMSE = 0.000020 [CPY] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000012 [ADD] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000011 [SCALE] NMSE = 0.000011 [SOFT_MAX] NMSE = 0.000037 [MUL_MAT] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000009 [ADD] NMSE = 0.000010 [NORM] NMSE = 0.000014 [MUL] NMSE = 0.000016 [ADD] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000018 [ADD] NMSE = 0.000006 [GELU] NMSE = 0.000115 [MUL_MAT] NMSE = 0.000122 [ADD] NMSE = 0.000120 [ADD] NMSE = 0.000112 [NORM] NMSE = 0.000019 [MUL] NMSE = 0.000016 [ADD] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000024 [ADD] NMSE = 0.000023 [CPY] NMSE = 0.000023 [MUL_MAT] NMSE = 0.000015 [CPY] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000017 [ADD] NMSE = 0.000014 [CPY] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000019 [SCALE] NMSE = 0.000019 [SOFT_MAX] NMSE = 0.000078 [MUL_MAT] NMSE = 0.000015 [CPY] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000016 [ADD] NMSE = 0.000010 [ADD] NMSE = 0.000111 [NORM] NMSE = 0.000019 [MUL] NMSE = 0.000024 [ADD] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000271 [MUL_MAT] NMSE = 0.000575 [ADD] NMSE = 0.000299 [ADD] NMSE = 0.000118 [NORM] NMSE = 0.000020 [MUL] NMSE = 0.000015 [ADD] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000034 [ADD] NMSE = 0.000034 [CPY] NMSE = 0.000034 [MUL_MAT] NMSE = 0.000016 [CPY] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000021 [ADD] NMSE = 0.000013 [CPY] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000026 [SCALE] NMSE = 0.000026 [SOFT_MAX] NMSE = 0.000090 [MUL_MAT] NMSE = 0.000027 [CPY] NMSE = 0.000027 [MUL_MAT] NMSE = 
0.000032 [ADD] NMSE = 0.000018 [ADD] NMSE = 0.000119 [NORM] NMSE = 0.000028 [MUL] NMSE = 0.000033 [ADD] NMSE = 0.000028 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000005 [GELU] NMSE = 0.000320 [MUL_MAT] NMSE = 0.000317 [ADD] NMSE = 0.000275 [ADD] NMSE = 0.000161 [NORM] NMSE = 0.000020 [MUL] NMSE = 0.000012 [ADD] NMSE = 0.000015 FAIL

Base (whisper_build_graph_cross)

OK

Small

Small (whisper_build_graph_conv)

[MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [IM2COL] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 FAIL

Small (whisper_build_graph_encoder)

[MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000003 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000003 [ADD] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [SCALE] NMSE = 0.000002 [SOFT_MAX] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000002 [MUL] NMSE = 0.000003 [ADD] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000007 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000002 [MUL] NMSE = 0.000003 [ADD] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000033 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005
 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000004 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000003 [MUL] NMSE = 0.000006 [ADD] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000010 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000003 [MUL] NMSE = 0.000005 [ADD] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000002 [SCALE] NMSE = 0.000002 [SOFT_MAX] NMSE = 0.000023 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000005 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000003 [MUL] NMSE = 0.000007 [ADD] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000012 [ADD] NMSE = 0.000013 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000004 [MUL] NMSE = 0.000008 [ADD] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000005 [SCALE] NMSE = 0.000005 [SOFT_MAX] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000005 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000005 [MUL] NMSE = 0.000009 [ADD] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000016 [ADD] NMSE = 0.000016 [ADD] NMSE = 0.000003 [NORM] NMSE = 0.000005 [MUL] NMSE = 0.000007 [ADD] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000027 [MUL_MAT] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 
0.000008 [ADD] NMSE = 0.000006 [ADD] NMSE = 0.000003 [NORM] NMSE = 0.000006 [MUL] NMSE = 0.000009 [ADD] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000002 [GELU] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000027 [ADD] NMSE = 0.000026 [ADD] NMSE = 0.000007 [NORM] NMSE = 0.000008 [MUL] NMSE = 0.000012 [ADD] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000014 [CPY] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000007 [SCALE] NMSE = 0.000007 [SOFT_MAX] NMSE = 0.000028 [MUL_MAT] NMSE = 0.000013 [CPY] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000018 [ADD] NMSE = 0.000013 [ADD] NMSE = 0.000006 [NORM] NMSE = 0.000008 [MUL] NMSE = 0.000012 [ADD] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000005 [GELU] NMSE = 0.000038 [MUL_MAT] NMSE = 0.000034 [ADD] NMSE = 0.000034 [ADD] NMSE = 0.000034 [NORM] NMSE = 0.000013 [MUL] NMSE = 0.000010 [ADD] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000013 [CPY] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000012 [SCALE] NMSE = 0.000012 [SOFT_MAX] NMSE = 0.000040 [MUL_MAT] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000012 [ADD] NMSE = 0.000034
 [NORM] NMSE = 0.000014 [MUL] NMSE = 0.000015 [ADD] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000038 [MUL_MAT] NMSE = 0.000056 [ADD] NMSE = 0.000064 [ADD] NMSE = 0.000034 [NORM] NMSE = 0.000016 [MUL] NMSE = 0.000015 [ADD] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000020 [ADD] NMSE = 0.000020 [CPY] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000012 [CPY] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000015 [SCALE] NMSE = 0.000015 [SOFT_MAX] NMSE = 0.000042 [MUL_MAT] NMSE = 0.000016 [CPY] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000020 [ADD] NMSE = 0.000015 [ADD] NMSE = 0.000033 [NORM] NMSE = 0.000016 [MUL] NMSE = 0.000018 [ADD] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000070 [MUL_MAT] NMSE = 0.000103 [ADD] NMSE = 0.000076 [ADD] NMSE = 0.000033 [NORM] NMSE = 0.000019 [MUL] NMSE = 0.000015 [ADD] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000020 [ADD] NMSE = 0.000020 [CPY] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000014 [CPY] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000015 [ADD] NMSE = 0.000012 [CPY] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000018 [SCALE] NMSE = 0.000018 [SOFT_MAX] NMSE = 0.000052 [MUL_MAT] NMSE = 0.000013 [CPY] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000013 [ADD] NMSE = 0.000009 [ADD] NMSE = 0.000033 [NORM] NMSE = 0.000020 [MUL] NMSE = 0.000020 [ADD] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000238 [MUL_MAT] NMSE = 0.000297 [ADD] NMSE = 0.000224 [ADD] NMSE = 0.000034 [NORM] NMSE = 0.000028 [MUL] NMSE = 0.000023 [ADD] NMSE = 0.000024 [MUL_MAT] NMSE = 0.000029 [ADD] NMSE = 0.000028 [CPY] NMSE = 0.000028 [MUL_MAT] NMSE = 0.000022 [CPY] NMSE = 0.000022 [MUL_MAT] NMSE = 0.000023 [ADD] NMSE = 0.000018 [CPY] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000033 [SCALE] NMSE = 0.000033 [SOFT_MAX] NMSE = 0.000079 [MUL_MAT] NMSE = 0.000024 [CPY] NMSE = 0.000024 [MUL_MAT] NMSE = 0.000018 [ADD] NMSE = 0.000025 [ADD] NMSE = 0.000034 [NORM] NMSE = 0.000028 
[MUL] NMSE = 0.000032 [ADD] NMSE = 0.000028 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000005 [GELU] NMSE = 0.000258 [MUL_MAT] NMSE = 0.000142 [ADD] NMSE = 0.000091
 [ADD] NMSE = 0.000035 [NORM] NMSE = 0.000036 [MUL] NMSE = 0.000033 [ADD] NMSE = 0.000037 [MUL_MAT] NMSE = 0.000037 [ADD] NMSE = 0.000037 [CPY] NMSE = 0.000037 [MUL_MAT] NMSE = 0.000035 [CPY] NMSE = 0.000035 [MUL_MAT] NMSE = 0.000032 [ADD] NMSE = 0.000026 [CPY] NMSE = 0.000026 [MUL_MAT] NMSE = 0.000045 [SCALE] NMSE = 0.000045 [SOFT_MAX] NMSE = 0.000094 [MUL_MAT] NMSE = 0.000016 [CPY] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000018 [ADD] NMSE = 0.000014 [ADD] NMSE = 0.000035 [NORM] NMSE = 0.000039 [MUL] NMSE = 0.000045 [ADD] NMSE = 0.000041 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000007 [GELU] NMSE = 0.000082 [MUL_MAT] NMSE = 0.000040 [ADD] NMSE = 0.000041 [ADD] NMSE = 0.000035 [NORM] NMSE = 0.000038 [MUL] NMSE = 0.000030 [ADD] NMSE = 0.000030 FAIL

Small (whisper_build_graph_cross)

OK

Medium

Medium (whisper_build_graph_conv)

[MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000001 [IM2COL] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000001 FAIL

Medium (whisper_build_graph_encoder)

[MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000002 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000003 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000002 [ADD] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002
 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000003 [ADD] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000004 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000002 [MUL] NMSE = 0.000002 [ADD] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000002 [MUL] NMSE = 0.000003 [ADD] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000005 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000002 [MUL] NMSE = 0.000003 [ADD] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [SCALE] NMSE = 0.000002 [SOFT_MAX] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000002 [MUL] NMSE = 0.000004 [ADD] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000006 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000002 [MUL] NMSE = 0.000003 [ADD] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [SCALE] NMSE = 0.000002 [SOFT_MAX] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 
0.000005 [ADD] NMSE = 0.000004 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000003 [MUL] NMSE = 0.000005 [ADD] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000008 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000003 [MUL] NMSE = 0.000004 [ADD] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000002 [SCALE] NMSE = 0.000002 [SOFT_MAX] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000005 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000004 [MUL] NMSE = 0.000006 [ADD] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000008 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000005 [MUL] NMSE = 0.000005 [ADD] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000006 [ADD] NMSE = 0.000002
 [NORM] NMSE = 0.000005 [MUL] NMSE = 0.000008 [ADD] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000002 [GELU] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000011 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000009 [MUL] NMSE = 0.000008 [ADD] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000009 [SCALE] NMSE = 0.000009 [SOFT_MAX] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000012 [CPY] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000013 [ADD] NMSE = 0.000011 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000010 [MUL] NMSE = 0.000011 [ADD] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000038 [ADD] NMSE = 0.000040 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000012 [MUL] NMSE = 0.000011 [ADD] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000014 [CPY] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000010 [ADD] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000009 [SCALE] NMSE = 0.000009 [SOFT_MAX] NMSE = 0.000031 [MUL_MAT] NMSE = 0.000016 [CPY] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000019 [ADD] NMSE = 0.000014 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000012 [MUL] NMSE = 0.000014 [ADD] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [GELU] NMSE = 0.000032 [MUL_MAT] NMSE = 0.000069 [ADD] NMSE = 0.000074 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000015 [MUL] NMSE = 0.000013 [ADD] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000019 [ADD] NMSE = 0.000019 [CPY] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000010 [CPY] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000012 [ADD] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000013 [SCALE] NMSE = 0.000013 [SOFT_MAX] NMSE = 0.000045 [MUL_MAT] NMSE = 0.000030 [CPY] NMSE = 0.000030 [MUL_MAT] NMSE = 0.000038 [ADD] NMSE = 0.000032 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000015 
[MUL] NMSE = 0.000018 [ADD] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000002 [GELU] NMSE = 0.000046 [MUL_MAT] NMSE = 0.000113 [ADD] NMSE = 0.000118
 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000018 [MUL] NMSE = 0.000017 [ADD] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000020 [ADD] NMSE = 0.000020 [CPY] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000014 [CPY] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000015 [ADD] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000014 [SCALE] NMSE = 0.000014 [SOFT_MAX] NMSE = 0.000039 [MUL_MAT] NMSE = 0.000014 [CPY] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000016 [ADD] NMSE = 0.000013 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000018 [MUL] NMSE = 0.000022 [ADD] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000035 [MUL_MAT] NMSE = 0.000056 [ADD] NMSE = 0.000061 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000020 [MUL] NMSE = 0.000019 [ADD] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000026 [ADD] NMSE = 0.000026 [CPY] NMSE = 0.000026 [MUL_MAT] NMSE = 0.000016 [CPY] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000017 [ADD] NMSE = 0.000013 [CPY] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000018 [SCALE] NMSE = 0.000018 [SOFT_MAX] NMSE = 0.000036 [MUL_MAT] NMSE = 0.000021 [CPY] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000025 [ADD] NMSE = 0.000018 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000020 [MUL] NMSE = 0.000025 [ADD] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000042 [MUL_MAT] NMSE = 0.000062 [ADD] NMSE = 0.000068 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000021 [MUL] NMSE = 0.000019 [ADD] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000026 [ADD] NMSE = 0.000026 [CPY] NMSE = 0.000026 [MUL_MAT] NMSE = 0.000018 [CPY] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000020 [ADD] NMSE = 0.000014 [CPY] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000018 [SCALE] NMSE = 0.000018 [SOFT_MAX] NMSE = 0.000048 [MUL_MAT] NMSE = 0.000013 [CPY] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000016 [ADD] NMSE = 0.000012 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000021 [MUL] NMSE = 0.000026 [ADD] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000038 [MUL_MAT] NMSE = 0.000043 [ADD] NMSE = 0.000041 [ADD] NMSE = 0.000011 
[NORM] NMSE = 0.000022 [MUL] NMSE = 0.000020 [ADD] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000023 [ADD] NMSE = 0.000023 [CPY] NMSE = 0.000023 [MUL_MAT] NMSE = 0.000016 [CPY] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000020 [ADD] NMSE = 0.000015 [CPY] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000016 [SCALE] NMSE = 0.000016 [SOFT_MAX] NMSE = 0.000047 [MUL_MAT] NMSE = 0.000013 [CPY] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000015 [ADD] NMSE = 0.000011 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000022 [MUL] NMSE = 0.000026 [ADD] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000028 [MUL_MAT] NMSE = 0.000027 [ADD] NMSE = 0.000026 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000022 [MUL] NMSE = 0.000022 [ADD] NMSE = 0.000022 [MUL_MAT] NMSE = 0.000026 [ADD] NMSE = 0.000025 [CPY] NMSE = 0.000025 [MUL_MAT] NMSE = 0.000020 [CPY] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000020 [ADD] NMSE = 0.000015 [CPY] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000018 [SCALE] NMSE = 0.000018 [SOFT_MAX] NMSE = 0.000057 [MUL_MAT] NMSE = 0.000020 [CPY] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000025 [ADD] NMSE = 0.000017 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000022 [MUL] NMSE = 0.000026 [ADD] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000029 [MUL_MAT] NMSE = 0.000033 [ADD] NMSE = 0.000033 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000023 [MUL] NMSE = 0.000021 [ADD] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000023 [ADD] NMSE = 0.000023 [CPY] NMSE = 0.000023 [MUL_MAT] NMSE = 0.000022 [CPY] NMSE = 0.000022 [MUL_MAT] NMSE = 0.000021 [ADD] NMSE = 0.000016 [CPY] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000024 [SCALE] NMSE = 0.000024 [SOFT_MAX] NMSE = 0.000192 [MUL_MAT] NMSE = 0.000027 [CPY] NMSE = 0.000027 [MUL_MAT] NMSE = 0.000027 [ADD] NMSE = 0.000019 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000023 [MUL] NMSE = 0.000027 [ADD] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000034 [MUL_MAT] NMSE = 0.000033 [ADD] NMSE = 0.000029 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000023 
[MUL] NMSE = 0.000024 [ADD] NMSE = 0.000024 [MUL_MAT] NMSE = 0.000025 [ADD] NMSE = 0.000025 [CPY] NMSE = 0.000025 [MUL_MAT] NMSE = 0.000020 [CPY] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000021 [ADD] NMSE = 0.000015 [CPY] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000013 [SCALE] NMSE = 0.000013 [SOFT_MAX] NMSE = 0.000058 [MUL_MAT] NMSE = 0.000021 [CPY] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000024 [ADD] NMSE = 0.000014 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000023 [MUL] NMSE = 0.000026 [ADD] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000044 [MUL_MAT] NMSE = 0.000048 [ADD] NMSE = 0.000040 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000025 [MUL] NMSE = 0.000020 [ADD] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000027 [ADD] NMSE = 0.000027 [CPY] NMSE = 0.000027 [MUL_MAT] NMSE = 0.000018 [CPY] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000022 [ADD] NMSE = 0.000018 [CPY] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000027 [SCALE] NMSE = 0.000027 [SOFT_MAX] NMSE = 0.000081 [MUL_MAT] NMSE = 0.000035 [CPY] NMSE = 0.000035 [MUL_MAT] NMSE = 0.000036 [ADD] NMSE = 0.000030 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000026 [MUL] NMSE = 0.000030 [ADD] NMSE = 0.000025 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000004 [GELU] NMSE = 0.000043 [MUL_MAT] NMSE = 0.000041 [ADD] NMSE = 0.000041 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000028 [MUL] NMSE = 0.000025 [ADD] NMSE = 0.000026 [MUL_MAT] NMSE = 0.000030 [ADD] NMSE = 0.000030 [CPY] NMSE = 0.000030 [MUL_MAT] NMSE = 0.000022 [CPY] NMSE = 0.000022 [MUL_MAT] NMSE = 0.000024 [ADD] NMSE = 0.000019 [CPY] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000030 [SCALE] NMSE = 0.000030 [SOFT_MAX] NMSE = 0.000084 [MUL_MAT] NMSE = 0.000021 [CPY] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000023 [ADD] NMSE = 0.000020 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000027 [MUL] NMSE = 0.000032 [ADD] NMSE = 0.000027 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [GELU] NMSE = 0.000046 [MUL_MAT] NMSE = 0.000042 [ADD] NMSE = 0.000038 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000028 [MUL] NMSE = 0.000027 
[ADD] NMSE = 0.000028 [MUL_MAT] NMSE = 0.000032 [ADD] NMSE = 0.000032 [CPY] NMSE = 0.000032 [MUL_MAT] NMSE = 0.000022 [CPY] NMSE = 0.000022 [MUL_MAT] NMSE = 0.000026 [ADD] NMSE = 0.000020 [CPY] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000019 [SCALE] NMSE = 0.000019 [SOFT_MAX] NMSE = 0.000134 [MUL_MAT] NMSE = 0.000023 [CPY] NMSE = 0.000023 [MUL_MAT] NMSE = 0.000022 [ADD] NMSE = 0.000020 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000028 [MUL] NMSE = 0.000032 [ADD] NMSE = 0.000029 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [GELU] NMSE = 0.000048 [MUL_MAT] NMSE = 0.000041 [ADD] NMSE = 0.000034 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000029 [MUL] NMSE = 0.000026 [ADD] NMSE = 0.000027 [MUL_MAT] NMSE = 0.000031 [ADD] NMSE = 0.000030 [CPY] NMSE = 0.000030 [MUL_MAT] NMSE = 0.000024 [CPY] NMSE = 0.000024 [MUL_MAT] NMSE = 0.000027 [ADD] NMSE = 0.000023 [CPY] NMSE = 0.000023 [MUL_MAT] NMSE = 0.000029 [SCALE] NMSE = 0.000029 [SOFT_MAX] NMSE = 0.000092 [MUL_MAT] NMSE = 0.000022
 [CPY] NMSE = 0.000022 [MUL_MAT] NMSE = 0.000022 [ADD] NMSE = 0.000022 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000029 [MUL] NMSE = 0.000033 [ADD] NMSE = 0.000030 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000005 [GELU] NMSE = 0.000048 [MUL_MAT] NMSE = 0.000038 [ADD] NMSE = 0.000028 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000028 [MUL] NMSE = 0.000023 [ADD] NMSE = 0.000024 [MUL_MAT] NMSE = 0.000028 [ADD] NMSE = 0.000028 [CPY] NMSE = 0.000028 [MUL_MAT] NMSE = 0.000021 [CPY] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000025 [ADD] NMSE = 0.000021 [CPY] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000030 [SCALE] NMSE = 0.000030 [SOFT_MAX] NMSE = 0.000107 [MUL_MAT] NMSE = 0.000024 [CPY] NMSE = 0.000024 [MUL_MAT] NMSE = 0.000026 [ADD] NMSE = 0.000026 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000028 [MUL] NMSE = 0.000032 [ADD] NMSE = 0.000029 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000006 [GELU] NMSE = 0.000045 [MUL_MAT] NMSE = 0.000055 [ADD] NMSE = 0.000050 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000029 [MUL] NMSE = 0.000024 [ADD] NMSE = 0.000025 [MUL_MAT] NMSE = 0.000029 [ADD] NMSE = 0.000029 [CPY] NMSE = 0.000029 [MUL_MAT] NMSE = 0.000019 [CPY] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000024 [ADD] NMSE = 0.000020 [CPY] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000021 [SCALE] NMSE = 0.000021 [SOFT_MAX] NMSE = 0.000083 [MUL_MAT] NMSE = 0.000021 [CPY] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000027 [ADD] NMSE = 0.000026 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000028 [MUL] NMSE = 0.000026 [ADD] NMSE = 0.000024 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000008 [GELU] NMSE = 0.000038 [MUL_MAT] NMSE = 0.000034 [ADD] NMSE = 0.000035 [ADD] NMSE = 0.000011 [NORM] NMSE = 0.000028 [MUL] NMSE = 0.000023 [ADD] NMSE = 0.000022 FAIL

Medium (whisper_build_graph_cross)

OK

Large-v2

Large-v2 (whisper_build_graph_conv)

[MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000001 [IM2COL] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000006 [GELU] NMSE = 0.000010 FAIL

Large-v2 (whisper_build_graph_encoder)

[MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000002 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000002 [ADD] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 
0.000001 [SOFT_MAX] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000002 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000002 [MUL] NMSE = 0.000002 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002
 [ADD] NMSE = 0.000001 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000002 [MUL] NMSE = 0.000003 [ADD] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000003 [MUL] NMSE = 0.000002 [ADD] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000023 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000003 [MUL] NMSE = 0.000003 [ADD] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001
 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000008 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000004 [MUL] NMSE = 0.000003 [ADD] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000004 [MUL] NMSE = 0.000004 [ADD] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000009 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000004 [MUL] NMSE = 0.000003 [ADD] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000004 [MUL] NMSE = 0.000005 [ADD] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000010 [ADD] NMSE = 0.000010 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000005 [MUL] NMSE = 0.000004 [ADD] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [SCALE] NMSE = 0.000002 [SOFT_MAX] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000005 [MUL] NMSE = 0.000006 [ADD] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000002 [MUL_MAT] NMSE = 
0.000011 [ADD] NMSE = 0.000011 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000006 [MUL] NMSE = 0.000004 [ADD] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000006 [MUL] NMSE = 0.000007 [ADD] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000012 [ADD] NMSE = 0.000013 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000007 [MUL] NMSE = 0.000005 [ADD] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [SCALE] NMSE = 0.000002 [SOFT_MAX] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000007 [MUL] NMSE = 0.000008 [ADD] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000013 [ADD] NMSE = 0.000014 [ADD] NMSE = 0.000001
 [NORM] NMSE = 0.000008 [MUL] NMSE = 0.000006 [ADD] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [SCALE] NMSE = 0.000002 [SOFT_MAX] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000008 [MUL] NMSE = 0.000009 [ADD] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000015 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000008 [MUL] NMSE = 0.000006 [ADD] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [SCALE] NMSE = 0.000002 [SOFT_MAX] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000008 [MUL] NMSE = 0.000009 [ADD] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000018 [ADD] NMSE = 0.000018 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000009 [MUL] NMSE = 0.000009 [ADD] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000009 [MUL] NMSE = 0.000010 [ADD] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000017 [ADD] NMSE = 0.000017 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000011 
[MUL] NMSE = 0.000008 [ADD] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000009 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000010 [MUL] NMSE = 0.000012 [ADD] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000020 [ADD] NMSE = 0.000021 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000011 [MUL] NMSE = 0.000008 [ADD] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000005 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000011 [MUL] NMSE = 0.000012 [ADD] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000022 [ADD] NMSE = 0.000022 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000012 [MUL] NMSE = 0.000010 [ADD] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000005 [SCALE] NMSE = 0.000005 [SOFT_MAX] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000008 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000012 [MUL] NMSE = 0.000013 [ADD] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000023 [ADD] NMSE = 0.000024 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000013 [MUL] NMSE = 0.000010 
[ADD] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000005 [SCALE] NMSE = 0.000005 [SOFT_MAX] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000007 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000013 [MUL] NMSE = 0.000015 [ADD] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000026 [ADD] NMSE = 0.000027 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000014 [MUL] NMSE = 0.000010 [ADD] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000010 [ADD] NMSE = 0.000009 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000013 [MUL] NMSE = 0.000015 [ADD] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000031 [ADD] NMSE = 0.000032 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000014 [MUL] NMSE = 0.000013 [ADD] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000012 [ADD] NMSE = 0.000012 [CPY] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000008 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000014 [MUL] NMSE = 0.000016 [ADD] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000034 [ADD] NMSE = 0.000035 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000015 [MUL] NMSE = 0.000013 [ADD] NMSE = 0.000013 
[MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000014 [CPY] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 0.000023 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000008 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000014 [MUL] NMSE = 0.000016 [ADD] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000034 [ADD] NMSE = 0.000035 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000015 [MUL] NMSE = 0.000014 [ADD] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000014 [CPY] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000005 [SCALE] NMSE = 0.000005 [SOFT_MAX] NMSE = 0.000023 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006
 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000010 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000015 [MUL] NMSE = 0.000017 [ADD] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000038 [ADD] NMSE = 0.000039 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000016 [MUL] NMSE = 0.000013 [ADD] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000013 [ADD] NMSE = 0.000013 [CPY] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 0.000022 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000013 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000015 [MUL] NMSE = 0.000017 [ADD] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [GELU] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000041 [ADD] NMSE = 0.000042 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000016 [MUL] NMSE = 0.000015 [ADD] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000015 [ADD] NMSE = 0.000015 [CPY] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000006 [SCALE] NMSE = 0.000006 [SOFT_MAX] NMSE = 0.000025 [MUL_MAT] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000013 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000016 [MUL] NMSE = 0.000017 [ADD] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [GELU] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000042 [ADD] NMSE = 0.000044 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000018 [MUL] NMSE = 0.000016 [ADD] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000016 [ADD] NMSE = 0.000016 [CPY] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000008 [SCALE] NMSE = 0.000008 [SOFT_MAX] NMSE = 0.000044 [MUL_MAT] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 
0.000019 [ADD] NMSE = 0.000018 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000019 [MUL] NMSE = 0.000019 [ADD] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [GELU] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000049 [ADD] NMSE = 0.000050 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000023 [MUL] NMSE = 0.000019 [ADD] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000020 [ADD] NMSE = 0.000020 [CPY] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000010 [ADD] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000014 [SCALE] NMSE = 0.000014 [SOFT_MAX] NMSE = 0.000048 [MUL_MAT] NMSE = 0.000015 [CPY] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000025 [ADD] NMSE = 0.000024 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000024 [MUL] NMSE = 0.000023 [ADD] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000059 [ADD] NMSE = 0.000060 [ADD] NMSE = 0.000003 [NORM] NMSE = 0.000038 [MUL] NMSE = 0.000026 [ADD] NMSE = 0.000027 [MUL_MAT] NMSE = 0.000031 [ADD] NMSE = 0.000031 [CPY] NMSE = 0.000031 [MUL_MAT] NMSE = 0.000021 [CPY] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000019 [ADD] NMSE = 0.000017 [CPY] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000037 [SCALE] NMSE = 0.000037 [SOFT_MAX] NMSE = 0.000053 [MUL_MAT] NMSE = 0.000029 [CPY] NMSE = 0.000029 [MUL_MAT] NMSE = 0.000042 [ADD] NMSE = 0.000040 [ADD] NMSE = 0.000004
 [NORM] NMSE = 0.000042 [MUL] NMSE = 0.000034 [ADD] NMSE = 0.000029 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [GELU] NMSE = 0.000033 [MUL_MAT] NMSE = 0.000142 [ADD] NMSE = 0.000144 [ADD] NMSE = 0.000007 [NORM] NMSE = 0.000079 [MUL] NMSE = 0.000047 [ADD] NMSE = 0.000047 [MUL_MAT] NMSE = 0.000045 [ADD] NMSE = 0.000045 [CPY] NMSE = 0.000045 [MUL_MAT] NMSE = 0.000042 [CPY] NMSE = 0.000042 [MUL_MAT] NMSE = 0.000037 [ADD] NMSE = 0.000035 [CPY] NMSE = 0.000035 [MUL_MAT] NMSE = 0.000078 [SCALE] NMSE = 0.000078 [SOFT_MAX] NMSE = 0.000097 [MUL_MAT] NMSE = 0.000049 [CPY] NMSE = 0.000049 [MUL_MAT] NMSE = 0.000082 [ADD] NMSE = 0.000080 [ADD] NMSE = 0.000008 [NORM] NMSE = 0.000075 [MUL] NMSE = 0.000056 [ADD] NMSE = 0.000048 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [GELU] NMSE = 0.000090 [MUL_MAT] NMSE = 0.000488 [ADD] NMSE = 0.000495 [ADD] NMSE = 0.000025 [NORM] NMSE = 0.000067 [MUL] NMSE = 0.000051 [ADD] NMSE = 0.000052 [MUL_MAT] NMSE = 0.000064 [ADD] NMSE = 0.000064 [CPY] NMSE = 0.000064 [MUL_MAT] NMSE = 0.000039 [CPY] NMSE = 0.000039 [MUL_MAT] NMSE = 0.000034 [ADD] NMSE = 0.000032 [CPY] NMSE = 0.000032 [MUL_MAT] NMSE = 0.000061 [SCALE] NMSE = 0.000061 [SOFT_MAX] NMSE = 0.000104 [MUL_MAT] NMSE = 0.000054 [CPY] NMSE = 0.000054 [MUL_MAT] NMSE = 0.000080 [ADD] NMSE = 0.000077 [ADD] NMSE = 0.000025 [NORM] NMSE = 0.000068 [MUL] NMSE = 0.000060 [ADD] NMSE = 0.000052 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000007 [GELU] NMSE = 0.000127 [MUL_MAT] NMSE = 0.000938 [ADD] NMSE = 0.000940 [ADD] NMSE = 0.000063 [NORM] NMSE = 0.000343 [MUL] NMSE = 0.000152 [ADD] NMSE = 0.000154 [MUL_MAT] NMSE = 0.000130 [ADD] NMSE = 0.000130 [CPY] NMSE = 0.000130 [MUL_MAT] NMSE = 0.000169 [CPY] NMSE = 0.000169 [MUL_MAT] NMSE = 0.000119 [ADD] NMSE = 0.000114 [CPY] NMSE = 0.000114 [MUL_MAT] NMSE = 0.000312 [SCALE] NMSE = 0.000312 [SOFT_MAX] NMSE = 0.000324 [MUL_MAT] NMSE = 0.000217 [CPY] NMSE = 0.000217 [MUL_MAT] NMSE = 0.000305 [ADD] NMSE = 0.000293 [ADD] NMSE = 0.000064 [NORM] NMSE = 0.000380 
[MUL] NMSE = 0.000357 [ADD] NMSE = 0.000316 [MUL_MAT] NMSE = 0.000055 [ADD] NMSE = 0.000054 [GELU] NMSE = 0.002536 [MUL_MAT] NMSE = 0.006002 [ADD] NMSE = 0.006002
 [ADD] NMSE = 0.002475 [NORM] NMSE = 0.000820 [MUL] NMSE = 0.000437 [ADD] NMSE = 0.000440 [MUL_MAT] NMSE = 0.000277 [ADD] NMSE = 0.000278 [CPY] NMSE = 0.000278 [MUL_MAT] NMSE = 0.000620 [CPY] NMSE = 0.000620 [MUL_MAT] NMSE = 0.000315 [ADD] NMSE = 0.000303 [CPY] NMSE = 0.000303 [MUL_MAT] NMSE = 0.001110 [SCALE] NMSE = 0.001110 [SOFT_MAX] NMSE = 0.000941 [MUL_MAT] NMSE = 0.000439 [CPY] NMSE = 0.000439 [MUL_MAT] NMSE = 0.000577 [ADD] NMSE = 0.000558 [ADD] NMSE = 0.002471 [NORM] NMSE = 0.000828 [MUL] NMSE = 0.001247 [ADD] NMSE = 0.001100 [MUL_MAT] NMSE = 0.000411 [ADD] NMSE = 0.000406 [GELU] NMSE = 0.008901 [MUL_MAT] NMSE = 0.010630 [ADD] NMSE = 0.010632 [ADD] NMSE = 0.004732 [NORM] NMSE = 0.001196 [MUL] NMSE = 0.000786 [ADD] NMSE = 0.000783 [MUL_MAT] NMSE = 0.000519 [ADD] NMSE = 0.000520 [CPY] NMSE = 0.000520 [MUL_MAT] NMSE = 0.001015 [CPY] NMSE = 0.001015 [MUL_MAT] NMSE = 0.000578 [ADD] NMSE = 0.000557 [CPY] NMSE = 0.000557 [MUL_MAT] NMSE = 0.001883 [SCALE] NMSE = 0.001883 [SOFT_MAX] NMSE = 0.001561 [MUL_MAT] NMSE = 0.000627 [CPY] NMSE = 0.000627 [MUL_MAT] NMSE = 0.000657 [ADD] NMSE = 0.000639 [ADD] NMSE = 0.004732 [NORM] NMSE = 0.001201 [MUL] NMSE = 0.001921 [ADD] NMSE = 0.001670 [MUL_MAT] NMSE = 0.000502 [ADD] NMSE = 0.000494 [GELU] NMSE = 0.009599 [MUL_MAT] NMSE = 0.012234 [ADD] NMSE = 0.012243 [ADD] NMSE = 0.006867 [NORM] NMSE = 0.001300 [MUL] NMSE = 0.000892 [ADD] NMSE = 0.000872 FAIL

Large-v2 (whisper_build_graph_cross)

OK
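
For readers interpreting the figures above: NMSE is the normalized mean squared error between the reference (CPU) output and the output of the backend under test. The PR's comparison code is not shown in this thread, so the following is only a minimal sketch, assuming the common definition `sum((ref - test)^2) / sum(ref^2)`. The helper name `nmse` and its signature are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Normalized mean squared error of `test` against `ref`:
//   sum((ref - test)^2) / sum(ref^2)
// Assumed definition, not taken from the PR. Accumulation is done in double
// so that the metric itself adds as little rounding noise as possible.
double nmse(const std::vector<float> & ref, const std::vector<float> & test) {
    assert(ref.size() == test.size());
    double err  = 0.0; // squared difference between the two outputs
    double norm = 0.0; // squared magnitude of the reference output
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double r = ref[i];
        const double d = r - double(test[i]);
        err  += d * d;
        norm += r * r;
    }
    // If the reference is all zeros, fall back to the unnormalized error.
    return norm > 0.0 ? err / norm : err;
}
```

Under this definition, identical tensors give 0, and comparing a reference of `{2, 0}` against `{1, 0}` gives 0.25. Because the error is divided by the reference energy, values from tensors of very different magnitude can be compared on the same scale, which is what makes the growth across encoder layers visible.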

@slaren
Member

slaren commented Dec 28, 2023

I have been testing this, and I think this is due to precision differences in the matrix multiplication. GGML_PREC_F32 is indeed broken with fp16 src1, but it is fixed in ggml-org/ggml#669. After applying the fix and using GGML_PREC_F32, the error is much lower:

$ WHISPER_CUBLAS=1 make main && ./main -m models/ggml-large-v2.bin samples/01-03\(Easy\ to\ Learn\ Chinese\ +\ Second\ Edition\ +\ Textbook\ 2\).wav -l zh

I whisper.cpp build info:
I UNAME_S: Linux
I UNAME_P: x86_64
I UNAME_M: x86_64
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include
I LDFLAGS: -lcuda -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib -L/usr/lib/wsl/lib
I CC: cc (Ubuntu 12.3.0-1ubuntu1~23.04) 12.3.0
I CXX: g++ (Ubuntu 12.3.0-1ubuntu1~23.04) 12.3.0

nvcc --forward-unknown-to-host-compiler -arch=native -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -Wno-pedantic -c ggml-cuda.cu -o ggml-cuda.o
cc -I. -O3 -DNDEBUG -std=c11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -c ggml.c -o ggml.o
cc -I. -O3 -DNDEBUG -std=c11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -c ggml-alloc.c -o ggml-alloc.o
cc -I. -O3 -DNDEBUG -std=c11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -c ggml-backend.c -o ggml-backend.o
cc -I. -O3 -DNDEBUG -std=c11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -c ggml-quants.c -o ggml-quants.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -c whisper.cpp -o whisper.o
g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include examples/main/main.cpp examples/common.cpp examples/common-ggml.cpp ggml-cuda.o ggml.o ggml-alloc.o ggml-backend.o ggml-quants.o whisper.o -o main -lcuda -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib -L/usr/lib/wsl/lib
./main -h

usage: ./main [options] file0.wav file1.wav ...

options:
-h, --help [default] show this help message and exit
-t N, --threads N [4 ] number of threads to use during computation
-p N, --processors N [1 ] number of processors to use during computation
-ot N, --offset-t N [0 ] time offset in milliseconds
-on N, --offset-n N [0 ] segment index offset
-d N, --duration N [0 ] duration of audio to process in milliseconds
-mc N, --max-context N [-1 ] maximum number of text context tokens to store
-ml N, --max-len N [0 ] maximum segment length in characters
-sow, --split-on-word [false ] split on word rather than on token
-bo N, --best-of N [5 ] number of best candidates to keep
-bs N, --beam-size N [5 ] beam size for beam search
-wt N, --word-thold N [0.01 ] word timestamp probability threshold
-et N, --entropy-thold N [2.40 ] entropy threshold for decoder fail
-lpt N, --logprob-thold N [-1.00 ] log probability threshold for decoder fail
-debug, --debug-mode [false ] enable debug mode (eg. dump log_mel)
-tr, --translate [false ] translate from source language to english
-di, --diarize [false ] stereo audio diarization
-tdrz, --tinydiarize [false ] enable tinydiarize (requires a tdrz model)
-nf, --no-fallback [false ] do not use temperature fallback while decoding
-otxt, --output-txt [false ] output result in a text file
-ovtt, --output-vtt [false ] output result in a vtt file
-osrt, --output-srt [false ] output result in a srt file
-olrc, --output-lrc [false ] output result in a lrc file
-owts, --output-words [false ] output script for generating karaoke video
-fp, --font-path [/System/Library/Fonts/Supplemental/Courier New Bold.ttf] path to a monospace font for karaoke video
-ocsv, --output-csv [false ] output result in a CSV file
-oj, --output-json [false ] output result in a JSON file
-ojf, --output-json-full [false ] include more information in the JSON file
-of FNAME, --output-file FNAME [ ] output file path (without file extension)
-ps, --print-special [false ] print special tokens
-pc, --print-colors [false ] print colors
-pp, --print-progress [false ] print progress
-nt, --no-timestamps [false ] do not print timestamps
-l LANG, --language LANG [en ] spoken language ('auto' for auto-detect)
-dl, --detect-language [false ] exit after automatically detecting language
--prompt PROMPT [ ] initial prompt
-m FNAME, --model FNAME [models/ggml-base.en.bin] model path
-f FNAME, --file FNAME [ ] input WAV file path
-oved D, --ov-e-device DNAME [CPU ] the OpenVINO device used for encode inference
-ls, --log-score [false ] log best decoder scores of tokens
-ng, --no-gpu [false ] disable GPU

whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-large-v2.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1280
whisper_model_load: n_text_head = 20
whisper_model_load: n_text_layer = 32
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 5 (large)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes
whisper_backend_init: using CUDA backend
whisper_model_load: CUDA buffer size = 3094.49 MB
whisper_model_load: model size = 3093.99 MB
whisper_backend_init: using CUDA backend
whisper_init_state: kv self size = 220.20 MB
whisper_init_state: kv cross size = 245.76 MB
whisper_init_state: compute buffer (conv) = 30.98 MB
whisper_init_state: compute buffer (encode) = 212.42 MB
whisper_init_state: compute buffer (cross) = 9.38 MB
whisper_init_state: compute buffer (decode) = 99.23 MB

system_info: n_threads = 4 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 |

main: processing 'samples/01-03(Easy to Learn Chinese + Second Edition + Textbook 2).wav' (713721 samples, 44.6 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = zh, task = transcribe, timestamps = 1 ...

[IM2COL] NMSE = 0.000000 [GELU] NMSE = 0.000000 FAIL

[SOFT_MAX] NMSE = 0.000000 [GELU] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 
0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 
0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE 
= 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 
[MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 
[MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000002 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000002 [MUL] NMSE = 0.000002 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000037 [ADD] NMSE = 0.000037 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000003 [MUL] NMSE = 0.000002 [ADD] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 
0.000003 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000003 [MUL] NMSE = 0.000003 [ADD] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000016 [ADD] NMSE = 0.000016 [ADD] NMSE = 0.000009 [NORM] NMSE = 0.000006 [MUL] NMSE = 0.000004 [ADD] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000008 [SCALE] NMSE = 0.000008 [SOFT_MAX] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000007 [ADD] NMSE = 0.000009 [NORM] NMSE = 0.000007 [MUL] NMSE = 0.000008 [ADD] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000009 [ADD] NMSE = 0.000007 [NORM] NMSE = 0.000004 [MUL] NMSE = 0.000004 [ADD] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [ADD] NMSE = 0.000007 [NORM] NMSE = 0.000005 [MUL] NMSE = 0.000005 [ADD] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [ADD] NMSE = 0.000003 [NORM] NMSE = 0.000005 [MUL] NMSE = 0.000004 [ADD] NMSE = 0.000004 FAIL

OK

[00:00:00.000 --> 00:00:04.800] 你爷爷奶奶住在哪儿?
[00:00:04.800 --> 00:00:07.800] 他们住在南京。
[00:00:07.800 --> 00:00:12.800] 他们跟我叔叔和婶婶一起住。
[00:00:12.800 --> 00:00:16.800] 你叔叔是哪一年结婚的?
[00:00:16.800 --> 00:00:19.800] 他是去年结婚的。
[00:00:19.800 --> 00:00:23.800] 你叔叔做什么工作?
[00:00:23.800 --> 00:00:25.800] 在哪儿工作?
[00:00:25.800 --> 00:00:27.800] 我叔叔是老师。
[IM2COL] NMSE = 0.000000 [GELU] NMSE = 0.000000 FAIL

^C⏎

I think it doesn't generate the exact same results as CPU even with this, but I am not sure if that's really an issue. We could disable tensor cores with cuBLAS to further increase the matrix multiplication precision, but the model shouldn't be so finicky.
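As a rough illustration of why accumulation precision matters for matrix multiplication, the sketch below computes a dot product of FP16 inputs twice: once with the running sum rounded to FP16 after every step, and once with it rounded to FP32. It is a pure-Python toy, not the ggml or cuBLAS kernel; the helper names `round_f16`, `round_f32`, and `dot` and the input values are assumptions made for this example.

```python
import struct


def round_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def round_f32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot(a, b, round_acc):
    """Dot product of FP16 inputs; the running sum is rounded by round_acc."""
    acc = 0.0
    for x, y in zip(a, b):
        # The product of two FP16 values is exact in double precision,
        # so the only rounding modelled here is that of the accumulator.
        acc = round_acc(acc + round_f16(x) * round_f16(y))
    return acc


# Hypothetical inputs: a long row of identical small values.
n = 4096
a = [0.1] * n
b = [0.1] * n

# Reference result accumulated in double precision.
exact = sum(round_f16(x) * round_f16(y) for x, y in zip(a, b))

err_f16 = abs(dot(a, b, round_f16) - exact) / exact
err_f32 = abs(dot(a, b, round_f32) - exact) / exact

print(f"relative error, FP16 accumulator: {err_f16:.6f}")
print(f"relative error, FP32 accumulator: {err_f32:.6f}")
```

With an FP16 accumulator, the sum stops growing once each new product is smaller than half the spacing between representable values, which is the kind of loss a higher-precision accumulator avoids. Real GPU kernels accumulate in a different order, so the magnitudes here are only indicative.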

@bobqianic
Collaborator Author

bobqianic commented Dec 28, 2023

> We could disable tensor cores with cuBLAS to further increase the matrix multiplication precision, but the model shouldn't be so finicky.

I don't think that's necessary. In my experiments, keeping the NMSE of every operator below 0.0001 was enough to prevent any significant hallucinations.
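For reference, a per-operator check against that 0.0001 threshold could look like the sketch below. It assumes the common definition of NMSE, the squared error divided by the squared norm of the reference (CPU) output; the code in this PR is not shown in the thread, so the function names and the definition itself are assumptions.

```python
NMSE_THRESHOLD = 1e-4  # the per-operator limit mentioned above


def nmse(reference, candidate):
    """Normalized mean squared error of candidate against reference.

    Assumed definition: sum((r - c)^2) / sum(r^2).
    """
    if len(reference) != len(candidate):
        raise ValueError("tensors must have the same number of elements")
    err = sum((r - c) ** 2 for r, c in zip(reference, candidate))
    norm = sum(r * r for r in reference)
    if norm == 0.0:
        # An all-zero reference only matches an all-zero candidate.
        return 0.0 if err == 0.0 else float("inf")
    return err / norm


def within_threshold(reference, candidate, threshold=NMSE_THRESHOLD):
    """True if the candidate output stays under the NMSE threshold."""
    return nmse(reference, candidate) < threshold


# Hypothetical flattened outputs of one operator on two backends.
cpu_out = [1.0, 2.0, 2.0]
gpu_out = [1.0, 2.0, 2.001]
print(nmse(cpu_out, gpu_out), within_threshold(cpu_out, gpu_out))
```

Because the error is normalized by the reference energy, the same threshold can be applied to operators whose outputs differ widely in scale.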

I can provide another audio sample that's highly likely to induce hallucinations. Use the -l en flag.

micro-machine.zip

@bobqianic
Collaborator Author

Another thing is very strange: how does OpenAI manage to use FP16 without encountering these precision issues?

@slaren
Member

slaren commented Dec 28, 2023

This is the result with GGML_PREC_F32 with that sample:

whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-large-v2.bin' whisper_model_load: loading model whisper_model_load: n_vocab = 51865 whisper_model_load: n_audio_ctx = 1500 whisper_model_load: n_audio_state = 1280 whisper_model_load: n_audio_head = 20 whisper_model_load: n_audio_layer = 32 whisper_model_load: n_text_ctx = 448 whisper_model_load: n_text_state = 1280 whisper_model_load: n_text_head = 20 whisper_model_load: n_text_layer = 32 whisper_model_load: n_mels = 80 whisper_model_load: ftype = 1 whisper_model_load: qntvr = 0 whisper_model_load: type = 5 (large) whisper_model_load: adding 1608 extra tokens whisper_model_load: n_langs = 99 ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes ggml_init_cublas: found 1 CUDA devices: Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes whisper_backend_init: using CUDA backend whisper_model_load: CUDA buffer size = 3094.49 MB whisper_model_load: model size = 3093.99 MB whisper_backend_init: using CUDA backend whisper_init_state: kv self size = 220.20 MB whisper_init_state: kv cross size = 245.76 MB whisper_init_state: compute buffer (conv) = 30.98 MB whisper_init_state: compute buffer (encode) = 212.42 MB whisper_init_state: compute buffer (cross) = 9.38 MB whisper_init_state: compute buffer (decode) = 99.23 MB

system_info: n_threads = 4 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 |

main: processing 'samples/micro-machine.wav' (478214 samples, 29.9 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...

[IM2COL] NMSE = 0.000000 [GELU] NMSE = 0.000000 FAIL

[GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 
[NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] 
NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 
[MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 
0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 
0.000000 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 
[MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000000 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 FAIL

OK

[00:00:00.000 --> 00:00:03.200] This is the Micro Machine Man presenting the most midget miniature motorcade of Micro Machines.
[00:00:03.200 --> 00:00:06.600] Each one has dramatic details, terrific trim, precision paint jobs, plus incredible Micro Machine pocket play sets.
[00:00:06.600 --> 00:00:08.700] There's a police station, fire station, restaurant, service station, and more.
[00:00:08.700 --> 00:00:10.200] Perfect pocket portables to take anyplace.
[00:00:10.200 --> 00:00:15.200] And there are many miniature play sets to play with, and each one comes with its own special edition Micro Machine vehicle and fun, fantastic features that miraculously move.
[00:00:15.200 --> 00:00:19.200] Raise the boat lift at the airport, marina, man the gun turret at the army base, clean your car at the car wash, raise the toll bridge.
[00:00:19.200 --> 00:00:21.200] And these play sets fit together to form a Micro Machine world.
[00:00:21.200 --> 00:00:25.200] Micro Machine pocket play sets, so tremendously tiny, so perfectly precise, so dazzlingly detailed, you'll want to pocket them all.
[00:00:25.200 --> 00:00:27.700] Micro Machines are Micro Machine pocket play sets sold separately from Galoob.
[00:00:27.700 --> 00:00:29.700] The smaller they are, the better they are.

whisper_print_timings: load time = 955.11 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 18.07 ms
whisper_print_timings: sample time = 233.04 ms / 1091 runs ( 0.21 ms per run)
whisper_print_timings: encode time = 73844.17 ms / 1 runs (73844.17 ms per run)
whisper_print_timings: decode time = 20.42 ms / 1 runs ( 20.42 ms per run)
whisper_print_timings: batchd time = 3006.87 ms / 1088 runs ( 2.76 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 78085.74 ms
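The per-operator logs in this thread are long, so a small script can help pick out the worst value for each operator. The sketch below parses `[OP] NMSE = value` entries in the format shown in the logs; the function name `worst_nmse_per_op` is made up for this example, and the sample string is a short excerpt copied from one of the logs, with a line wrap added to show that wrapped entries are handled.

```python
import re
from collections import defaultdict

# Matches entries such as "[MUL_MAT] NMSE = 0.000037", including ones
# where the value was wrapped onto the next line.
ENTRY = re.compile(r"\[([A-Z0-9_]+)\]\s+NMSE\s*=\s*([0-9]*\.[0-9]+)")


def worst_nmse_per_op(log_text):
    """Return the largest NMSE seen for each operator name in the log."""
    worst = defaultdict(float)
    for op, value in ENTRY.findall(log_text):
        worst[op] = max(worst[op], float(value))
    return dict(worst)


sample = (
    "[GELU] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000037 "
    "[ADD] NMSE = 0.000037 [ADD] NMSE = 0.000002 [NORM] NMSE = \n"
    "0.000003 FAIL"
)

result = worst_nmse_per_op(sample)
for op, value in sorted(result.items(), key=lambda kv: -kv[1]):
    print(f"{op:10s} {value:.6f}")

# Operators at or above the 0.0001 limit discussed earlier in the thread.
offenders = [op for op, value in result.items() if value >= 1e-4]
print("over threshold:", offenders)
```

Sorting by the worst value makes it easier to see which operators drift the most between backends.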

@ggerganov
Member

I merged the fixes from ggml/669 via #1691

Wonder if this would also resolve the problems that have been reported with CUDA recently:

#1502

@bobqianic
Collaborator Author

bobqianic commented Dec 29, 2023

> I merged the fixes from ggml/669 via #1691
>
> Wonder if this would also resolve the problems that have been reported with CUDA recently:
>
> #1502

While it somewhat reduces NMSE, it still leads to considerable hallucination. (GGML_PREC_F32 was not applied)

Windows 11 RTX3060 mobile micro-machine MSVC 19.37.32822.0
Large-v2 (whisper_build_graph_conv)

[MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000001 [IM2COL] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000006 [GELU] NMSE = 0.000010 FAIL

Large-v2 (whisper_build_graph_encoder)

[MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000002 [ADD] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000002 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000002 [MUL] NMSE = 0.000002 [ADD] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 
0.000001 [SOFT_MAX] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000002 [MUL] NMSE = 0.000002 [ADD] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000002 [MUL] NMSE = 0.000002 [ADD] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000002 [MUL] NMSE = 0.000003 [ADD] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000003 [MUL] NMSE = 0.000002 [ADD] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000003 [MUL] NMSE = 0.000004 [ADD] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000008 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000004 [MUL] NMSE = 0.000003 [ADD] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE 
= 0.000011 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000004 [MUL] NMSE = 0.000004 [ADD] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000010 [ADD] NMSE = 0.000010 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000005 [MUL] NMSE = 0.000003 [ADD] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000005 [MUL] NMSE = 0.000006 [ADD] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000011 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000006 [MUL] NMSE = 0.000004 [ADD] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [SCALE] NMSE = 0.000002 [SOFT_MAX] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000006 [MUL] NMSE = 0.000007 [ADD] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000013 [ADD] NMSE = 0.000013 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000007 [MUL] NMSE = 0.000004 [ADD] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000011 [MUL_MAT] NMSE 
= 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000003 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000007 [MUL] NMSE = 0.000008 [ADD] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000013 [ADD] NMSE = 0.000013 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000008 [MUL] NMSE = 0.000005 [ADD] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000008 [MUL] NMSE = 0.000009 [ADD] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000014 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000009 [MUL] NMSE = 0.000006 [ADD] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [SCALE] NMSE = 0.000002 [SOFT_MAX] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000008 [MUL] NMSE = 0.000009 [ADD] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000015 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000010 [MUL] NMSE = 0.000006 [ADD] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [SCALE] NMSE = 0.000002 [SOFT_MAX] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 
0.000004 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000009 [MUL] NMSE = 0.000010 [ADD] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000018 [ADD] NMSE = 0.000018 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000010 [MUL] NMSE = 0.000010 [ADD] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [SCALE] NMSE = 0.000002 [SOFT_MAX] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000010 [MUL] NMSE = 0.000011 [ADD] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000016 [ADD] NMSE = 0.000017 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000012 [MUL] NMSE = 0.000009 [ADD] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000010 [ADD] NMSE = 0.000010 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000011 [MUL] NMSE = 0.000013 [ADD] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000021 [ADD] NMSE = 0.000022 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000012 [MUL] NMSE = 0.000008 [ADD] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 
0.000007 [ADD] NMSE = 0.000007 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000012 [MUL] NMSE = 0.000013 [ADD] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000023 [ADD] NMSE = 0.000023 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000013 [MUL] NMSE = 0.000010 [ADD] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000013 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000013 [MUL] NMSE = 0.000014 [ADD] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000028 [ADD] NMSE = 0.000029 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000014 [MUL] NMSE = 0.000011 [ADD] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000008 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000013 [MUL] NMSE = 0.000015 [ADD] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000026 [ADD] NMSE = 0.000027 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000014 [MUL] NMSE = 0.000010 [ADD] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000010 [ADD] NMSE = 0.000010 [CPY] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 
0.000012 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000014 [MUL] NMSE = 0.000016 [ADD] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000032 [ADD] NMSE = 0.000033 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000015 [MUL] NMSE = 0.000014 [ADD] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000012 [ADD] NMSE = 0.000012 [CPY] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000007 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000015 [MUL] NMSE = 0.000016 [ADD] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000033 [ADD] NMSE = 0.000034 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000016 [MUL] NMSE = 0.000013 [ADD] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000013 [ADD] NMSE = 0.000013 [CPY] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000013 [ADD] NMSE = 0.000012 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000015 [MUL] NMSE = 0.000016 [ADD] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000038 [ADD] NMSE = 0.000038 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000016 [MUL] NMSE = 0.000014 [ADD] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000013 [ADD] NMSE = 0.000013 [CPY] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000013 [ADD] NMSE = 0.000012 [ADD] NMSE = 
0.000001 [NORM] NMSE = 0.000015 [MUL] NMSE = 0.000017 [ADD] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000038 [ADD] NMSE = 0.000039 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000016 [MUL] NMSE = 0.000013 [ADD] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000012 [ADD] NMSE = 0.000012 [CPY] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000019 [ADD] NMSE = 0.000017 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000016 [MUL] NMSE = 0.000017 [ADD] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [GELU] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000040 [ADD] NMSE = 0.000041 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000016 [MUL] NMSE = 0.000014 [ADD] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000015 [ADD] NMSE = 0.000015 [CPY] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000018 [ADD] NMSE = 0.000016 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000016 [MUL] NMSE = 0.000017 [ADD] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [GELU] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000040 [ADD] NMSE = 0.000041 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000017 [MUL] NMSE = 0.000015 [ADD] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000015 [ADD] NMSE = 0.000015 [CPY] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000023 [ADD] NMSE = 0.000022 [ADD] NMSE = 0.000001 [NORM] NMSE = 
0.000017 [MUL] NMSE = 0.000018 [ADD] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [GELU] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000042 [ADD] NMSE = 0.000043 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000018 [MUL] NMSE = 0.000016 [ADD] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000017 [ADD] NMSE = 0.000017 [CPY] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000022 [ADD] NMSE = 0.000021 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000018 [MUL] NMSE = 0.000019 [ADD] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [GELU] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000039 [ADD] NMSE = 0.000039 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000018 [MUL] NMSE = 0.000015 [ADD] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000017 [ADD] NMSE = 0.000017 [CPY] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000023 [MUL_MAT] NMSE = 0.000012 [CPY] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000019 [ADD] NMSE = 0.000018 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000018 [MUL] NMSE = 0.000020 [ADD] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000045 [ADD] NMSE = 0.000045 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000019 [MUL] NMSE = 0.000018 [ADD] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000018 [ADD] NMSE = 0.000018 [CPY] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000014 [CPY] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000030 [ADD] NMSE = 0.000029 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000019 [MUL] NMSE = 
0.000020 [ADD] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000037 [ADD] NMSE = 0.000038 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000019 [MUL] NMSE = 0.000018 [ADD] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000020 [ADD] NMSE = 0.000020 [CPY] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000030 [MUL_MAT] NMSE = 0.000018 [CPY] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000031 [ADD] NMSE = 0.000030 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000020 [MUL] NMSE = 0.000021 [ADD] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000034 [ADD] NMSE = 0.000034 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000021 [MUL] NMSE = 0.000019 [ADD] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000020 [ADD] NMSE = 0.000020 [CPY] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000005 [SCALE] NMSE = 0.000005 [SOFT_MAX] NMSE = 0.000030 [MUL_MAT] NMSE = 0.000015 [CPY] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000037 [ADD] NMSE = 0.000036 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000021 [MUL] NMSE = 0.000023 [ADD] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [GELU] NMSE = 0.000022 [MUL_MAT] NMSE = 0.000029 [ADD] NMSE = 0.000029 [ADD] NMSE = 0.000010 [NORM] NMSE = 0.000025 [MUL] NMSE = 0.000019 [ADD] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000018 [ADD] NMSE = 0.000018 [CPY] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000005 [SCALE] NMSE = 0.000005 [SOFT_MAX] NMSE = 0.000038 [MUL_MAT] NMSE = 0.000018 [CPY] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000046 [ADD] NMSE = 0.000044 [ADD] NMSE = 0.000010 [NORM] NMSE = 0.000026 [MUL] NMSE = 0.000034 [ADD] NMSE = 
0.000031 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000010 [GELU] NMSE = 0.000236 [MUL_MAT] NMSE = 0.000332 [ADD] NMSE = 0.000332 [ADD] NMSE = 0.000097 [NORM] NMSE = 0.000282 [MUL] NMSE = 0.000216 [ADD] NMSE = 0.000215 [MUL_MAT] NMSE = 0.000147 [ADD] NMSE = 0.000148 [CPY] NMSE = 0.000148 [MUL_MAT] NMSE = 0.000174 [CPY] NMSE = 0.000174 [MUL_MAT] NMSE = 0.000126 [ADD] NMSE = 0.000122 [CPY] NMSE = 0.000122 [MUL_MAT] NMSE = 0.000298 [SCALE] NMSE = 0.000298 [SOFT_MAX] NMSE = 0.000411 [MUL_MAT] NMSE = 0.000079 [CPY] NMSE = 0.000079 [MUL_MAT] NMSE = 0.000206 [ADD] NMSE = 0.000201 [ADD] NMSE = 0.000097 [NORM] NMSE = 0.000263 [MUL] NMSE = 0.000361 [ADD] NMSE = 0.000317 [MUL_MAT] NMSE = 0.000083 [ADD] NMSE = 0.000081 [GELU] NMSE = 0.002366 [MUL_MAT] NMSE = 0.004288 [ADD] NMSE = 0.004291 [ADD] NMSE = 0.000860 [NORM] NMSE = 0.000017 [MUL] NMSE = 0.000015 [ADD] NMSE = 0.000015 FAIL
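
Logs of this length are hard to scan by eye. The following is a minimal sketch of a helper that parses `[OP] NMSE = value` entries and reports the first one above a chosen threshold. The names `OpError`, `parse_nmse_log`, and `first_above` are hypothetical and not part of this PR, and the threshold is an assumption: the criterion behind the `OK`/`FAIL` verdict is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// One parsed log entry, e.g. "[GELU] NMSE = 0.000236".
// Hypothetical type for illustration; not part of the PR.
struct OpError {
    std::string op;
    double      nmse;
};

// Extract every "[OP] NMSE = value" entry from a log dump.
// "\s+" also matches line breaks, so entries that were wrapped
// across lines when pasted are still recognized.
std::vector<OpError> parse_nmse_log(const std::string & log) {
    static const std::regex re(R"(\[([A-Z0-9_]+)\]\s+NMSE\s+=\s+([0-9]*\.?[0-9]+))");
    std::vector<OpError> out;
    for (std::sregex_iterator it(log.begin(), log.end(), re), end; it != end; ++it) {
        out.push_back({(*it)[1].str(), std::stod((*it)[2].str())});
    }
    return out;
}

// Index of the first entry whose NMSE exceeds `threshold`, or -1 if none does.
// The threshold is the caller's choice (an assumption, not the PR's criterion).
int first_above(const std::vector<OpError> & ops, double threshold) {
    assert(threshold >= 0.0);
    for (std::size_t i = 0; i < ops.size(); ++i) {
        if (ops[i].nmse > threshold) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

Feeding one graph's log to `parse_nmse_log` and calling `first_above` with a threshold such as `1e-4` gives the position of the first op whose error exceeds it, which is a reasonable place to start looking.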

Large-v2 (whisper_build_graph_cross)

OK

Ubuntu 20.04.5 LTS, RTX 2080 Ti (1), micro-machine, gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

Large-v2 (whisper_build_graph_conv)

[MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000001 [IM2COL] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000006 [GELU] NMSE = 0.000010 FAIL
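
For reference when reading these numbers, here is a minimal sketch of a normalized mean squared error between a reference (CPU) tensor and a test (CUDA) tensor. It assumes the common definition, the sum of squared differences divided by the sum of squared reference values. Whether this PR computes NMSE exactly this way is an assumption, and the function name and zero-reference guard are illustrative choices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// NMSE of `test` relative to `ref`:
//   sum((ref[i] - test[i])^2) / sum(ref[i]^2)
// Accumulates in double so the metric itself adds little rounding error.
double nmse(const std::vector<float> & ref, const std::vector<float> & test) {
    assert(ref.size() == test.size());
    double err  = 0.0;  // sum of squared differences
    double norm = 0.0;  // sum of squared reference values
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double r = static_cast<double>(ref[i]);
        const double d = r - static_cast<double>(test[i]);
        err  += d * d;
        norm += r * r;
    }
    // Illustrative guard: with an all-zero reference, fall back to the raw error.
    return norm > 0.0 ? err / norm : err;
}
```

Under this definition the value is relative to the magnitude of the reference tensor, so 0.000010 means the squared error is about 1e-5 of the reference tensor's squared norm.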

Large-v2 (whisper_build_graph_encoder)

[MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000003 [MUL] NMSE = 0.000005 [ADD] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000008 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000006 [MUL] NMSE = 0.000008 [ADD] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000010 [ADD] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000008 [SCALE] NMSE = 0.000008 [SOFT_MAX] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000020 [CPY] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000023 [ADD] NMSE = 0.000022 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000005 [MUL] NMSE = 0.000008 [ADD] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000010 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000006 [MUL] NMSE = 0.000007 [ADD] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000013 [CPY] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000013 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000006 [MUL] NMSE = 0.000008 [ADD] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000009 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000006 [MUL] NMSE = 0.000008 [ADD] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 
[SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000010 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000006 [MUL] NMSE = 0.000008 [ADD] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000012 [ADD] NMSE = 0.000012 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000008 [MUL] NMSE = 0.000006 [ADD] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000012 [CPY] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000010 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000008 [MUL] NMSE = 0.000010 [ADD] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000014 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000010 [MUL] NMSE = 0.000008 [ADD] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000029 [MUL_MAT] NMSE = 0.000013 [CPY] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000013 [ADD] NMSE = 0.000012 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000010 [MUL] NMSE = 0.000012 [ADD] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000018 [ADD] NMSE = 0.000018 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000012 [MUL] NMSE = 0.000011 [ADD] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 
[SOFT_MAX] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000010 [CPY] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000010 [ADD] NMSE = 0.000010 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000012 [MUL] NMSE = 0.000013 [ADD] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000021 [ADD] NMSE = 0.000021 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000015 [MUL] NMSE = 0.000009 [ADD] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000010 [ADD] NMSE = 0.000010 [CPY] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000002 [SCALE] NMSE = 0.000002 [SOFT_MAX] NMSE = 0.000028 [MUL_MAT] NMSE = 0.000010 [CPY] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000010 [ADD] NMSE = 0.000010 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000015 [MUL] NMSE = 0.000016 [ADD] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000023 [ADD] NMSE = 0.000023 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000017 [MUL] NMSE = 0.000013 [ADD] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000005 [SCALE] NMSE = 0.000005 [SOFT_MAX] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000012 [CPY] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000013 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000017 [MUL] NMSE = 0.000018 [ADD] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000027 [ADD] NMSE = 0.000028 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000020 [MUL] NMSE = 0.000012 [ADD] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000012 [ADD] NMSE = 0.000012 [CPY] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 
0.000016 [MUL_MAT] NMSE = 0.000012 [CPY] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000013 [ADD] NMSE = 0.000012 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000019 [MUL] NMSE = 0.000020 [ADD] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000029 [ADD] NMSE = 0.000029 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000021 [MUL] NMSE = 0.000014 [ADD] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000014 [CPY] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000013 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000021 [MUL] NMSE = 0.000023 [ADD] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000030 [ADD] NMSE = 0.000031 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000023 [MUL] NMSE = 0.000016 [ADD] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000016 [ADD] NMSE = 0.000016 [CPY] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000005 [SCALE] NMSE = 0.000005 [SOFT_MAX] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000013 [CPY] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000015 [ADD] NMSE = 0.000014 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000022 [MUL] NMSE = 0.000024 [ADD] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000032 [ADD] NMSE = 0.000033 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000024 [MUL] NMSE = 0.000016 [ADD] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000015 [ADD] NMSE = 0.000016 [CPY] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 0.000021 [MUL_MAT] NMSE = 
0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000013 [ADD] NMSE = 0.000012 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000023 [MUL] NMSE = 0.000025 [ADD] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000039 [ADD] NMSE = 0.000040 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000025 [MUL] NMSE = 0.000024 [ADD] NMSE = 0.000024 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000014 [CPY] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000005 [SCALE] NMSE = 0.000005 [SOFT_MAX] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000007 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000025 [MUL] NMSE = 0.000026 [ADD] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000034 [ADD] NMSE = 0.000035 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000029 [MUL] NMSE = 0.000022 [ADD] NMSE = 0.000022 [MUL_MAT] NMSE = 0.000020 [ADD] NMSE = 0.000020 [CPY] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000006 [SCALE] NMSE = 0.000006 [SOFT_MAX] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000016 [CPY] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000023 [ADD] NMSE = 0.000021 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000027 [MUL] NMSE = 0.000030 [ADD] NMSE = 0.000022 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [GELU] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000045 [ADD] NMSE = 0.000046 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000029 [MUL] NMSE = 0.000019 [ADD] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000019 [ADD] NMSE = 0.000019 [CPY] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000006 [SCALE] NMSE = 0.000006 [SOFT_MAX] NMSE = 0.000029 [MUL_MAT] NMSE = 0.000014 [CPY] NMSE = 
0.000014 [MUL_MAT] NMSE = 0.000016 [ADD] NMSE = 0.000014 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000027 [MUL] NMSE = 0.000029 [ADD] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [GELU] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000048 [ADD] NMSE = 0.000049 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000029 [MUL] NMSE = 0.000024 [ADD] NMSE = 0.000024 [MUL_MAT] NMSE = 0.000021 [ADD] NMSE = 0.000021 [CPY] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000009 [SCALE] NMSE = 0.000009 [SOFT_MAX] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000017 [CPY] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000029 [ADD] NMSE = 0.000026 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000029 [MUL] NMSE = 0.000032 [ADD] NMSE = 0.000024 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [GELU] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000058 [ADD] NMSE = 0.000060 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000031 [MUL] NMSE = 0.000024 [ADD] NMSE = 0.000024 [MUL_MAT] NMSE = 0.000018 [ADD] NMSE = 0.000019 [CPY] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000007 [SCALE] NMSE = 0.000007 [SOFT_MAX] NMSE = 0.000022 [MUL_MAT] NMSE = 0.000015 [CPY] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000018 [ADD] NMSE = 0.000017 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000029 [MUL] NMSE = 0.000033 [ADD] NMSE = 0.000024 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000002 [GELU] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000057 [ADD] NMSE = 0.000058 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000032 [MUL] NMSE = 0.000022 [ADD] NMSE = 0.000022 [MUL_MAT] NMSE = 0.000023 [ADD] NMSE = 0.000023 [CPY] NMSE = 0.000023 [MUL_MAT] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000007 [SCALE] NMSE = 0.000007 [SOFT_MAX] NMSE = 0.000026 [MUL_MAT] NMSE = 0.000015 [CPY] NMSE = 0.000015 [MUL_MAT] NMSE = 
0.000031 [ADD] NMSE = 0.000028 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000031 [MUL] NMSE = 0.000035 [ADD] NMSE = 0.000026 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000067 [ADD] NMSE = 0.000069 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000033 [MUL] NMSE = 0.000029 [ADD] NMSE = 0.000029 [MUL_MAT] NMSE = 0.000026 [ADD] NMSE = 0.000026 [CPY] NMSE = 0.000026 [MUL_MAT] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000009 [SCALE] NMSE = 0.000009 [SOFT_MAX] NMSE = 0.000027 [MUL_MAT] NMSE = 0.000014 [CPY] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000018 [ADD] NMSE = 0.000017 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000033 [MUL] NMSE = 0.000035 [ADD] NMSE = 0.000027 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000069 [ADD] NMSE = 0.000070 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000033 [MUL] NMSE = 0.000028 [ADD] NMSE = 0.000028 [MUL_MAT] NMSE = 0.000028 [ADD] NMSE = 0.000028 [CPY] NMSE = 0.000028 [MUL_MAT] NMSE = 0.000010 [CPY] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000006 [SCALE] NMSE = 0.000006 [SOFT_MAX] NMSE = 0.000034 [MUL_MAT] NMSE = 0.000017 [CPY] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000029 [ADD] NMSE = 0.000027 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000031 [MUL] NMSE = 0.000034 [ADD] NMSE = 0.000027 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000078 [ADD] NMSE = 0.000079 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000033 [MUL] NMSE = 0.000029 [ADD] NMSE = 0.000029 [MUL_MAT] NMSE = 0.000027 [ADD] NMSE = 0.000027 [CPY] NMSE = 0.000027 [MUL_MAT] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000009 [SCALE] NMSE = 0.000009 [SOFT_MAX] NMSE = 0.000036 [MUL_MAT] NMSE = 0.000016 [CPY] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000029 [ADD] NMSE = 
0.000027 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000032 [MUL] NMSE = 0.000035 [ADD] NMSE = 0.000027 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [GELU] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000078 [ADD] NMSE = 0.000080 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000032 [MUL] NMSE = 0.000027 [ADD] NMSE = 0.000027 [MUL_MAT] NMSE = 0.000025 [ADD] NMSE = 0.000025 [CPY] NMSE = 0.000025 [MUL_MAT] NMSE = 0.000010 [CPY] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000006 [SCALE] NMSE = 0.000006 [SOFT_MAX] NMSE = 0.000031 [MUL_MAT] NMSE = 0.000016 [CPY] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000037 [ADD] NMSE = 0.000035 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000032 [MUL] NMSE = 0.000035 [ADD] NMSE = 0.000027 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [GELU] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000079 [ADD] NMSE = 0.000082 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000033 [MUL] NMSE = 0.000029 [ADD] NMSE = 0.000029 [MUL_MAT] NMSE = 0.000030 [ADD] NMSE = 0.000030 [CPY] NMSE = 0.000030 [MUL_MAT] NMSE = 0.000010 [CPY] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000007 [SCALE] NMSE = 0.000007 [SOFT_MAX] NMSE = 0.000031 [MUL_MAT] NMSE = 0.000021 [CPY] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000038 [ADD] NMSE = 0.000035 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000031 [MUL] NMSE = 0.000034 [ADD] NMSE = 0.000028 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [GELU] NMSE = 0.000022 [MUL_MAT] NMSE = 0.000079 [ADD] NMSE = 0.000082 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000033 [MUL] NMSE = 0.000030 [ADD] NMSE = 0.000030 [MUL_MAT] NMSE = 0.000030 [ADD] NMSE = 0.000030 [CPY] NMSE = 0.000030 [MUL_MAT] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000010 [ADD] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000007 [SCALE] NMSE = 0.000007 [SOFT_MAX] NMSE = 0.000031 [MUL_MAT] NMSE = 0.000021 [CPY] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000050 [ADD] NMSE = 0.000048 [ADD] NMSE = 
0.000002 [NORM] NMSE = 0.000034 [MUL] NMSE = 0.000036 [ADD] NMSE = 0.000030 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [GELU] NMSE = 0.000025 [MUL_MAT] NMSE = 0.000084 [ADD] NMSE = 0.000087 [ADD] NMSE = 0.000003 [NORM] NMSE = 0.000035 [MUL] NMSE = 0.000031 [ADD] NMSE = 0.000031 [MUL_MAT] NMSE = 0.000034 [ADD] NMSE = 0.000034 [CPY] NMSE = 0.000034 [MUL_MAT] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000010 [CPY] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000007 [SCALE] NMSE = 0.000007 [SOFT_MAX] NMSE = 0.000034 [MUL_MAT] NMSE = 0.000025 [CPY] NMSE = 0.000025 [MUL_MAT] NMSE = 0.000045 [ADD] NMSE = 0.000043 [ADD] NMSE = 0.000003 [NORM] NMSE = 0.000035 [MUL] NMSE = 0.000037 [ADD] NMSE = 0.000032 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000005 [GELU] NMSE = 0.000027 [MUL_MAT] NMSE = 0.000080 [ADD] NMSE = 0.000082 [ADD] NMSE = 0.000003 [NORM] NMSE = 0.000036 [MUL] NMSE = 0.000031 [ADD] NMSE = 0.000031 [MUL_MAT] NMSE = 0.000034 [ADD] NMSE = 0.000034 [CPY] NMSE = 0.000034 [MUL_MAT] NMSE = 0.000010 [CPY] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000010 [ADD] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000006 [SCALE] NMSE = 0.000006 [SOFT_MAX] NMSE = 0.000045 [MUL_MAT] NMSE = 0.000028 [CPY] NMSE = 0.000028 [MUL_MAT] NMSE = 0.000043 [ADD] NMSE = 0.000041 [ADD] NMSE = 0.000003 [NORM] NMSE = 0.000037 [MUL] NMSE = 0.000039 [ADD] NMSE = 0.000034 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [GELU] NMSE = 0.000030 [MUL_MAT] NMSE = 0.000096 [ADD] NMSE = 0.000098 [ADD] NMSE = 0.000004 [NORM] NMSE = 0.000038 [MUL] NMSE = 0.000036 [ADD] NMSE = 0.000036 [MUL_MAT] NMSE = 0.000038 [ADD] NMSE = 0.000038 [CPY] NMSE = 0.000038 [MUL_MAT] NMSE = 0.000014 [CPY] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000013 [CPY] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000008 [SCALE] NMSE = 0.000008 [SOFT_MAX] NMSE = 0.000042 [MUL_MAT] NMSE = 0.000033 [CPY] NMSE = 0.000033 [MUL_MAT] NMSE = 0.000065 [ADD] NMSE = 0.000063 [ADD] NMSE = 0.000004 [NORM] NMSE = 
0.000038 [MUL] NMSE = 0.000041 [ADD] NMSE = 0.000035 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000007 [GELU] NMSE = 0.000035 [MUL_MAT] NMSE = 0.000091 [ADD] NMSE = 0.000093 [ADD] NMSE = 0.000005 [NORM] NMSE = 0.000041 [MUL] NMSE = 0.000038 [ADD] NMSE = 0.000038 [MUL_MAT] NMSE = 0.000043 [ADD] NMSE = 0.000043 [CPY] NMSE = 0.000043 [MUL_MAT] NMSE = 0.000013 [CPY] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000013 [CPY] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000008 [SCALE] NMSE = 0.000008 [SOFT_MAX] NMSE = 0.000061 [MUL_MAT] NMSE = 0.000043 [CPY] NMSE = 0.000043 [MUL_MAT] NMSE = 0.000074 [ADD] NMSE = 0.000071 [ADD] NMSE = 0.000005 [NORM] NMSE = 0.000042 [MUL] NMSE = 0.000044 [ADD] NMSE = 0.000039 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000008 [GELU] NMSE = 0.000045 [MUL_MAT] NMSE = 0.000133 [ADD] NMSE = 0.000134 [ADD] NMSE = 0.000008 [NORM] NMSE = 0.000045 [MUL] NMSE = 0.000042 [ADD] NMSE = 0.000042 [MUL_MAT] NMSE = 0.000044 [ADD] NMSE = 0.000044 [CPY] NMSE = 0.000044 [MUL_MAT] NMSE = 0.000017 [CPY] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000017 [ADD] NMSE = 0.000016 [CPY] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000012 [SCALE] NMSE = 0.000012 [SOFT_MAX] NMSE = 0.000076 [MUL_MAT] NMSE = 0.000038 [CPY] NMSE = 0.000038 [MUL_MAT] NMSE = 0.000088 [ADD] NMSE = 0.000084 [ADD] NMSE = 0.000008 [NORM] NMSE = 0.000045 [MUL] NMSE = 0.000048 [ADD] NMSE = 0.000043 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000009 [GELU] NMSE = 0.000054 [MUL_MAT] NMSE = 0.000063 [ADD] NMSE = 0.000063 [ADD] NMSE = 0.000026 [NORM] NMSE = 0.000048 [MUL] NMSE = 0.000040 [ADD] NMSE = 0.000040 [MUL_MAT] NMSE = 0.000039 [ADD] NMSE = 0.000040 [CPY] NMSE = 0.000040 [MUL_MAT] NMSE = 0.000017 [CPY] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000018 [ADD] NMSE = 0.000018 [CPY] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000012 [SCALE] NMSE = 0.000012 [SOFT_MAX] NMSE = 0.000080 [MUL_MAT] NMSE = 0.000035 [CPY] NMSE = 0.000035 [MUL_MAT] NMSE = 0.000091 [ADD] NMSE = 0.000088 [ADD] NMSE = 0.000027 [NORM] NMSE = 0.000051 [MUL] NMSE = 
0.000061 [ADD] NMSE = 0.000054 [MUL_MAT] NMSE = 0.000018 [ADD] NMSE = 0.000017 [GELU] NMSE = 0.000037 [MUL_MAT] NMSE = 0.000022 [ADD] NMSE = 0.000022 [ADD] NMSE = 0.000016 [NORM] NMSE = 0.000043 [MUL] NMSE = 0.000041 [ADD] NMSE = 0.000040 [MUL_MAT] NMSE = 0.000042 [ADD] NMSE = 0.000042 [CPY] NMSE = 0.000042 [MUL_MAT] NMSE = 0.000017 [CPY] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000020 [ADD] NMSE = 0.000019 [CPY] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000011 [SCALE] NMSE = 0.000011 [SOFT_MAX] NMSE = 0.000092 [MUL_MAT] NMSE = 0.000036 [CPY] NMSE = 0.000036 [MUL_MAT] NMSE = 0.000093 [ADD] NMSE = 0.000091 [ADD] NMSE = 0.000016 [NORM] NMSE = 0.000044 [MUL] NMSE = 0.000045 [ADD] NMSE = 0.000040 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000009 [GELU] NMSE = 0.000025 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000009 [NORM] NMSE = 0.000039 [MUL] NMSE = 0.000034 [ADD] NMSE = 0.000033 FAIL

Large-v2 (whisper_build_graph_cross)

OK
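
For readers of the logs above, here is a minimal sketch of how an NMSE figure like these could be computed for a single tensor. It assumes the common definition, squared error summed over all elements divided by the summed squares of the reference (CPU) output. The exact formula used by this PR is not shown in this thread, and the function name `nmse` and the flat `std::vector<float>` layout are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Normalized mean squared error of `test` against `ref`:
//   sum((ref - test)^2) / sum(ref^2)
// Assumption: this is the usual NMSE definition, not necessarily the PR's exact code.
double nmse(const std::vector<float> & ref, const std::vector<float> & test) {
    assert(ref.size() == test.size());
    double err  = 0.0; // accumulated squared difference
    double norm = 0.0; // accumulated squared reference magnitude
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double r = static_cast<double>(ref[i]);
        const double d = r - static_cast<double>(test[i]);
        err  += d * d;
        norm += r * r;
    }
    // Fall back to the raw squared error when the reference is all zeros.
    return norm > 0.0 ? err / norm : err;
}
```

Accumulating in `double` keeps the metric itself from adding rounding noise on top of the backend differences being measured.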

Ubuntu 20.04.5 LTS RTX2080ti (2) micro-machine gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

Large-v2 (whisper_build_graph_conv)

[MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000001 [IM2COL] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000006 [GELU] NMSE = 0.000010 FAIL

Large-v2 (whisper_build_graph_encoder)

[MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000003 [MUL] NMSE = 0.000005 [ADD] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000008 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000006 [MUL] NMSE = 0.000008 [ADD] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000010 [ADD] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000008 [SCALE] NMSE = 0.000008 [SOFT_MAX] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000020 [CPY] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000023 [ADD] NMSE = 0.000022 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000005 [MUL] NMSE = 0.000008 [ADD] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000010 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000006 [MUL] NMSE = 0.000007 [ADD] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000013 [CPY] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000013 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000006 [MUL] NMSE = 0.000008 [ADD] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000009 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000006 [MUL] NMSE = 0.000008 [ADD] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 
[SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000010 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000006 [MUL] NMSE = 0.000008 [ADD] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000012 [ADD] NMSE = 0.000012 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000008 [MUL] NMSE = 0.000006 [ADD] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000012 [CPY] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000010 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000008 [MUL] NMSE = 0.000010 [ADD] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000014 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000010 [MUL] NMSE = 0.000008 [ADD] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000029 [MUL_MAT] NMSE = 0.000013 [CPY] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000013 [ADD] NMSE = 0.000012 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000010 [MUL] NMSE = 0.000012 [ADD] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000018 [ADD] NMSE = 0.000018 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000012 [MUL] NMSE = 0.000011 [ADD] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 
[SOFT_MAX] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000010 [CPY] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000010 [ADD] NMSE = 0.000010 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000012 [MUL] NMSE = 0.000013 [ADD] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000021 [ADD] NMSE = 0.000021 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000015 [MUL] NMSE = 0.000009 [ADD] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000010 [ADD] NMSE = 0.000010 [CPY] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000002 [SCALE] NMSE = 0.000002 [SOFT_MAX] NMSE = 0.000028 [MUL_MAT] NMSE = 0.000010 [CPY] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000010 [ADD] NMSE = 0.000010 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000015 [MUL] NMSE = 0.000016 [ADD] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000023 [ADD] NMSE = 0.000023 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000017 [MUL] NMSE = 0.000013 [ADD] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000005 [SCALE] NMSE = 0.000005 [SOFT_MAX] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000012 [CPY] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000013 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000017 [MUL] NMSE = 0.000018 [ADD] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000027 [ADD] NMSE = 0.000028 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000020 [MUL] NMSE = 0.000012 [ADD] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000012 [ADD] NMSE = 0.000012 [CPY] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 
0.000016 [MUL_MAT] NMSE = 0.000012 [CPY] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000013 [ADD] NMSE = 0.000012 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000019 [MUL] NMSE = 0.000020 [ADD] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000029 [ADD] NMSE = 0.000029 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000021 [MUL] NMSE = 0.000014 [ADD] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000014 [CPY] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000013 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000021 [MUL] NMSE = 0.000023 [ADD] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000030 [ADD] NMSE = 0.000031 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000023 [MUL] NMSE = 0.000016 [ADD] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000016 [ADD] NMSE = 0.000016 [CPY] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000005 [SCALE] NMSE = 0.000005 [SOFT_MAX] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000013 [CPY] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000015 [ADD] NMSE = 0.000014 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000022 [MUL] NMSE = 0.000024 [ADD] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000032 [ADD] NMSE = 0.000033 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000024 [MUL] NMSE = 0.000016 [ADD] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000015 [ADD] NMSE = 0.000016 [CPY] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 0.000021 [MUL_MAT] NMSE = 
0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000013 [ADD] NMSE = 0.000012 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000023 [MUL] NMSE = 0.000025 [ADD] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000039 [ADD] NMSE = 0.000040 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000025 [MUL] NMSE = 0.000024 [ADD] NMSE = 0.000024 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000014 [CPY] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000005 [SCALE] NMSE = 0.000005 [SOFT_MAX] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000007 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000025 [MUL] NMSE = 0.000026 [ADD] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000034 [ADD] NMSE = 0.000035 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000029 [MUL] NMSE = 0.000022 [ADD] NMSE = 0.000022 [MUL_MAT] NMSE = 0.000020 [ADD] NMSE = 0.000020 [CPY] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000006 [SCALE] NMSE = 0.000006 [SOFT_MAX] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000016 [CPY] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000023 [ADD] NMSE = 0.000021 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000027 [MUL] NMSE = 0.000030 [ADD] NMSE = 0.000022 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [GELU] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000045 [ADD] NMSE = 0.000046 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000029 [MUL] NMSE = 0.000019 [ADD] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000019 [ADD] NMSE = 0.000019 [CPY] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000006 [SCALE] NMSE = 0.000006 [SOFT_MAX] NMSE = 0.000029 [MUL_MAT] NMSE = 0.000014 [CPY] NMSE = 
0.000014 [MUL_MAT] NMSE = 0.000016 [ADD] NMSE = 0.000014 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000027 [MUL] NMSE = 0.000029 [ADD] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [GELU] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000048 [ADD] NMSE = 0.000049 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000029 [MUL] NMSE = 0.000024 [ADD] NMSE = 0.000024 [MUL_MAT] NMSE = 0.000021 [ADD] NMSE = 0.000021 [CPY] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000009 [SCALE] NMSE = 0.000009 [SOFT_MAX] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000017 [CPY] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000029 [ADD] NMSE = 0.000026 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000029 [MUL] NMSE = 0.000032 [ADD] NMSE = 0.000024 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [GELU] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000058 [ADD] NMSE = 0.000060 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000031 [MUL] NMSE = 0.000024 [ADD] NMSE = 0.000024 [MUL_MAT] NMSE = 0.000018 [ADD] NMSE = 0.000019 [CPY] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000007 [SCALE] NMSE = 0.000007 [SOFT_MAX] NMSE = 0.000022 [MUL_MAT] NMSE = 0.000015 [CPY] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000018 [ADD] NMSE = 0.000017 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000029 [MUL] NMSE = 0.000033 [ADD] NMSE = 0.000024 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000002 [GELU] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000057 [ADD] NMSE = 0.000058 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000032 [MUL] NMSE = 0.000022 [ADD] NMSE = 0.000022 [MUL_MAT] NMSE = 0.000023 [ADD] NMSE = 0.000023 [CPY] NMSE = 0.000023 [MUL_MAT] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000007 [SCALE] NMSE = 0.000007 [SOFT_MAX] NMSE = 0.000026 [MUL_MAT] NMSE = 0.000015 [CPY] NMSE = 0.000015 [MUL_MAT] NMSE = 
0.000031 [ADD] NMSE = 0.000028 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000031 [MUL] NMSE = 0.000035 [ADD] NMSE = 0.000026 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000067 [ADD] NMSE = 0.000069 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000033 [MUL] NMSE = 0.000029 [ADD] NMSE = 0.000029 [MUL_MAT] NMSE = 0.000026 [ADD] NMSE = 0.000026 [CPY] NMSE = 0.000026 [MUL_MAT] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000009 [SCALE] NMSE = 0.000009 [SOFT_MAX] NMSE = 0.000027 [MUL_MAT] NMSE = 0.000014 [CPY] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000018 [ADD] NMSE = 0.000017 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000033 [MUL] NMSE = 0.000035 [ADD] NMSE = 0.000027 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000069 [ADD] NMSE = 0.000070 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000033 [MUL] NMSE = 0.000028 [ADD] NMSE = 0.000028 [MUL_MAT] NMSE = 0.000028 [ADD] NMSE = 0.000028 [CPY] NMSE = 0.000028 [MUL_MAT] NMSE = 0.000010 [CPY] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000006 [SCALE] NMSE = 0.000006 [SOFT_MAX] NMSE = 0.000034 [MUL_MAT] NMSE = 0.000017 [CPY] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000029 [ADD] NMSE = 0.000027 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000031 [MUL] NMSE = 0.000034 [ADD] NMSE = 0.000027 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000078 [ADD] NMSE = 0.000079 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000033 [MUL] NMSE = 0.000029 [ADD] NMSE = 0.000029 [MUL_MAT] NMSE = 0.000027 [ADD] NMSE = 0.000027 [CPY] NMSE = 0.000027 [MUL_MAT] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000009 [SCALE] NMSE = 0.000009 [SOFT_MAX] NMSE = 0.000036 [MUL_MAT] NMSE = 0.000016 [CPY] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000029 [ADD] NMSE = 
0.000027 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000032 [MUL] NMSE = 0.000035 [ADD] NMSE = 0.000027 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [GELU] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000078 [ADD] NMSE = 0.000080 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000032 [MUL] NMSE = 0.000027 [ADD] NMSE = 0.000027 [MUL_MAT] NMSE = 0.000025 [ADD] NMSE = 0.000025 [CPY] NMSE = 0.000025 [MUL_MAT] NMSE = 0.000010 [CPY] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000006 [SCALE] NMSE = 0.000006 [SOFT_MAX] NMSE = 0.000031 [MUL_MAT] NMSE = 0.000016 [CPY] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000037 [ADD] NMSE = 0.000035 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000032 [MUL] NMSE = 0.000035 [ADD] NMSE = 0.000027 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [GELU] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000079 [ADD] NMSE = 0.000082 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000033 [MUL] NMSE = 0.000029 [ADD] NMSE = 0.000029 [MUL_MAT] NMSE = 0.000030 [ADD] NMSE = 0.000030 [CPY] NMSE = 0.000030 [MUL_MAT] NMSE = 0.000010 [CPY] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000007 [SCALE] NMSE = 0.000007 [SOFT_MAX] NMSE = 0.000031 [MUL_MAT] NMSE = 0.000021 [CPY] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000038 [ADD] NMSE = 0.000035 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000031 [MUL] NMSE = 0.000034 [ADD] NMSE = 0.000028 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [GELU] NMSE = 0.000022 [MUL_MAT] NMSE = 0.000079 [ADD] NMSE = 0.000082 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000033 [MUL] NMSE = 0.000030 [ADD] NMSE = 0.000030 [MUL_MAT] NMSE = 0.000030 [ADD] NMSE = 0.000030 [CPY] NMSE = 0.000030 [MUL_MAT] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000010 [ADD] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000007 [SCALE] NMSE = 0.000007 [SOFT_MAX] NMSE = 0.000031 [MUL_MAT] NMSE = 0.000021 [CPY] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000050 [ADD] NMSE = 0.000048 [ADD] NMSE = 
0.000002 [NORM] NMSE = 0.000034 [MUL] NMSE = 0.000036 [ADD] NMSE = 0.000030 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [GELU] NMSE = 0.000025 [MUL_MAT] NMSE = 0.000084 [ADD] NMSE = 0.000087 [ADD] NMSE = 0.000003 [NORM] NMSE = 0.000035 [MUL] NMSE = 0.000031 [ADD] NMSE = 0.000031 [MUL_MAT] NMSE = 0.000034 [ADD] NMSE = 0.000034 [CPY] NMSE = 0.000034 [MUL_MAT] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000010 [CPY] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000007 [SCALE] NMSE = 0.000007 [SOFT_MAX] NMSE = 0.000034 [MUL_MAT] NMSE = 0.000025 [CPY] NMSE = 0.000025 [MUL_MAT] NMSE = 0.000045 [ADD] NMSE = 0.000043 [ADD] NMSE = 0.000003 [NORM] NMSE = 0.000035 [MUL] NMSE = 0.000037 [ADD] NMSE = 0.000032 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000005 [GELU] NMSE = 0.000027 [MUL_MAT] NMSE = 0.000080 [ADD] NMSE = 0.000082 [ADD] NMSE = 0.000003 [NORM] NMSE = 0.000036 [MUL] NMSE = 0.000031 [ADD] NMSE = 0.000031 [MUL_MAT] NMSE = 0.000034 [ADD] NMSE = 0.000034 [CPY] NMSE = 0.000034 [MUL_MAT] NMSE = 0.000010 [CPY] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000010 [ADD] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000006 [SCALE] NMSE = 0.000006 [SOFT_MAX] NMSE = 0.000045 [MUL_MAT] NMSE = 0.000028 [CPY] NMSE = 0.000028 [MUL_MAT] NMSE = 0.000043 [ADD] NMSE = 0.000041 [ADD] NMSE = 0.000003 [NORM] NMSE = 0.000037 [MUL] NMSE = 0.000039 [ADD] NMSE = 0.000034 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [GELU] NMSE = 0.000030 [MUL_MAT] NMSE = 0.000096 [ADD] NMSE = 0.000098 [ADD] NMSE = 0.000004 [NORM] NMSE = 0.000038 [MUL] NMSE = 0.000036 [ADD] NMSE = 0.000036 [MUL_MAT] NMSE = 0.000038 [ADD] NMSE = 0.000038 [CPY] NMSE = 0.000038 [MUL_MAT] NMSE = 0.000014 [CPY] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000013 [CPY] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000008 [SCALE] NMSE = 0.000008 [SOFT_MAX] NMSE = 0.000042 [MUL_MAT] NMSE = 0.000033 [CPY] NMSE = 0.000033 [MUL_MAT] NMSE = 0.000065 [ADD] NMSE = 0.000063 [ADD] NMSE = 0.000004 [NORM] NMSE = 
0.000038 [MUL] NMSE = 0.000041 [ADD] NMSE = 0.000035 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000007 [GELU] NMSE = 0.000035 [MUL_MAT] NMSE = 0.000091 [ADD] NMSE = 0.000093 [ADD] NMSE = 0.000005 [NORM] NMSE = 0.000041 [MUL] NMSE = 0.000038 [ADD] NMSE = 0.000038 [MUL_MAT] NMSE = 0.000043 [ADD] NMSE = 0.000043 [CPY] NMSE = 0.000043 [MUL_MAT] NMSE = 0.000013 [CPY] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000013 [CPY] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000008 [SCALE] NMSE = 0.000008 [SOFT_MAX] NMSE = 0.000061 [MUL_MAT] NMSE = 0.000043 [CPY] NMSE = 0.000043 [MUL_MAT] NMSE = 0.000074 [ADD] NMSE = 0.000071 [ADD] NMSE = 0.000005 [NORM] NMSE = 0.000042 [MUL] NMSE = 0.000044 [ADD] NMSE = 0.000039 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000008 [GELU] NMSE = 0.000045 [MUL_MAT] NMSE = 0.000133 [ADD] NMSE = 0.000134 [ADD] NMSE = 0.000008 [NORM] NMSE = 0.000045 [MUL] NMSE = 0.000042 [ADD] NMSE = 0.000042 [MUL_MAT] NMSE = 0.000044 [ADD] NMSE = 0.000044 [CPY] NMSE = 0.000044 [MUL_MAT] NMSE = 0.000017 [CPY] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000017 [ADD] NMSE = 0.000016 [CPY] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000012 [SCALE] NMSE = 0.000012 [SOFT_MAX] NMSE = 0.000076 [MUL_MAT] NMSE = 0.000038 [CPY] NMSE = 0.000038 [MUL_MAT] NMSE = 0.000088 [ADD] NMSE = 0.000084 [ADD] NMSE = 0.000008 [NORM] NMSE = 0.000045 [MUL] NMSE = 0.000048 [ADD] NMSE = 0.000043 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000009 [GELU] NMSE = 0.000054 [MUL_MAT] NMSE = 0.000063 [ADD] NMSE = 0.000063 [ADD] NMSE = 0.000026 [NORM] NMSE = 0.000048 [MUL] NMSE = 0.000040 [ADD] NMSE = 0.000040 [MUL_MAT] NMSE = 0.000039 [ADD] NMSE = 0.000040 [CPY] NMSE = 0.000040 [MUL_MAT] NMSE = 0.000017 [CPY] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000018 [ADD] NMSE = 0.000018 [CPY] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000012 [SCALE] NMSE = 0.000012 [SOFT_MAX] NMSE = 0.000080 [MUL_MAT] NMSE = 0.000035 [CPY] NMSE = 0.000035 [MUL_MAT] NMSE = 0.000091 [ADD] NMSE = 0.000088 [ADD] NMSE = 0.000027 [NORM] NMSE = 0.000051 [MUL] NMSE = 
0.000061 [ADD] NMSE = 0.000054 [MUL_MAT] NMSE = 0.000018 [ADD] NMSE = 0.000017 [GELU] NMSE = 0.000037 [MUL_MAT] NMSE = 0.000022 [ADD] NMSE = 0.000022 [ADD] NMSE = 0.000016 [NORM] NMSE = 0.000043 [MUL] NMSE = 0.000041 [ADD] NMSE = 0.000040 [MUL_MAT] NMSE = 0.000042 [ADD] NMSE = 0.000042 [CPY] NMSE = 0.000042 [MUL_MAT] NMSE = 0.000017 [CPY] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000020 [ADD] NMSE = 0.000019 [CPY] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000011 [SCALE] NMSE = 0.000011 [SOFT_MAX] NMSE = 0.000092 [MUL_MAT] NMSE = 0.000036 [CPY] NMSE = 0.000036 [MUL_MAT] NMSE = 0.000093 [ADD] NMSE = 0.000091 [ADD] NMSE = 0.000016 [NORM] NMSE = 0.000044 [MUL] NMSE = 0.000045 [ADD] NMSE = 0.000040 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000009 [GELU] NMSE = 0.000025 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000009 [NORM] NMSE = 0.000039 [MUL] NMSE = 0.000034 [ADD] NMSE = 0.000033 FAIL

Large-v2 (whisper_build_graph_cross)

OK
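
Because each dump contains hundreds of `[OP] NMSE = value` entries, a small helper can make them easier to compare across machines. The sketch below is a hypothetical post-processing aid, not part of this PR: it scans pasted log text and keeps the largest NMSE seen per op name. The function name `max_nmse_per_op` is an assumption, and the pattern relies only on the entry format visible in the logs.

```cpp
#include <algorithm>
#include <map>
#include <regex>
#include <string>

// Returns the largest NMSE observed for each op name in a log made of
// "[OP] NMSE = value" entries. Whitespace (including line wraps) between
// the tokens is tolerated.
std::map<std::string, double> max_nmse_per_op(const std::string & log) {
    static const std::regex entry(R"(\[([A-Z0-9_]+)\]\s+NMSE\s*=\s*([0-9]*\.?[0-9]+))");
    std::map<std::string, double> worst;
    for (std::sregex_iterator it(log.begin(), log.end(), entry), end; it != end; ++it) {
        const std::string op    = (*it)[1].str();
        const double      value = std::stod((*it)[2].str());
        auto found = worst.find(op);
        if (found == worst.end()) {
            worst[op] = value;                                // first time this op appears
        } else {
            found->second = std::max(found->second, value);   // keep the worst case
        }
    }
    return worst;
}
```

Running this over the log for each machine gives one worst-case number per op type, which is easier to compare side by side than the raw dumps.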

Ubuntu 20.04.5 LTS RTX3060 micro-machine gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

Large-v2 (whisper_build_graph_conv)

[MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000001 [IM2COL] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [GELU] NMSE = 0.000003 FAIL

Large-v2 (whisper_build_graph_encoder)

[MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000001 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000002 [ADD] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000002 [ADD] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000002 [MUL] NMSE = 0.000002 [ADD] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 
0.000001 [SOFT_MAX] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000002 [MUL] NMSE = 0.000002 [ADD] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000007 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000003 [MUL] NMSE = 0.000002 [ADD] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000003 [MUL] NMSE = 0.000003 [ADD] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000008 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000004 [MUL] NMSE = 0.000003 [ADD] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000004 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000004 [MUL] NMSE = 0.000004 [ADD] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000011 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000005 [MUL] NMSE = 0.000004 [ADD] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE 
= 0.000011 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000005 [MUL] NMSE = 0.000005 [ADD] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000013 [ADD] NMSE = 0.000013 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000006 [MUL] NMSE = 0.000003 [ADD] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000006 [MUL] NMSE = 0.000007 [ADD] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000014 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000007 [MUL] NMSE = 0.000005 [ADD] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [SCALE] NMSE = 0.000002 [SOFT_MAX] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000007 [MUL] NMSE = 0.000008 [ADD] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000017 [ADD] NMSE = 0.000017 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000009 [MUL] NMSE = 0.000005 [ADD] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [SCALE] NMSE = 0.000002 [SOFT_MAX] NMSE = 0.000011 [MUL_MAT] NMSE 
= 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000003 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000008 [MUL] NMSE = 0.000009 [ADD] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000017 [ADD] NMSE = 0.000017 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000009 [MUL] NMSE = 0.000006 [ADD] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [SCALE] NMSE = 0.000002 [SOFT_MAX] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000010 [MUL] NMSE = 0.000010 [ADD] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000018 [ADD] NMSE = 0.000018 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000011 [MUL] NMSE = 0.000007 [ADD] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000006 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000010 [MUL] NMSE = 0.000011 [ADD] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000018 [ADD] NMSE = 0.000019 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000011 [MUL] NMSE = 0.000007 [ADD] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [SCALE] NMSE = 0.000002 [SOFT_MAX] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 
0.000005 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000011 [MUL] NMSE = 0.000012 [ADD] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000023 [ADD] NMSE = 0.000024 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000012 [MUL] NMSE = 0.000011 [ADD] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000012 [MUL] NMSE = 0.000012 [ADD] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000021 [ADD] NMSE = 0.000021 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000014 [MUL] NMSE = 0.000011 [ADD] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000010 [ADD] NMSE = 0.000010 [CPY] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000011 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000013 [MUL] NMSE = 0.000015 [ADD] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000027 [ADD] NMSE = 0.000028 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000015 [MUL] NMSE = 0.000010 [ADD] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 
0.000008 [ADD] NMSE = 0.000007 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000014 [MUL] NMSE = 0.000015 [ADD] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000029 [ADD] NMSE = 0.000030 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000015 [MUL] NMSE = 0.000012 [ADD] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000005 [SCALE] NMSE = 0.000005 [SOFT_MAX] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000016 [ADD] NMSE = 0.000014 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000015 [MUL] NMSE = 0.000017 [ADD] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [GELU] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000036 [ADD] NMSE = 0.000037 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000017 [MUL] NMSE = 0.000013 [ADD] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000010 [ADD] NMSE = 0.000010 [CPY] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000010 [ADD] NMSE = 0.000009 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000016 [MUL] NMSE = 0.000018 [ADD] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [GELU] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000035 [ADD] NMSE = 0.000036 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000018 [MUL] NMSE = 0.000012 [ADD] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000013 [ADD] NMSE = 0.000013 [CPY] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000016 [ADD] NMSE = 
0.000014 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000017 [MUL] NMSE = 0.000019 [ADD] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [GELU] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000042 [ADD] NMSE = 0.000044 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000019 [MUL] NMSE = 0.000017 [ADD] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000014 [CPY] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000005 [SCALE] NMSE = 0.000005 [SOFT_MAX] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000008 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000019 [MUL] NMSE = 0.000020 [ADD] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000044 [ADD] NMSE = 0.000045 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000020 [MUL] NMSE = 0.000016 [ADD] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000016 [ADD] NMSE = 0.000016 [CPY] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000015 [ADD] NMSE = 0.000014 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000018 [MUL] NMSE = 0.000020 [ADD] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000049 [ADD] NMSE = 0.000050 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000020 [MUL] NMSE = 0.000017 [ADD] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000016 [ADD] NMSE = 0.000016 [CPY] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000005 [SCALE] NMSE = 0.000005 [SOFT_MAX] NMSE = 0.000022 [MUL_MAT] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000015 [ADD] NMSE = 0.000014 [ADD] NMSE = 
0.000001 [NORM] NMSE = 0.000019 [MUL] NMSE = 0.000021 [ADD] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000052 [ADD] NMSE = 0.000053 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000020 [MUL] NMSE = 0.000017 [ADD] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000015 [ADD] NMSE = 0.000015 [CPY] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000022 [ADD] NMSE = 0.000020 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000020 [MUL] NMSE = 0.000022 [ADD] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000054 [ADD] NMSE = 0.000055 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000021 [MUL] NMSE = 0.000018 [ADD] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000019 [ADD] NMSE = 0.000019 [CPY] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000021 [ADD] NMSE = 0.000019 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000020 [MUL] NMSE = 0.000022 [ADD] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000054 [ADD] NMSE = 0.000055 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000021 [MUL] NMSE = 0.000020 [ADD] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000019 [ADD] NMSE = 0.000019 [CPY] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000028 [ADD] NMSE = 0.000026 [ADD] NMSE = 0.000001 [NORM] NMSE = 
0.000022 [MUL] NMSE = 0.000023 [ADD] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [GELU] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000057 [ADD] NMSE = 0.000059 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000023 [MUL] NMSE = 0.000020 [ADD] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000021 [ADD] NMSE = 0.000021 [CPY] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000014 [CPY] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000026 [ADD] NMSE = 0.000025 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000023 [MUL] NMSE = 0.000025 [ADD] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [GELU] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000052 [ADD] NMSE = 0.000053 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000024 [MUL] NMSE = 0.000020 [ADD] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000021 [ADD] NMSE = 0.000021 [CPY] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 0.000028 [MUL_MAT] NMSE = 0.000016 [CPY] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000023 [ADD] NMSE = 0.000022 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000024 [MUL] NMSE = 0.000026 [ADD] NMSE = 0.000022 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [GELU] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000061 [ADD] NMSE = 0.000062 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000025 [MUL] NMSE = 0.000024 [ADD] NMSE = 0.000024 [MUL_MAT] NMSE = 0.000024 [ADD] NMSE = 0.000024 [CPY] NMSE = 0.000024 [MUL_MAT] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 0.000024 [MUL_MAT] NMSE = 0.000018 [CPY] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000035 [ADD] NMSE = 0.000034 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000025 [MUL] NMSE = 
0.000027 [ADD] NMSE = 0.000023 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [GELU] NMSE = 0.000024 [MUL_MAT] NMSE = 0.000052 [ADD] NMSE = 0.000053 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000026 [MUL] NMSE = 0.000024 [ADD] NMSE = 0.000025 [MUL_MAT] NMSE = 0.000026 [ADD] NMSE = 0.000026 [CPY] NMSE = 0.000026 [MUL_MAT] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000005 [SCALE] NMSE = 0.000005 [SOFT_MAX] NMSE = 0.000036 [MUL_MAT] NMSE = 0.000023 [CPY] NMSE = 0.000023 [MUL_MAT] NMSE = 0.000038 [ADD] NMSE = 0.000037 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000026 [MUL] NMSE = 0.000029 [ADD] NMSE = 0.000025 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [GELU] NMSE = 0.000027 [MUL_MAT] NMSE = 0.000051 [ADD] NMSE = 0.000051 [ADD] NMSE = 0.000003 [NORM] NMSE = 0.000030 [MUL] NMSE = 0.000027 [ADD] NMSE = 0.000027 [MUL_MAT] NMSE = 0.000027 [ADD] NMSE = 0.000027 [CPY] NMSE = 0.000027 [MUL_MAT] NMSE = 0.000010 [CPY] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000010 [ADD] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000007 [SCALE] NMSE = 0.000007 [SOFT_MAX] NMSE = 0.000039 [MUL_MAT] NMSE = 0.000020 [CPY] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000047 [ADD] NMSE = 0.000045 [ADD] NMSE = 0.000003 [NORM] NMSE = 0.000030 [MUL] NMSE = 0.000032 [ADD] NMSE = 0.000028 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000007 [GELU] NMSE = 0.000041 [MUL_MAT] NMSE = 0.000078 [ADD] NMSE = 0.000078 [ADD] NMSE = 0.000024 [NORM] NMSE = 0.000027 [MUL] NMSE = 0.000026 [ADD] NMSE = 0.000026 [MUL_MAT] NMSE = 0.000025 [ADD] NMSE = 0.000025 [CPY] NMSE = 0.000025 [MUL_MAT] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000010 [CPY] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000006 [SCALE] NMSE = 0.000006 [SOFT_MAX] NMSE = 0.000045 [MUL_MAT] NMSE = 0.000019 [CPY] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000047 [ADD] NMSE = 0.000046 [ADD] NMSE = 0.000024 [NORM] NMSE = 0.000027 [MUL] NMSE = 0.000029 [ADD] NMSE = 
0.000026 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000007 [GELU] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000011 [ADD] NMSE = 0.000012 [NORM] NMSE = 0.000024 [MUL] NMSE = 0.000023 [ADD] NMSE = 0.000023 [MUL_MAT] NMSE = 0.000023 [ADD] NMSE = 0.000023 [CPY] NMSE = 0.000023 [MUL_MAT] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000010 [CPY] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 0.000045 [MUL_MAT] NMSE = 0.000019 [CPY] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000051 [ADD] NMSE = 0.000050 [ADD] NMSE = 0.000013 [NORM] NMSE = 0.000024 [MUL] NMSE = 0.000025 [ADD] NMSE = 0.000022 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [GELU] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [ADD] NMSE = 0.000006 [NORM] NMSE = 0.000022 [MUL] NMSE = 0.000019 [ADD] NMSE = 0.000019 FAIL

Large-v2 (whisper_build_graph_cross)

OK
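
For readers interpreting the figures above, here is a minimal sketch of how a normalized mean squared error between a CPU reference tensor and another backend's output could be computed. It assumes NMSE means the sum of squared differences divided by the sum of squared reference values; the log does not state the exact definition or the pass/fail threshold. The names `nmse`, `within_tolerance`, and `max_nmse` are hypothetical and not taken from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Assumed definition: sum((ref - out)^2) / sum(ref^2).
// `ref` is the CPU result, `out` is the result from the backend under test.
double nmse(const std::vector<float> & ref, const std::vector<float> & out) {
    assert(ref.size() == out.size());
    double err  = 0.0; // accumulated squared difference
    double norm = 0.0; // accumulated squared reference magnitude
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double r = static_cast<double>(ref[i]);
        const double d = r - static_cast<double>(out[i]);
        err  += d * d;
        norm += r * r;
    }
    if (norm == 0.0) {
        // All-zero reference: exact match counts as zero error, anything else as unbounded.
        return err == 0.0 ? 0.0 : std::numeric_limits<double>::infinity();
    }
    return err / norm;
}

// Hypothetical check: true when the error is at or below max_nmse.
// The threshold behind the OK/FAIL lines in the log is not given here.
bool within_tolerance(const std::vector<float> & ref,
                      const std::vector<float> & out,
                      double max_nmse) {
    return nmse(ref, out) <= max_nmse;
}
```

Accumulating in `double` keeps the metric itself from adding rounding noise on top of the float32 values being compared. A caller would pass the two flattened tensors and a placeholder threshold, for example `within_tolerance(cpu_values, gpu_values, 1e-6)`.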

Ubuntu 20.04.5 LTS, RTX 3080 Ti, micro-machine, gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

Large-v2 (whisper_build_graph_conv)

[MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000001 [IM2COL] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000001 FAIL

Large-v2 (whisper_build_graph_encoder)

[MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000002 [ADD] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000002 [ADD] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000002 [MUL] NMSE = 0.000002 [ADD] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 
0.000001 [SOFT_MAX] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000002 [MUL] NMSE = 0.000002 [ADD] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000007 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000003 [MUL] NMSE = 0.000002 [ADD] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000003 [MUL] NMSE = 0.000003 [ADD] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000008 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000004 [MUL] NMSE = 0.000003 [ADD] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000004 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000004 [MUL] NMSE = 0.000004 [ADD] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000011 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000005 [MUL] NMSE = 0.000004 [ADD] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE 
= 0.000012 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000005 [MUL] NMSE = 0.000005 [ADD] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000013 [ADD] NMSE = 0.000013 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000006 [MUL] NMSE = 0.000003 [ADD] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000006 [MUL] NMSE = 0.000007 [ADD] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000014 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000007 [MUL] NMSE = 0.000005 [ADD] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [SCALE] NMSE = 0.000002 [SOFT_MAX] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000007 [MUL] NMSE = 0.000008 [ADD] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000017 [ADD] NMSE = 0.000017 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000009 [MUL] NMSE = 0.000005 [ADD] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [SCALE] NMSE = 0.000002 [SOFT_MAX] NMSE = 0.000011 [MUL_MAT] NMSE 
= 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000003 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000008 [MUL] NMSE = 0.000009 [ADD] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000017 [ADD] NMSE = 0.000018 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000009 [MUL] NMSE = 0.000006 [ADD] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [SCALE] NMSE = 0.000002 [SOFT_MAX] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000004 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000010 [MUL] NMSE = 0.000010 [ADD] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000018 [ADD] NMSE = 0.000018 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000011 [MUL] NMSE = 0.000007 [ADD] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000006 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000010 [MUL] NMSE = 0.000011 [ADD] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000018 [ADD] NMSE = 0.000019 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000011 [MUL] NMSE = 0.000007 [ADD] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [SCALE] NMSE = 0.000002 [SOFT_MAX] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 
0.000005 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000011 [MUL] NMSE = 0.000012 [ADD] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000023 [ADD] NMSE = 0.000024 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000012 [MUL] NMSE = 0.000011 [ADD] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000012 [MUL] NMSE = 0.000012 [ADD] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000021 [ADD] NMSE = 0.000021 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000014 [MUL] NMSE = 0.000011 [ADD] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000010 [ADD] NMSE = 0.000010 [CPY] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000011 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000013 [MUL] NMSE = 0.000015 [ADD] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000027 [ADD] NMSE = 0.000028 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000015 [MUL] NMSE = 0.000010 [ADD] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 
0.000008 [ADD] NMSE = 0.000007 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000014 [MUL] NMSE = 0.000015 [ADD] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000029 [ADD] NMSE = 0.000030 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000015 [MUL] NMSE = 0.000012 [ADD] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000005 [SCALE] NMSE = 0.000005 [SOFT_MAX] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000016 [ADD] NMSE = 0.000014 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000015 [MUL] NMSE = 0.000017 [ADD] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [GELU] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000036 [ADD] NMSE = 0.000037 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000017 [MUL] NMSE = 0.000013 [ADD] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000010 [ADD] NMSE = 0.000010 [CPY] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000009 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000016 [MUL] NMSE = 0.000018 [ADD] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [GELU] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000034 [ADD] NMSE = 0.000035 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000018 [MUL] NMSE = 0.000012 [ADD] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000013 [ADD] NMSE = 0.000013 [CPY] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000016 [ADD] NMSE = 
0.000015 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000017 [MUL] NMSE = 0.000019 [ADD] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [GELU] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000042 [ADD] NMSE = 0.000044 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000019 [MUL] NMSE = 0.000017 [ADD] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000014 [ADD] NMSE = 0.000014 [CPY] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000005 [SCALE] NMSE = 0.000005 [SOFT_MAX] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000007 [CPY] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000009 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000019 [MUL] NMSE = 0.000020 [ADD] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000012 [MUL_MAT] NMSE = 0.000044 [ADD] NMSE = 0.000044 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000020 [MUL] NMSE = 0.000016 [ADD] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000016 [ADD] NMSE = 0.000016 [CPY] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000015 [ADD] NMSE = 0.000014 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000018 [MUL] NMSE = 0.000020 [ADD] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000013 [MUL_MAT] NMSE = 0.000050 [ADD] NMSE = 0.000051 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000020 [MUL] NMSE = 0.000017 [ADD] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000016 [ADD] NMSE = 0.000016 [CPY] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000005 [SCALE] NMSE = 0.000005 [SOFT_MAX] NMSE = 0.000023 [MUL_MAT] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000015 [ADD] NMSE = 0.000014 [ADD] NMSE = 
0.000001 [NORM] NMSE = 0.000019 [MUL] NMSE = 0.000021 [ADD] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000052 [ADD] NMSE = 0.000053 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000020 [MUL] NMSE = 0.000016 [ADD] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000015 [ADD] NMSE = 0.000015 [CPY] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000022 [ADD] NMSE = 0.000020 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000020 [MUL] NMSE = 0.000021 [ADD] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000054 [ADD] NMSE = 0.000055 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000021 [MUL] NMSE = 0.000018 [ADD] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000018 [ADD] NMSE = 0.000018 [CPY] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000021 [ADD] NMSE = 0.000019 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000020 [MUL] NMSE = 0.000021 [ADD] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000016 [MUL_MAT] NMSE = 0.000054 [ADD] NMSE = 0.000056 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000021 [MUL] NMSE = 0.000019 [ADD] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000018 [ADD] NMSE = 0.000018 [CPY] NMSE = 0.000018 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000027 [ADD] NMSE = 0.000026 [ADD] NMSE = 0.000001 [NORM] NMSE = 
0.000021 [MUL] NMSE = 0.000023 [ADD] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [GELU] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000058 [ADD] NMSE = 0.000059 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000023 [MUL] NMSE = 0.000020 [ADD] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000021 [ADD] NMSE = 0.000021 [CPY] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000014 [CPY] NMSE = 0.000014 [MUL_MAT] NMSE = 0.000026 [ADD] NMSE = 0.000025 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000023 [MUL] NMSE = 0.000024 [ADD] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [GELU] NMSE = 0.000019 [MUL_MAT] NMSE = 0.000056 [ADD] NMSE = 0.000057 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000024 [MUL] NMSE = 0.000020 [ADD] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000022 [ADD] NMSE = 0.000022 [CPY] NMSE = 0.000022 [MUL_MAT] NMSE = 0.000006 [CPY] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 0.000028 [MUL_MAT] NMSE = 0.000017 [CPY] NMSE = 0.000017 [MUL_MAT] NMSE = 0.000026 [ADD] NMSE = 0.000025 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000024 [MUL] NMSE = 0.000026 [ADD] NMSE = 0.000022 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [GELU] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000066 [ADD] NMSE = 0.000068 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000025 [MUL] NMSE = 0.000024 [ADD] NMSE = 0.000024 [MUL_MAT] NMSE = 0.000024 [ADD] NMSE = 0.000024 [CPY] NMSE = 0.000024 [MUL_MAT] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000005 [SCALE] NMSE = 0.000005 [SOFT_MAX] NMSE = 0.000027 [MUL_MAT] NMSE = 0.000020 [CPY] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000044 [ADD] NMSE = 0.000042 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000025 [MUL] NMSE = 
0.000027 [ADD] NMSE = 0.000023 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [GELU] NMSE = 0.000024 [MUL_MAT] NMSE = 0.000057 [ADD] NMSE = 0.000058 [ADD] NMSE = 0.000003 [NORM] NMSE = 0.000026 [MUL] NMSE = 0.000025 [ADD] NMSE = 0.000025 [MUL_MAT] NMSE = 0.000026 [ADD] NMSE = 0.000026 [CPY] NMSE = 0.000026 [MUL_MAT] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000005 [SCALE] NMSE = 0.000005 [SOFT_MAX] NMSE = 0.000038 [MUL_MAT] NMSE = 0.000024 [CPY] NMSE = 0.000024 [MUL_MAT] NMSE = 0.000043 [ADD] NMSE = 0.000041 [ADD] NMSE = 0.000003 [NORM] NMSE = 0.000026 [MUL] NMSE = 0.000029 [ADD] NMSE = 0.000025 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [GELU] NMSE = 0.000028 [MUL_MAT] NMSE = 0.000060 [ADD] NMSE = 0.000060 [ADD] NMSE = 0.000004 [NORM] NMSE = 0.000029 [MUL] NMSE = 0.000027 [ADD] NMSE = 0.000027 [MUL_MAT] NMSE = 0.000027 [ADD] NMSE = 0.000027 [CPY] NMSE = 0.000027 [MUL_MAT] NMSE = 0.000010 [CPY] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000010 [ADD] NMSE = 0.000009 [CPY] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000007 [SCALE] NMSE = 0.000007 [SOFT_MAX] NMSE = 0.000041 [MUL_MAT] NMSE = 0.000020 [CPY] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000048 [ADD] NMSE = 0.000045 [ADD] NMSE = 0.000004 [NORM] NMSE = 0.000029 [MUL] NMSE = 0.000031 [ADD] NMSE = 0.000028 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000007 [GELU] NMSE = 0.000033 [MUL_MAT] NMSE = 0.000035 [ADD] NMSE = 0.000035 [ADD] NMSE = 0.000014 [NORM] NMSE = 0.000028 [MUL] NMSE = 0.000026 [ADD] NMSE = 0.000026 [MUL_MAT] NMSE = 0.000025 [ADD] NMSE = 0.000025 [CPY] NMSE = 0.000025 [MUL_MAT] NMSE = 0.000010 [CPY] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000012 [ADD] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000007 [SCALE] NMSE = 0.000007 [SOFT_MAX] NMSE = 0.000049 [MUL_MAT] NMSE = 0.000020 [CPY] NMSE = 0.000020 [MUL_MAT] NMSE = 0.000051 [ADD] NMSE = 0.000050 [ADD] NMSE = 0.000014 [NORM] NMSE = 0.000029 [MUL] NMSE = 0.000031 [ADD] NMSE = 
0.000027 [MUL_MAT] NMSE = 0.000008 [ADD] NMSE = 0.000007 [GELU] NMSE = 0.000041 [MUL_MAT] NMSE = 0.000046 [ADD] NMSE = 0.000046 [ADD] NMSE = 0.000020 [NORM] NMSE = 0.000025 [MUL] NMSE = 0.000024 [ADD] NMSE = 0.000023 [MUL_MAT] NMSE = 0.000023 [ADD] NMSE = 0.000023 [CPY] NMSE = 0.000023 [MUL_MAT] NMSE = 0.000008 [CPY] NMSE = 0.000008 [MUL_MAT] NMSE = 0.000011 [ADD] NMSE = 0.000011 [CPY] NMSE = 0.000011 [MUL_MAT] NMSE = 0.000005 [SCALE] NMSE = 0.000005 [SOFT_MAX] NMSE = 0.000052 [MUL_MAT] NMSE = 0.000021 [CPY] NMSE = 0.000021 [MUL_MAT] NMSE = 0.000056 [ADD] NMSE = 0.000054 [ADD] NMSE = 0.000020 [NORM] NMSE = 0.000025 [MUL] NMSE = 0.000025 [ADD] NMSE = 0.000022 [MUL_MAT] NMSE = 0.000006 [ADD] NMSE = 0.000006 [GELU] NMSE = 0.000015 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [ADD] NMSE = 0.000009 [NORM] NMSE = 0.000023 [MUL] NMSE = 0.000020 [ADD] NMSE = 0.000019 FAIL

Large-v2 (whisper_build_graph_cross)

OK
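
As a reading aid for the NMSE figures in these logs, here is a minimal sketch of a normalized mean squared error. It assumes the common definition: the squared error between a reference tensor and a test tensor, divided by the squared magnitude of the reference. The function name `nmse` and the use of the CPU output as the reference are assumptions made for illustration; the comparison code in this PR may differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: normalized mean squared error of `test` relative to `ref`,
// computed as sum((ref - test)^2) / sum(ref^2).
// Here `ref` would be the CPU backend output and `test` the CUDA backend output.
double nmse(const std::vector<float>& ref, const std::vector<float>& test) {
    assert(ref.size() == test.size());
    double err  = 0.0;  // accumulated squared difference
    double norm = 0.0;  // accumulated squared reference magnitude
    for (std::size_t i = 0; i < ref.size(); ++i) {
        const double r = ref[i];
        const double d = r - static_cast<double>(test[i]);
        err  += d * d;
        norm += r * r;
    }
    // If the reference is all zeros, fall back to the raw squared error
    // to avoid dividing by zero.
    return norm > 0.0 ? err / norm : err;
}
```

With this definition, identical tensors give 0, and a value such as 0.000050 means the squared error is about 0.005% of the reference's squared magnitude.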

@bobqianic
Collaborator Author

bobqianic commented Jan 5, 2024

I ran some additional tests today and found that, even in an otherwise identical environment (same system, same drivers, same toolkit version, same compiler, etc.), different hardware produces different results. However, when I ran master on Linux, I observed no hallucinations (tested on an RTX 3060 and an RTX 2080 Ti). They only occurred on Windows, even though the results also differ on Linux. @slaren

Linux and Windows both use the same version 11.8.89 of the CUDA toolkit.

Note that by "different hardware" I mean different hardware models. Two units of the same model (for example, two 2080 Ti cards) produce the same result.

root@imperial-container-46f14c9572-1b065a23:~/whisper.cpp-funny# cmake -S . -B ./build -DWHISPER_CUBLAS=1
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.25.1") 
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Found CUDAToolkit: /usr/local/cuda/include (found version "11.8.89") 
-- cuBLAS found
-- The CUDA compiler identification is NVIDIA 11.8.89
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- GGML CUDA sources found, configuring CUDA architecture
-- GGML Configuring CUDA architectures 52;61;70
-- Configuring done (3.7s)
-- Generating done (0.0s)
-- Build files have been written to: /root/whisper.cpp-funny/build
C:\Users\qianp\Downloads>cmake -S C:\Users\qianp\Downloads\whisper.cpp-funny -B C:\Users\qianp\Downloads\whisper.cpp_build-funny -DWHISPER_CUBLAS=1
-- Building for: Visual Studio 17 2022
-- The C compiler identification is MSVC 19.37.32822.0
-- The CXX compiler identification is MSVC 19.37.32822.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.37.32822/bin/Hostx64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.37.32822/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: C:/Program Files/Git/cmd/git.exe (found version "2.42.0.windows.2")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - not found
-- Found Threads: TRUE
-- Found CUDAToolkit: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.8/include (found version "11.8.89")
-- cuBLAS found
-- The CUDA compiler identification is NVIDIA 11.8.89
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.8/bin/nvcc.exe - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- CMAKE_SYSTEM_PROCESSOR: AMD64
-- x86 detected
-- GGML CUDA sources found, configuring CUDA architecture
-- GGML Configuring CUDA architectures 52;61;70
-- Configuring done (11.7s)
-- Generating done (0.1s)
-- Build files have been written to: C:/Users/qianp/Downloads/whisper.cpp_build-funny
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 2080 Ti     Off | 00000000:26:00.0 Off |                  N/A |
| 30%   34C    P8              22W / 250W |     15MiB / 22528MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060        On  | 00000000:0E:00.0 Off |                  N/A |
| 30%   27C    P8              14W / 170W |      1MiB / 12288MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3080 Ti     On  | 00000000:B1:00.0 Off |                  N/A |
| 42%   37C    P8              31W / 350W |      2MiB / 12288MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

@bobqianic
Collaborator Author

bobqianic commented Jan 5, 2024

https://github.com/ggerganov/whisper.cpp/blob/ba5bcde874b6650cb13f8c432260cfe7a6a34b80/whisper.cpp#L4894-L4906

We currently use a random number generator during sampling, which selects several tokens at random according to the vocabulary probabilities output by the model. There are several possible explanations. One is that my graphics card in my laptop is broken, so the computation itself is faulty and produces incorrect results. Another is that the graphics card is fine, but the random number generator produces different results under different compilers and environments. A third is that everything else is fine, but there is a BUG in the GGML underlying layer. The last is that there is a BUG in CUDA.

1. My graphics card in my laptop is broken ❌
2. Results generated by the random number generator are different ❓
3. BUG in the GGML underlying layer ✅
4. BUG in CUDA
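
To make the second possibility concrete, here is a minimal sketch of seeded token sampling. It assumes the sampler follows the usual `std::mt19937` plus `std::discrete_distribution` pattern; that is an assumption, and the linked lines show the actual code. The function name `sample_tokens` and its parameters are hypothetical. The C++ standard fixes the output sequence of `std::mt19937` for a given seed, but leaves the algorithm of `std::discrete_distribution` to the implementation, so the same seed and probabilities are only guaranteed to reproduce within one standard library build.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper: draw `k` token ids according to the weights in `probs`,
// using a generator seeded with `seed`.
std::vector<int> sample_tokens(const std::vector<double>& probs, int k, unsigned seed) {
    assert(!probs.empty() && k >= 0);
    std::mt19937 rng(seed);  // engine output is fully specified by the standard
    // The mapping from engine output to an index is implementation-defined,
    // so MSVC and libstdc++ may return different ids for the same seed.
    std::discrete_distribution<int> dist(probs.begin(), probs.end());
    std::vector<int> ids;
    ids.reserve(static_cast<std::size_t>(k));
    for (int i = 0; i < k; ++i) {
        ids.push_back(dist(rng));
    }
    return ids;
}
```

Calling `sample_tokens` twice with the same arguments in the same build returns the same ids. Comparing the ids produced by a Windows build and a Linux build for identical probabilities would be one way to check this explanation.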

One is that my graphics card in my laptop is broken

This is not correct, as hallucinations continue to occur on other Windows machines.

NVIDIA A16-8Q

image

BUG in the GGML underlying layer

This is possible.

See #1692 (comment)

@ggerganov
Member

Try the following 2 things on the problematic environment using the latest master:

  • run with the -nf flag: main -m ./models/ggml-large-v2.bin -f ./samples/mm0.wav -nf
  • uncomment WHISPER_DEBUG in whisper.cpp and run the same test

https://github.com/ggerganov/whisper.cpp/blob/ba5bcde874b6650cb13f8c432260cfe7a6a34b80/whisper.cpp#L129-L131

Post the output that you get.

@bobqianic
Collaborator Author

run with the -nf flag: main -m ./models/ggml-large-v2.bin -f ./samples/mm0.wav -nf

Most of the time, the output is incorrect, but occasionally, I receive the correct output. Here are some examples of incorrect outputs.

C:\Users\qianp\Downloads\whisper.cpp_build-master>C:\Users\qianp\Downloads\whisper.cpp_build-master\bin\Release\main.exe -m C:\Users\qianp\Downloads\whisper.cpp_build-master\bin\Release\ggml-model-whisper-large.bin -f C:\Users\qianp\Downloads\whisper.cpp_build-master\bin\Release\micro-machine.wav -nf
whisper_init_from_file_with_params_no_state: loading model from 'C:\Users\qianp\Downloads\whisper.cpp_build-master\bin\Release\ggml-model-whisper-large.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5 (large)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3060 Laptop GPU, compute capability 8.6, VMM: yes
whisper_backend_init: using CUDA backend
whisper_model_load:     CUDA buffer size =  3094.49 MB
whisper_model_load: model size    = 3093.99 MB
whisper_backend_init: using CUDA backend
whisper_init_state: kv self size  =  220.20 MB
whisper_init_state: kv cross size =  245.76 MB
whisper_init_state: compute buffer (conv)   =   30.98 MB
whisper_init_state: compute buffer (encode) =  212.42 MB
whisper_init_state: compute buffer (cross)  =    9.38 MB
whisper_init_state: compute buffer (decode) =   99.23 MB

system_info: n_threads = 4 / 20 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 |

main: processing 'C:\Users\qianp\Downloads\whisper.cpp_build-master\bin\Release\micro-machine.wav' (478214 samples, 29.9 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:02.060]   you
[00:00:02.060 --> 00:00:12.060]   [BLANK_AUDIO]
[00:00:12.060 --> 00:00:22.060]   [BLANK_AUDIO]
[00:00:22.060 --> 00:00:32.060]   [BLANK_AUDIO]


whisper_print_timings:     load time =  4020.70 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =    22.55 ms
whisper_print_timings:   sample time =    73.76 ms /   160 runs (    0.46 ms per run)
whisper_print_timings:   encode time =  1646.32 ms /     4 runs (  411.58 ms per run)
whisper_print_timings:   decode time =    50.69 ms /     3 runs (   16.90 ms per run)
whisper_print_timings:   batchd time =   727.76 ms /   147 runs (    4.95 ms per run)
whisper_print_timings:   prompt time =    46.79 ms /    44 runs (    1.06 ms per run)
whisper_print_timings:    total time =  6600.93 ms
C:\Users\qianp\Downloads\whisper.cpp_build-master>C:\Users\qianp\Downloads\whisper.cpp_build-master\bin\Release\main.exe -m C:\Users\qianp\Downloads\whisper.cpp_build-master\bin\Release\ggml-model-whisper-large.bin -f C:\Users\qianp\Downloads\whisper.cpp_build-master\bin\Release\micro-machine.wav -nf
whisper_init_from_file_with_params_no_state: loading model from 'C:\Users\qianp\Downloads\whisper.cpp_build-master\bin\Release\ggml-model-whisper-large.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5 (large)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3060 Laptop GPU, compute capability 8.6, VMM: yes
whisper_backend_init: using CUDA backend
whisper_model_load:     CUDA buffer size =  3094.49 MB
whisper_model_load: model size    = 3093.99 MB
whisper_backend_init: using CUDA backend
whisper_init_state: kv self size  =  220.20 MB
whisper_init_state: kv cross size =  245.76 MB
whisper_init_state: compute buffer (conv)   =   30.98 MB
whisper_init_state: compute buffer (encode) =  212.42 MB
whisper_init_state: compute buffer (cross)  =    9.38 MB
whisper_init_state: compute buffer (decode) =   99.23 MB

system_info: n_threads = 4 / 20 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 |

main: processing 'C:\Users\qianp\Downloads\whisper.cpp_build-master\bin\Release\micro-machine.wav' (478214 samples, 29.9 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:02.060]   you
[00:00:02.060 --> 00:00:29.340]   want to see it.


whisper_print_timings:     load time =  4023.30 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =    20.60 ms
whisper_print_timings:   sample time =    38.43 ms /    82 runs (    0.47 ms per run)
whisper_print_timings:   encode time =   826.93 ms /     2 runs (  413.46 ms per run)
whisper_print_timings:   decode time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:   batchd time =   430.69 ms /    82 runs (    5.25 ms per run)
whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time =  5350.96 ms

uncomment WHISPER_DEBUG in whisper.cpp and run the same test

DEBUG.zip

@bobqianic
Collaborator Author

bobqianic commented Jan 5, 2024

Do you need a test environment? I can set up RDP access for you.

Edit: I have already sent you an email. Search for the title "Windows RDP access".

@ggerganov
Member

Could be related to: ggml-org/ggml#679

Can you check if the following patch fixes the issue:

diff --git a/ggml-cuda.cu b/ggml-cuda.cu
index 10c2161..2a84ffa 100644
--- a/ggml-cuda.cu
+++ b/ggml-cuda.cu
@@ -9691,6 +9691,7 @@ static void ggml_backend_cuda_buffer_set_tensor(ggml_backend_buffer_t buffer, gg
     CUDA_CHECK(cudaDeviceSynchronize());
 
     CUDA_CHECK(cudaMemcpy((char *)tensor->data + offset, data, size, cudaMemcpyHostToDevice));
+    CUDA_CHECK(cudaDeviceSynchronize());
 }
 
 static void ggml_backend_cuda_buffer_get_tensor(ggml_backend_buffer_t buffer, const ggml_tensor * tensor, void * data, size_t offset, size_t size) {

I won't be able to use RDP. It would be too difficult for me to navigate in Windows, so I doubt it would be useful.

@bobqianic
Collaborator Author

Can you check if the following patch fixes the issue:

Thanks! I've applied the patch and the results are promising. Although the NMSE error persists as before, there are no longer any hallucinations. I ran 20 tests and none of them hallucinated, which is a significant improvement considering that the usual hallucination rate was around 70%.
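
As a rough sanity check on those 20 clean runs, the sketch below computes how likely such a streak would be if the patch had changed nothing. It assumes each run hallucinates independently with a fixed probability, which is a simplification, and the function name `prob_all_clean` is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: probability that `runs` consecutive runs are all clean
// when each run independently hallucinates with probability `rate`.
double prob_all_clean(double rate, int runs) {
    assert(rate >= 0.0 && rate <= 1.0 && runs >= 0);
    return std::pow(1.0 - rate, runs);
}
```

With `rate = 0.7` and `runs = 20` this is 0.3 raised to the 20th power, on the order of 1e-11, so 20 clean runs in a row would be extremely unlikely by chance under that assumption.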

@ggerganov
Member

I think given the bad results and the various issue reports with CUDA, it would make sense to push this change now and make a new whisper.cpp release and not wait for the llama.cpp sync. @slaren any concerns?

@slaren
Member

slaren commented Jan 5, 2024

Agreed. I didn't think the synchronization issue would cause problems in models with small inputs, but if it is causing problems in whisper.cpp, it should be fixed now.

@ggerganov
Member

Synced 11b1b63

@bobqianic Please confirm that master is OK on your end and I'll push a new release

@bobqianic
Collaborator Author

Synced 11b1b63

@bobqianic Please confirm that master is OK on your end and I'll push a new release

master works on my laptop without hallucination.

@bobqianic bobqianic linked an issue Jan 6, 2024 that may be closed by this pull request
@bobqianic
Collaborator Author

Thank you all! Since the hallucination issue has been resolved, I'm now closing this PR.

@bobqianic bobqianic closed this Jan 6, 2024
@bobqianic bobqianic deleted the funny branch January 13, 2024 23:43
Development

Successfully merging this pull request may close these issues.

Encoder is broken when CUBLAS is ON
3 participants