Testing : Compare CPU backend with GPU backend #1692
#1691 Doesn't work for me |
CPU_backend vs CUDA_backend (collapsed results per model): Tiny, Base, Small, Medium, Large-v2
|
I have been testing this, and I think this is due to precision differences in the matrix multiplication.
$ WHISPER_CUBLAS=1 make main && ./main -m models/ggml-large-v2.bin samples/01-03\(Easy\ to\ Learn\ Chinese\ +\ Second\ Edition\ +\ Textbook\ 2\).wav -l zh
I whisper.cpp build info: nvcc --forward-unknown-to-host-compiler -arch=native -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -mavx -mavx2 -mfma -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -Wno-pedantic -c ggml-cuda.cu -o ggml-cuda.o usage: ./main [options] file0.wav file1.wav ... options: whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-large-v2.bin' system_info: n_threads = 4 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 | main: processing 'samples/01-03(Easy to Learn Chinese + Second Edition + Textbook 2).wav' (713721 samples, 44.6 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = zh, task = transcribe, timestamps = 1 ... [IM2COL] NMSE = 0.000000 [GELU] NMSE = 0.000000 FAIL [SOFT_MAX] NMSE = 0.000000 [GELU] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL_MAT] NMSE = 
0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 
[NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] 
NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 
[MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 
0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000002 [MUL] NMSE = 
0.000001 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [SCALE] NMSE = 0.000001 [SOFT_MAX] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000002 [MUL] NMSE = 0.000002 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000037 [ADD] NMSE = 0.000037 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000003 [MUL] NMSE = 0.000002 [ADD] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000003 [SCALE] NMSE = 0.000003 [SOFT_MAX] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [ADD] NMSE = 0.000002 [NORM] NMSE = 0.000003 [MUL] NMSE = 0.000003 [ADD] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000010 [MUL_MAT] NMSE = 0.000016 [ADD] NMSE = 0.000016 [ADD] NMSE = 0.000009 [NORM] NMSE = 0.000006 [MUL] NMSE = 0.000004 [ADD] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000004 [ADD] NMSE = 0.000004 [CPY] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000008 [SCALE] NMSE = 0.000008 [SOFT_MAX] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000007 [ADD] NMSE = 0.000007 [ADD] NMSE = 0.000009 [NORM] NMSE = 0.000007 [MUL] NMSE = 0.000008 [ADD] NMSE = 0.000007 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [GELU] NMSE = 0.000009 [MUL_MAT] NMSE = 0.000009 [ADD] NMSE = 0.000009 [ADD] NMSE = 0.000007 [NORM] NMSE = 0.000004 [MUL] NMSE = 0.000004 [ADD] NMSE = 
0.000004 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [CPY] NMSE = 0.000005 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [CPY] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000004 [SCALE] NMSE = 0.000004 [SOFT_MAX] NMSE = 0.000006 [MUL_MAT] NMSE = 0.000003 [CPY] NMSE = 0.000003 [MUL_MAT] NMSE = 0.000005 [ADD] NMSE = 0.000005 [ADD] NMSE = 0.000007 [NORM] NMSE = 0.000005 [MUL] NMSE = 0.000005 [ADD] NMSE = 0.000004 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [ADD] NMSE = 0.000003 [NORM] NMSE = 0.000005 [MUL] NMSE = 0.000004 [ADD] NMSE = 0.000004 FAIL OK [00:00:00.000 --> 00:00:04.800] 你爷爷奶奶住在哪儿? ^C⏎ I think it doesn't generate the exact same results as CPU even with this, but I am not sure if that's really an issue. We could disable tensor cores with cuBLAS to further increase the matrix multiplication precision, but the model shouldn't be so finicky. |
I believe it's unnecessary to proceed with that. My experiment indicates that as long as each operator maintains an I can provide another audio sample that's highly likely to induce hallucinations. Use flag |
Another thing is very strange: how does OpenAI manage to use FP16 without encountering these precision issues? |
This is the result with
whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-large-v2.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1280
whisper_model_load: n_text_head = 20
whisper_model_load: n_text_layer = 32
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 5 (large)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090 Ti, compute capability 8.6, VMM: yes
whisper_backend_init: using CUDA backend
whisper_model_load: CUDA buffer size = 3094.49 MB
whisper_model_load: model size = 3093.99 MB
whisper_backend_init: using CUDA backend
whisper_init_state: kv self size = 220.20 MB
whisper_init_state: kv cross size = 245.76 MB
whisper_init_state: compute buffer (conv) = 30.98 MB
whisper_init_state: compute buffer (encode) = 212.42 MB
whisper_init_state: compute buffer (cross) = 9.38 MB
whisper_init_state: compute buffer (decode) = 99.23 MB
system_info: n_threads = 4 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 | main: processing 'samples/micro-machine.wav' (478214 samples, 29.9 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ... [IM2COL] NMSE = 0.000000 [GELU] NMSE = 0.000000 FAIL [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] 
NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 
0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 
0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000000 
[NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 
0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000000 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000000 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] 
NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000000 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000003 [ADD] NMSE = 0.000003 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [GELU] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000001 [ADD] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [ADD] NMSE = 0.000000 [CPY] NMSE = 0.000000 [MUL_MAT] NMSE = 0.000000 [SCALE] NMSE = 0.000000 [SOFT_MAX] NMSE = 0.000002 [MUL_MAT] NMSE = 0.000001 [CPY] NMSE = 0.000001 [MUL_MAT] NMSE = 0.000002 [ADD] NMSE = 0.000002 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000001 [MUL] NMSE = 0.000001 [ADD] NMSE = 0.000001 [GELU] NMSE = 0.000000 [ADD] NMSE = 0.000001 [NORM] NMSE = 0.000001 [MUL] NMSE = 
0.000000 [ADD] NMSE = 0.000000 FAIL OK [00:00:00.000 --> 00:00:03.200] This is the Micro Machine Man presenting the most midget miniature motorcade of Micro Machines. whisper_print_timings: load time = 955.11 ms |
While it somewhat reduces NMSE, it still leads to considerable hallucination.
|
I did some additional tests today and found that even in an identical environment (same system, same drivers, same toolkit version, same compiler, etc.), different hardware produces different results. However, when I used master in a Linux environment, I observed no hallucination (tested on an RTX 3060 and an RTX 2080 Ti). It occurred only under Windows, even though the results still differ on Linux. @slaren Linux and Windows both use the same version, 11.8.89, of the CUDA toolkit. Note that by different hardware, I mean different hardware models. For the same hardware model but different units (such as two 2080 Ti cards), the result is the same.
|
We are currently using a random number generator in sampling, which selects several tokens randomly based on the vocabulary probabilities output by the model. There are several possibilities. One is that the graphics card in my laptop is broken, so the computation itself is faulty and produces incorrect results. Another is that the graphics card is not broken, but because of different compilers and environments, the random number generator produces different results. Another is that everything else is fine, but there is a bug in the underlying GGML layer. The last possibility is that there is a bug in CUDA.
1. My graphics card in my laptop is broken ❌
This is not correct, as hallucinations continue to occur on other Windows machines.
This is possible. See #1692 (comment) |
Try the following 2 things on the problematic environment using the latest
Post the output that you get |
Most of the time, the output is incorrect, but occasionally, I receive the correct output. Here are some examples of incorrect outputs.
|
Do you need a test environment? I can set up RDP access for you. Edit: I have already sent you the email. Search title |
Could be related to: ggml-org/ggml#679

Can you check if the following patch fixes the issue:

diff --git a/ggml-cuda.cu b/ggml-cuda.cu
index 10c2161..2a84ffa 100644
--- a/ggml-cuda.cu
+++ b/ggml-cuda.cu
@@ -9691,6 +9691,7 @@ static void ggml_backend_cuda_buffer_set_tensor(ggml_backend_buffer_t buffer, gg
     CUDA_CHECK(cudaDeviceSynchronize());
     CUDA_CHECK(cudaMemcpy((char *)tensor->data + offset, data, size, cudaMemcpyHostToDevice));
+    CUDA_CHECK(cudaDeviceSynchronize());
 }
 static void ggml_backend_cuda_buffer_get_tensor(ggml_backend_buffer_t buffer, const ggml_tensor * tensor, void * data, size_t offset, size_t size) {

I won't be able to RDP - it will be too difficult for me to navigate in Windows, so I doubt it would be useful |
Thanks! I've applied the patch and the results are promising. Despite the NMSE error persisting as before, there are no instances of hallucination anymore. I conducted 20 tests and none exhibited hallucination, which is a significant improvement considering the usual rate was around 70%. |
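As a rough check on how strong this evidence is: assuming the 20 runs are independent and the earlier hallucination rate of about 70% still applied, the chance of seeing no hallucination in all 20 runs would be

```latex
P(\text{20 clean runs}) = (1 - 0.7)^{20} = 0.3^{20} \approx 3.5 \times 10^{-11}
```

so under those assumptions the clean streak is very unlikely to be luck.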
I think given the bad results and the various issue reports with CUDA, it would make sense to push this change now and make a new |
Agree, I didn't think the synchronization issue would cause issues in models with small inputs, but if it is causing issues with whisper.cpp it should be fixed now. |
Synced 11b1b63 @bobqianic Please confirm that |
|
Thank you all! Since the hallucination issue has been resolved, I'm now closing this PR. |
Phenomenon: When utilizing the CUDA backend, the transcription tends to produce hallucinations. See #1688 for more details.
Audio for error reproduction: 01-03(Easy to Learn Chinese + Second Edition + Textbook 2).zip
How to Reproduce the Error:
[…] CMake, ensuring to include the -DWHISPER_CUBLAS=1 option in the process.
[…] whisper-large-v2 model, and the -l zh flag.
[…] step 4, but this time add the -ng flag. You'll see that the hallucinations no longer occur.

This PR introduces code that compares the outputs of different backends with the CPU backend for each tensor operation. It executes the encoder once and does not produce any transcriptions.