Cannot infer quantized model, but fp32 works well. #20
Comments
Overflowed?
But q4_1 works well.
NaN results are typically a sign of some float-accuracy weirdness. Do you have a very small model? I think the quantization is less accurate the smaller your model is.
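To make the accuracy point concrete, here is a minimal, self-contained C sketch. It is not ggml's actual quantization code; the block size of 32 and the rounding details are simplifying assumptions. It contrasts a q4_0-style block (one scale, symmetric grid) with a q4_1-style block (scale plus per-block minimum). On values that sit far from zero, the scale-only grid wastes most of its 16 levels, which is one reason a q4_1 model can round-trip more accurately than a q4_0 one:

```c
/* Illustrative sketch only -- not ggml's actual quantization code.
 * Compares round-trip error of a q4_0-style scheme (scale only) against
 * a q4_1-style scheme (scale + min). Block size 32 mirrors ggml's QK,
 * but everything else is simplified. Build with: cc sketch.c -lm */
#include <stdio.h>
#include <math.h>

#define QK 32

/* q4_0-style: one scale per block, signed nibble range -8..7. */
static float roundtrip_q4_0(const float *x, float *y) {
    float amax = 0.0f;
    for (int i = 0; i < QK; i++) if (fabsf(x[i]) > amax) amax = fabsf(x[i]);
    const float d  = amax / 7.0f;              /* scale               */
    const float id = d ? 1.0f / d : 0.0f;      /* guard divide-by-zero */
    float err = 0.0f;
    for (int i = 0; i < QK; i++) {
        int q = (int)roundf(x[i] * id);        /* quantize            */
        if (q < -8) q = -8;
        if (q >  7) q =  7;
        y[i] = q * d;                          /* dequantize          */
        err += (y[i] - x[i]) * (y[i] - x[i]);
    }
    return err;
}

/* q4_1-style: scale plus per-block minimum, nibble range 0..15. */
static float roundtrip_q4_1(const float *x, float *y) {
    float mn = x[0], mx = x[0];
    for (int i = 1; i < QK; i++) {
        if (x[i] < mn) mn = x[i];
        if (x[i] > mx) mx = x[i];
    }
    const float d  = (mx - mn) / 15.0f;
    const float id = d ? 1.0f / d : 0.0f;
    float err = 0.0f;
    for (int i = 0; i < QK; i++) {
        int q = (int)roundf((x[i] - mn) * id);
        if (q <  0) q =  0;
        if (q > 15) q = 15;
        y[i] = q * d + mn;
        err += (y[i] - x[i]) * (y[i] - x[i]);
    }
    return err;
}

int main(void) {
    /* A block whose values are offset from zero -- the case where the
     * symmetric q4_0-style grid wastes most of its quantization levels. */
    float x[QK], y[QK];
    for (int i = 0; i < QK; i++) x[i] = 10.0f + 0.01f * i;
    printf("q4_0-style sq. error: %g\n", roundtrip_q4_0(x, y));
    printf("q4_1-style sq. error: %g\n", roundtrip_q4_1(x, y));
    return 0;
}
```

Running this shows a much larger squared error for the scale-only scheme on the offset block, and smaller models have fewer weights over which such errors can average out.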
I got the quantized model from multi-qa-MiniLM-L6-cos-v1 on Hugging Face.
I modified the code to adapt BertCode to the latest ggml, and it works fine. Maybe it can be solved by upgrading GGML?
Can you please tell me how you managed to make it work on Windows?
I opened a pull request and the repo owner has merged it. Run git pull to get the new version; it works on Windows.
Thanks, it's working now! |
I ran main.exe with a q4_0 quantized model.
For q4_0, it trips an assert in ggml.c,
in function static void ggml_compute_forward_soft_max_f32(),
at line 9342: assert(sum > 0.0); sum is -nan(ind).
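For context, here is a standalone sketch (plain C, not ggml's actual soft_max code) of why that assert fires: once a single NaN reaches the logits, for example from a badly dequantized weight, expf propagates it into the running sum, and since every comparison with NaN evaluates to false, assert(sum > 0.0) aborts. -nan(ind) is just MSVC's spelling of an indeterminate NaN:

```c
/* Minimal sketch of why the assert fires -- not ggml's code, just a
 * plain softmax over a row that contains one NaN logit. */
#include <assert.h>
#include <math.h>
#include <stdio.h>

int main(void) {
    float row[4] = {0.1f, 0.2f, NAN, 0.4f};   /* one corrupted logit  */

    /* max-subtraction for numerical stability; NaN compares false,
     * so it silently passes through the max scan */
    float max = row[0];
    for (int i = 1; i < 4; i++) if (row[i] > max) max = row[i];

    float sum = 0.0f;
    for (int i = 0; i < 4; i++) {
        row[i] = expf(row[i] - max);          /* expf(NaN) is NaN     */
        sum += row[i];                        /* sum becomes NaN here */
    }

    printf("sum = %f\n", sum);                /* prints nan           */
    assert(sum > 0.0);                        /* NaN > 0.0 is false -> abort */
    return 0;
}
```

So the assert is the symptom, not the cause: the NaN is produced earlier in the pipeline, which is consistent with the quantization-accuracy explanation above.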