Cannot infer quantized model, but fp32 works well. #20
Comments
Overflowed?
But q4_1 works well.
NaN results are typically a sign of some float-accuracy weirdness. Do you have a very small model? I think the quantization is less accurate the smaller your model is.
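To make the accuracy point concrete, here is a minimal, self-contained C sketch. It is not ggml's actual quantization code; the block size of 32 and the rounding details are simplifying assumptions. It contrasts a q4_0-style block (one scale, symmetric grid) with a q4_1-style block (scale plus per-block minimum). On values that sit far from zero, the scale-only grid wastes most of its 16 levels, which is one reason a q4_1 model can round-trip more accurately than a q4_0 one:

```c
/* Illustrative sketch only -- not ggml's actual quantization code.
 * Compares round-trip error of a q4_0-style scheme (scale only) against
 * a q4_1-style scheme (scale + min). Block size 32 mirrors ggml's QK,
 * but everything else is simplified. Build with: cc sketch.c -lm */
#include <stdio.h>
#include <math.h>

#define QK 32

/* q4_0-style: one scale per block, signed nibble range -8..7. */
static float roundtrip_q4_0(const float *x, float *y) {
    float amax = 0.0f;
    for (int i = 0; i < QK; i++) if (fabsf(x[i]) > amax) amax = fabsf(x[i]);
    const float d  = amax / 7.0f;              /* scale               */
    const float id = d ? 1.0f / d : 0.0f;      /* guard divide-by-zero */
    float err = 0.0f;
    for (int i = 0; i < QK; i++) {
        int q = (int)roundf(x[i] * id);        /* quantize            */
        if (q < -8) q = -8;
        if (q >  7) q =  7;
        y[i] = q * d;                          /* dequantize          */
        err += (y[i] - x[i]) * (y[i] - x[i]);
    }
    return err;
}

/* q4_1-style: scale plus per-block minimum, nibble range 0..15. */
static float roundtrip_q4_1(const float *x, float *y) {
    float mn = x[0], mx = x[0];
    for (int i = 1; i < QK; i++) {
        if (x[i] < mn) mn = x[i];
        if (x[i] > mx) mx = x[i];
    }
    const float d  = (mx - mn) / 15.0f;
    const float id = d ? 1.0f / d : 0.0f;
    float err = 0.0f;
    for (int i = 0; i < QK; i++) {
        int q = (int)roundf((x[i] - mn) * id);
        if (q <  0) q =  0;
        if (q > 15) q = 15;
        y[i] = q * d + mn;
        err += (y[i] - x[i]) * (y[i] - x[i]);
    }
    return err;
}

int main(void) {
    /* A block whose values are offset from zero -- the case where the
     * symmetric q4_0-style grid wastes most of its quantization levels. */
    float x[QK], y[QK];
    for (int i = 0; i < QK; i++) x[i] = 10.0f + 0.01f * i;
    printf("q4_0-style sq. error: %g\n", roundtrip_q4_0(x, y));
    printf("q4_1-style sq. error: %g\n", roundtrip_q4_1(x, y));
    return 0;
}
```

Running this shows a much larger squared error for the scale-only scheme on the offset block, and smaller models have fewer weights over which such errors can average out.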
I got the quantized model from multi-qa-MiniLM-L6-cos-v1 on Hugging Face.
I modified the code to adapt BertCode to the latest ggml, and it works fine. Maybe it can be solved by upgrading GGML?
Can you please tell me how you managed to make it work on Windows?
I opened a pull request and the repo owner has merged it. Run git pull to get the new version; it works on Windows.
Thanks, it's working now! |
I ran main.exe with a q4_0 quantized model.
For q4_0, it trips an assert in ggml.c,
in function static void ggml_compute_forward_soft_max_f32(),
at line 9342: assert(sum > 0.0); sum is -nan(ind).
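For context, here is a standalone sketch (plain C, not ggml's actual soft_max code) of why that assert fires: once a single NaN reaches the logits, for example from a badly dequantized weight, expf propagates it into the running sum, and since every comparison with NaN evaluates to false, assert(sum > 0.0) aborts. -nan(ind) is just MSVC's spelling of an indeterminate NaN:

```c
/* Minimal sketch of why the assert fires -- not ggml's code, just a
 * plain softmax over a row that contains one NaN logit. */
#include <assert.h>
#include <math.h>
#include <stdio.h>

int main(void) {
    float row[4] = {0.1f, 0.2f, NAN, 0.4f};   /* one corrupted logit  */

    /* max-subtraction for numerical stability; NaN compares false,
     * so it silently passes through the max scan */
    float max = row[0];
    for (int i = 1; i < 4; i++) if (row[i] > max) max = row[i];

    float sum = 0.0f;
    for (int i = 0; i < 4; i++) {
        row[i] = expf(row[i] - max);          /* expf(NaN) is NaN     */
        sum += row[i];                        /* sum becomes NaN here */
    }

    printf("sum = %f\n", sum);                /* prints nan           */
    assert(sum > 0.0);                        /* NaN > 0.0 is false -> abort */
    return 0;
}
```

So the assert is the symptom, not the cause: the NaN is produced earlier in the pipeline, which is consistent with the quantization-accuracy explanation above.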