Description
Try this query: "What is 3333+777?"
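(For reference, the correct answer is 3333 + 777 = 4110.)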
Yes, yes, LLMs are bad at math. That's not what I'm getting at. Someone mentioned this on Reddit, and I'm seeing the same weird behavior myself.
Let's get a baseline. Here is what meta.ai yields:
This is likely running on Llama 3 70B.
Here is what Groq yields:
and at 8B:
Now, here's where things get weird. Using Open WebUI on top of Ollama (which runs llama.cpp under the hood), let's try the GGUF quantizations of Llama 3.
First, 8B at fp16:
Then 8B at Q8_0:
Then 70B at Q4_0:
I think the problem should be clear. All of the non-llama.cpp instances, which were not using GGUFs, did the math problem correctly. All of the llama.cpp instances got the problem wrong in exactly the same way: each one adds an extra digit to the problem and then answers the mangled problem incorrectly. This is extremely repeatable on both ends: I have never seen the cloud instances make this mistake, and I have never seen the llama.cpp instances fail to make it.
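If you want to reproduce this without Open WebUI, here's a rough repro sketch against Ollama's `/api/generate` endpoint. It assumes Ollama is listening on its default port (11434), and the model tags are just examples; yours may differ depending on what you pulled.

```python
# Rough repro: ask each local quantization the same question at temperature 0.
# Assumes Ollama on its default port; the model tags below are examples and
# may not match what you have pulled locally.
import json
import urllib.request

MODELS = [
    "llama3:8b-instruct-fp16",
    "llama3:8b-instruct-q8_0",
    "llama3:70b-instruct-q4_0",
]

def ask(model: str, prompt: str) -> str:
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0},  # greedy decoding, so runs are repeatable
    }).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

for model in MODELS:
    print(model, "->", ask(model, "What is 3333+777?").strip())
```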
To me, it appears that something is degrading the accuracy of Llama 3 when run under llama.cpp.
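One way to narrow this down would be to check whether the GGUF tokenizer even splits the digits the way the reference Llama 3 tokenizer does (as far as I know, it groups digit runs into chunks of at most three). If the prompt tokenizes differently, the model is literally being asked a different question, which would line up with the "extra digit" behavior. Here's a sketch using llama-cpp-python; the model path is a placeholder, so point it at whichever GGUF you're running:

```python
# Tokenization sanity check. The model path is a placeholder; use the same
# GGUF that produced the wrong answers. vocab_only skips loading the weights.
from llama_cpp import Llama

llm = Llama(model_path="Meta-Llama-3-8B-Instruct.fp16.gguf", vocab_only=True)

prompt = "What is 3333+777?"
tokens = llm.tokenize(prompt.encode("utf-8"), add_bos=False)
print(tokens)
# Show each token as text so the digit grouping is visible.
print([llm.detokenize([t]).decode("utf-8", errors="replace") for t in tokens])
```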
Any ideas of what's going wrong here?