Description
llama.cpp seems to give bad results compared to Facebook's implementation.
Here's an example of a simple reading comprehension prompt:
Question: "Tom, Mark, and Paul bought books: two with pictures and one without. Tom and Mark had different kinds of books. What kind did Paul buy?" Answer: "Paul bought a book
LLaMA 7B with Facebook's implementation yields:
Seed 1:
Question: "Tom, Mark, and Paul bought books: two with pictures and one without. Tom and Mark had different kinds of books. What kind did Paul buy?" Answer: "Paul bought a book with pictures."
Asked by lone wolf 1788 days ago.
Seed 2 (to show that the above is not just a fluke):
Question: "Tom, Mark, and Paul bought books: two with pictures and one without. Tom and Mark had different kinds of books. What kind did Paul buy?" Answer: "Paul bought a book with pictures."
Question: "Tom, Mark, and Paul bought books: two with pictures and
While llama.cpp without quantization (so still float16) generates (with --seed 0 -t 8):
Question: "Tom, Mark, and Paul bought books: two with pictures and one without. Tom and Mark had different kinds of books. What kind did Paul buy?" Answer: "Paul bought a book that is neither as good nor bad."
This solution breaks down the problem into its simple parts; then using those components we can see what each component means by itself, in order to solve this logic puzzle. 1) Tom and Mark had different kinds of books...and so did Paul! (Therefore one out three were without pictures). ... [end of text]
It even has a grammatical error at the end: "one out [of] three".
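For reference, the full command for that run looks roughly like the following. The model path and the exact quoting of the prompt are approximate (they depend on how the weights were converted); only --seed 0 and -t 8 are the flags quoted above.

```
# Roughly the llama.cpp invocation for the float16 run above.
# Model path assumes the standard f16 ggml conversion of the 7B weights.
PROMPT='Question: "Tom, Mark, and Paul bought books: two with pictures and one without. Tom and Mark had different kinds of books. What kind did Paul buy?" Answer: "Paul bought a book'
./main -m ./models/7B/ggml-model-f16.bin -t 8 --seed 0 -p "$PROMPT"
```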
As you can see, the quality of the 7B output is higher with Facebook's implementation. So I think there may still be bugs in your implementation, or the default parameters could be improved.
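One thing that might explain part of the gap: Facebook's example.py samples with plain nucleus sampling (temperature 0.8, top_p 0.95) and no repetition penalty, whereas llama.cpp also applies a top-k cutoff and a repeat penalty by default, if I'm reading the defaults right. As a sketch (flag names may differ by version, and these values are a suggestion I haven't tested), a run that mirrors the reference sampling more closely would look like:

```
# Same command as above, but with sampling flags that mirror Facebook's
# example.py (temperature 0.8, top_p 0.95, no repetition penalty).
# top_k 32000 (the vocab size) effectively disables the top-k cutoff,
# and repeat_penalty 1.0 is a no-op multiplier.
./main -m ./models/7B/ggml-model-f16.bin -t 8 --seed 0 \
  --temp 0.8 --top_p 0.95 --top_k 32000 --repeat_penalty 1.0 \
  -p "$PROMPT"
```

If matching the sampling settings closes the gap, the problem is the default parameters rather than the evaluation code itself.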