
Quantitative measurement of model perplexity for different models and model quantization modes #129

Closed
@noughtmare

Description


llama.cpp seems to give worse results than Facebook's reference implementation.

Here's an example of a simple reading-comprehension prompt:

Question: "Tom, Mark, and Paul bought books: two with pictures and one without. Tom and Mark had different kinds of books. What kind did Paul buy?" Answer: "Paul bought a book

LLaMA 7B with Facebook's implementation yields:

Seed 1:

Question: "Tom, Mark, and Paul bought books: two with pictures and one without. Tom and Mark had different kinds of books. What kind did Paul buy?" Answer: "Paul bought a book with pictures."
Asked by lone wolf 1788 days ago.

Seed 2 (to show that the above is not just a fluke):

Question: "Tom, Mark, and Paul bought books: two with pictures and one without. Tom and Mark had different kinds of books. What kind did Paul buy?" Answer: "Paul bought a book with pictures."
Question: "Tom, Mark, and Paul bought books: two with pictures and

In contrast, llama.cpp without quantization (i.e., still float16), run with --seed 0 -t 8, generates:

Question: "Tom, Mark, and Paul bought books: two with pictures and one without. Tom and Mark had different kinds of books. What kind did Paul buy?" Answer: "Paul bought a book that is neither as good nor bad."
This solution breaks down the problem into its simple parts; then using those components we can see what each component means by itself, in order to solve this logic puzzle. 1) Tom and Mark had different kinds of books...and so did Paul! (Therefore one out three were without pictures). ... [end of text]

It even contains a grammatical error near the end: "one out [of] three".

As you can see, the 7B model produces higher-quality output with Facebook's implementation. So I suspect there may still be bugs in your implementation, or the default parameters could be improved.
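Single sampled completions are hard to compare objectively, which is what the issue title's "quantitative measurement of model perplexity" is getting at. Perplexity is the exponential of the mean negative log-likelihood the model assigns to each token of a fixed evaluation text; lower is better. The sketch below assumes you can already obtain per-token natural-log probabilities from each implementation on the same text. How to extract them from llama.cpp or Facebook's code is not shown here, and the function name `perplexity` is hypothetical.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Hypothetical helper: perplexity from the natural-log probabilities
    a model assigned to each observed token, i.e. exp(-(1/N) * sum(log p_i))."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    # Mean negative log-likelihood per token.
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# A model that assigns probability 0.25 to every token has perplexity 4
# (up to floating-point rounding).
print(perplexity([math.log(0.25)] * 10))
```

Running the same evaluation text through both implementations (and through each quantization mode) and comparing the resulting perplexities would show whether llama.cpp's float16 path actually diverges from the reference, independent of sampling settings or seeds.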
