Description
llama.cpp seems to give bad results compared to Facebook's implementation.
Here's an example of a simple reading comprehension prompt:
Question: "Tom, Mark, and Paul bought books: two with pictures and one without. Tom and Mark had different kinds of books. What kind did Paul buy?" Answer: "Paul bought a book
LLaMA 7B with Facebook's implementation yields:
Seed 1:
Question: "Tom, Mark, and Paul bought books: two with pictures and one without. Tom and Mark had different kinds of books. What kind did Paul buy?" Answer: "Paul bought a book with pictures."
Asked by lone wolf 1788 days ago.
Seed 2 (to show that the above is not just a fluke):
Question: "Tom, Mark, and Paul bought books: two with pictures and one without. Tom and Mark had different kinds of books. What kind did Paul buy?" Answer: "Paul bought a book with pictures."
Question: "Tom, Mark, and Paul bought books: two with pictures and
While llama.cpp without quantization (so still float16) generates (with --seed 0 -t 8):
Question: "Tom, Mark, and Paul bought books: two with pictures and one without. Tom and Mark had different kinds of books. What kind did Paul buy?" Answer: "Paul bought a book that is neither as good nor bad."
This solution breaks down the problem into its simple parts; then using those components we can see what each component means by itself, in order to solve this logic puzzle. 1) Tom and Mark had different kinds of books...and so did Paul! (Therefore one out three were without pictures). ... [end of text]
It even has a grammatical error at the end: "one out [of] three".
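For reference, the full command for that run looks roughly like the following. The model path and the exact quoting of the prompt are approximate (they depend on how the weights were converted); only --seed 0 and -t 8 are the flags quoted above.

```
# Roughly the llama.cpp invocation for the float16 run above.
# Model path assumes the standard f16 ggml conversion of the 7B weights.
PROMPT='Question: "Tom, Mark, and Paul bought books: two with pictures and one without. Tom and Mark had different kinds of books. What kind did Paul buy?" Answer: "Paul bought a book'
./main -m ./models/7B/ggml-model-f16.bin -t 8 --seed 0 -p "$PROMPT"
```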
As you can see, the quality of the 7B output is higher with Facebook's implementation. So I think there may still be bugs in your implementation, or the default parameters could be improved.
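One thing that might explain part of the gap: Facebook's example.py samples with plain nucleus sampling (temperature 0.8, top_p 0.95) and no repetition penalty, whereas llama.cpp also applies a top-k cutoff and a repeat penalty by default, if I'm reading the defaults right. As a sketch (flag names may differ by version, and these values are a suggestion I haven't tested), a run that mirrors the reference sampling more closely would look like:

```
# Same command as above, but with sampling flags that mirror Facebook's
# example.py (temperature 0.8, top_p 0.95, no repetition penalty).
# top_k 32000 (the vocab size) effectively disables the top-k cutoff,
# and repeat_penalty 1.0 is a no-op multiplier.
./main -m ./models/7B/ggml-model-f16.bin -t 8 --seed 0 \
  --temp 0.8 --top_p 0.95 --top_k 32000 --repeat_penalty 1.0 \
  -p "$PROMPT"
```

If matching the sampling settings closes the gap, the problem is the default parameters rather than the evaluation code itself.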