Commit

Update with Sonnet 3.5 and Gemini 1.5 Pro results
carlini committed Jun 23, 2024
1 parent dfea228 commit d0ecd8c
Showing 1 changed file with 9 additions and 7 deletions.
16 changes: 9 additions & 7 deletions README.md
@@ -49,13 +49,15 @@ This is helpful for determining whether or not models are capable of performing
## Results

I've evaluated a few models on this benchmark. Here's how they perform:
-* GPT-4o: 49% passed
-* Claude 3 Opus: 43% passed
-* Claude 3 Sonnet: 33% passed
-* Mistral Large: 29% passed
-* GPT-3.5: 27% passed
-* Mistral Medium: 24% passed
-* Gemini Pro 1.0: 18% passed
+* Claude 3.5 Sonnet: 48% passed
+* GPT 4o: 47% passed
+* Claude 3 Opus: 42% passed
+* Claude 3 Sonnet: 32% passed
+* Gemini 1.5 Pro: 32% passed
+* Mistral Large: 28% passed
+* GPT 3.5: 26% passed
+* Mistral Medium: 23% passed
+* Gemini 1.0 Pro: 17% passed

A complete evaluation grid is available [here](https://nicholas.carlini.com/writing/2024/evaluation_examples/index.html).
