docs: add DeepSeek V3.1 benchmark results to leaderboard #4464

gcp · 2025-08-19T17:49:35Z

The actual cost was only $0.60; that is what my account was charged.

Don't ask me why it also needs to be defined with the deepseek/ prefix in order to apply.

gcp · 2025-08-19T19:27:00Z

Updated with the extended context window. Since there is still exactly one context window exhaustion, the better score is likely just a bit more luck with sampling.

xz-keg · 2025-08-20T07:34:54Z

It seems that DeepSeek V3.1 is a hybrid thinking model and is mainly a reasoner. Maybe you should also test deepseek-reasoner.

gcp · 2025-08-20T09:38:57Z

> Maybe you shall also test deepseek-reasoner.

I already ran this test, but after 50 or so cases the score was also ~70%. It's not clear to me that the API endpoint has been updated.

alouisy · 2025-08-21T06:45:20Z

They announced the new models in their API on X just 10 minutes ago; maybe I was too soon.

gcp · 2025-08-21T13:11:50Z

...and they also announced a significant price hike for deepseek-chat and the discontinuation of the nighttime discount. That's quite sad.

Kreijstal · 2025-10-03T07:18:08Z

What are the results for the 3.2 reasoner?

gcp · 2025-10-03T11:09:34Z

> what are the results for 3.2 reasoner?

74.2% for the reasoner on the official API, and 70.2% for the chat model.

See #4551

gcp added 3 commits August 19, 2025 19:49

docs: add deepseek-chat benchmark results to leaderboard

9d96256

Add model metadata for extended context window.

176847f


chore: update deepseek-v3.1 benchmark results

5e64933
