Conversation

gcp
Contributor

@gcp gcp commented Aug 19, 2025

Actual cost was only $0.60, i.e. that is all I was charged on my account.

@gcp
Contributor Author

gcp commented Aug 19, 2025

Updated with the extended context window, though given that there's still exactly one exhaustion, the better score is likely just a bit more luck with sampling.

@xz-keg

xz-keg commented Aug 20, 2025

It seems that DeepSeek V3.1 is a mixed thinking model and is mainly a reasoner. Maybe you should also test deepseek-reasoner.

@gcp
Contributor Author

gcp commented Aug 20, 2025

> Maybe you should also test deepseek-reasoner.

I already ran this test, but after 50 or so cases the score was also ~70%. It's not clear to me that the API endpoint has been updated.

@alouisy

alouisy commented Aug 21, 2025

They just announced the new models in their API on X 10 minutes ago, so maybe I was too soon.

@gcp
Contributor Author

gcp commented Aug 21, 2025

...and they also announced a significant price hike for deepseek-chat, and the discontinuation of the nighttime discount. That's quite sad.

@Kreijstal

what are the results for 3.2 reasoner?

@gcp
Contributor Author

gcp commented Oct 3, 2025

> what are the results for 3.2 reasoner?

74.2% on the official API, and 70.2% for the chat model

See #4551
