Repositories
Showing 10 of 28 repositories
- u-math Public
Official evaluation code for the U-MATH and μ-MATH benchmarks. These datasets are designed to test the mathematical reasoning and meta-evaluation capabilities of LLMs on university-level problems.
- beemo Public
Benchmark for fine-grained machine-generated text detection. It includes 6.5k texts written by humans, generated by ten open-source instruction-finetuned LLMs, and edited by expert annotators.