Add InfiniteBench: long-context evaluation beyond 100K tokens by siddhant-rajhans · Pull Request #3662 · EleutherAI/lm-evaluation-harness

siddhant-rajhans · 2026-03-29T21:30:45Z

Add 11 InfiniteBench tasks (math_calc excluded) covering retrieval, code, math, novel QA, and dialogue across English and Chinese.

Evaluation methods match the official implementation exactly:

- First-int extraction for passkey/number_string
- Word-level matching for kv_retrieval
- Last-word int comparison for code_run
- Last-letter extraction with answer-to-letter mapping for code_debug/longbook_choice
- Token-level F1 for longbook_qa_en, character-level F1 for longbook_qa_chn
- ROUGE-Lsum for longbook_sum_en
- Substring matching for longdialogue_qa_en
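
For readers unfamiliar with these scoring rules, here is a minimal Python sketch of a few of them. It is illustrative only: the function names are hypothetical, the normalization (lowercasing and whitespace splitting) is an assumption, and it is neither the code in this PR nor the official InfiniteBench scorer, which may normalize text differently.

```python
import re
from collections import Counter


def first_int(text):
    """Return the first run of digits in `text` as a string, or None."""
    match = re.search(r"\d+", text)
    return match.group(0) if match else None


def last_word_int(text):
    """Parse the last whitespace-separated word as an int.

    Surrounding punctuation is stripped first (an assumption of this sketch).
    Returns None if there is no last word or it is not an integer.
    """
    words = text.split()
    if not words:
        return None
    try:
        return int(words[-1].strip(".,;:!?\"'"))
    except ValueError:
        return None


def f1_from_tokens(pred_tokens, gold_tokens):
    """Multiset-overlap F1 between two token sequences."""
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def token_f1(pred, gold):
    """Token-level F1: tokens are lowercased, whitespace-separated words."""
    return f1_from_tokens(pred.lower().split(), gold.lower().split())


def char_f1(pred, gold):
    """Character-level F1: each non-whitespace character is one token."""
    pred_chars = [c for c in pred if not c.isspace()]
    gold_chars = [c for c in gold if not c.isspace()]
    return f1_from_tokens(pred_chars, gold_chars)


def substring_match(pred, gold):
    """True if the gold answer appears in the prediction, ignoring case."""
    return gold.lower() in pred.lower()
```

The F1 variants share one helper, so the English and Chinese QA tasks differ only in how the text is split into tokens.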

Prompts match the official GPT-4 templates from the InfiniteBench repo.

Reference: https://arxiv.org/abs/2402.13718
Dataset: https://huggingface.co/datasets/xinrongzhang2022/InfiniteBench


CLAassistant · 2026-03-29T21:30:52Z

All committers have signed the CLA.

siddhant-rajhans requested a review from 0xSMT as a code owner March 29, 2026 21:30

siddhant-rajhans wants to merge 1 commit into EleutherAI:main from siddhant-rajhans:add-infinitebench-tasks
