Add InfiniteBench: long-context evaluation beyond 100K tokens #3662

Open

siddhant-rajhans wants to merge 1 commit into EleutherAI:main from siddhant-rajhans:add-infinitebench-tasks

@siddhant-rajhans

Add 11 InfiniteBench tasks (math_calc excluded) covering retrieval, code, math, novel QA, and dialogue across English and Chinese.

Evaluation methods match the official implementation exactly:

  • First-int extraction for passkey/number_string
  • Word-level matching for kv_retrieval
  • Last-word int comparison for code_run
  • Last-letter extraction with answer-to-letter mapping for code_debug/longbook_choice
  • Token-level F1 for longbook_qa_en, character-level F1 for longbook_qa_chn
  • ROUGE-Lsum for longbook_sum_en
  • Substring matching for longdialogue_qa_en
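The scoring rules above can be sketched in plain Python. This is a minimal, hypothetical illustration and not the PR's code or the official InfiniteBench scorer. Every function name is invented here. The normalization details are also assumptions: which punctuation is stripped, English article removal for F1, case-insensitive substring matching, and four options labeled A to D. ROUGE-Lsum is omitted because it needs an external package.

```python
import re
import string
from collections import Counter
from typing import List

# Hypothetical helpers sketching the scoring rules listed above.
# Names and normalization choices are assumptions, not the PR's implementation.


def first_int_match(pred: str, label: str) -> bool:
    """passkey / number_string: compare the first run of digits in the output."""
    m = re.search(r"\d+", pred)
    return m is not None and m.group(0) == str(label).strip()


def kv_word_match(pred: str, label: str) -> bool:
    """kv_retrieval: the expected value must appear as a whole word."""
    for c in "\n:\"'.,?!{}":  # assumed set of separators turned into spaces
        pred = pred.replace(c, " ")
    return label in pred.split()


def last_word_int_match(pred: str, label: int) -> bool:
    """code_run: parse the last whitespace-separated word as an int."""
    for c in "\n.`'\":":  # assumed set of separators turned into spaces
        pred = pred.replace(c, " ")
    words = pred.split()
    if not words:
        return False
    try:
        return int(words[-1]) == int(label)
    except ValueError:
        return False


def last_letter_choice_match(pred: str, answer: str, options: List[str]) -> bool:
    """code_debug / longbook_choice: map the gold answer text to its letter,
    then compare it with the last standalone A-D letter in the output."""
    if answer not in options:
        return False
    gold = "ABCD"[options.index(answer)]  # assumes at most four options
    letters = re.findall(r"\b[A-D]\b", pred)
    return bool(letters) and letters[-1] == gold


def _f1(pred_tokens: List[str], gold_tokens: List[str]) -> float:
    """Bag-of-tokens F1 (SQuAD style)."""
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def _normalize_en(text: str) -> List[str]:
    """Lowercase, drop ASCII punctuation and English articles, split on whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def _chars(text: str) -> List[str]:
    """Single characters, skipping whitespace and ASCII punctuation."""
    return [ch for ch in text if not ch.isspace() and ch not in string.punctuation]


def token_f1(pred: str, gold: str) -> float:
    """longbook_qa_en: token-level F1."""
    return _f1(_normalize_en(pred), _normalize_en(gold))


def char_f1(pred: str, gold: str) -> float:
    """longbook_qa_chn: character-level F1."""
    return _f1(_chars(pred), _chars(gold))


def dialogue_substring_match(pred: str, label: str) -> bool:
    """longdialogue_qa_en: gold speaker name appears anywhere in the output."""
    return label.strip().lower() in pred.lower()
```

Character-level F1 is the natural choice for Chinese because whitespace tokenization does not separate words there. For example, a prediction of 北京市 against a gold answer of 北京 scores 0.8 rather than 0.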

Prompts match the official GPT-4 templates from the InfiniteBench repo.

Reference: https://arxiv.org/abs/2402.13718
Dataset: https://huggingface.co/datasets/xinrongzhang2022/InfiniteBench

siddhant-rajhans requested a review from 0xSMT as a code owner on March 29, 2026, 21:30
@CLAassistant

CLAassistant commented Mar 29, 2026

CLA assistant check
All committers have signed the CLA.
