
bethgelab/lit-benchmark

Lost in Translation (LiT) Benchmark

This repository contains the Lost in Translation (LiT) Benchmark, introduced in the paper “Round-Trip Translation Reveals What Frontier Multilingual Benchmarks Miss.”

Repository Layout

lit-benchmark/
├── data/            # Benchmark splits and convenience subset files
├── annotations/     # Raw category annotation views
├── website/         # GitHub Pages website source
└── src/             # Runtime, judge pipeline, and table builders

Installation

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

To rerun translation or judging, set:

export OPENROUTER_API_KEY=...
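
Because translation and judging both go through that key, it can be useful to fail fast before starting a long run. The following is a minimal preflight sketch. `require_openrouter_key` is a hypothetical helper that is not part of `src/`, and the repository's runtime may read the key in its own way:

```python
import os


def require_openrouter_key(environ=os.environ) -> str:
    """Return OPENROUTER_API_KEY, or raise a clear error if it is unset or empty.

    Hypothetical helper for illustration only; it is not the repository's own check.
    """
    key = environ.get("OPENROUTER_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENROUTER_API_KEY is not set; export it before rerunning "
            "translation or judging."
        )
    return key
```

Passing `environ` as a parameter keeps the check testable without touching the real process environment.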

Running the Benchmark

Example multi-hop run:

python3 src/main.py \
  --model deepseek/deepseek-v3.2-exp \
  --dataset_name extended \
  --language_sequence japanese korean chinese russian \
  --judge_model x-ai/grok-4.1-fast \
  --exp_name seq
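
Since the paper's framing is round-trip translation, one way to read `--language_sequence` is as an ordered chain of translation hops that starts and ends in the source language. The sketch below only illustrates that reading. The source language (`english`) and the final hop back to it are assumptions, not documented behavior of `src/main.py`, and `hop_chain` is a hypothetical name:

```python
def hop_chain(language_sequence, source="english"):
    """Return consecutive (from, to) translation hops for one round trip.

    Assumption for illustration: text starts in `source`, passes through each
    language in the given order, and is translated back to `source` at the end.
    """
    path = [source, *language_sequence, source]
    # Pair each language with the next one along the path.
    return list(zip(path, path[1:]))


if __name__ == "__main__":
    for src, tgt in hop_chain(["japanese", "korean", "chinese", "russian"]):
        print(f"{src} -> {tgt}")
```

Under this assumption, the four-language example above corresponds to five hops: four outbound and one back to the source.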

By default, generated traces are written to artifacts/traces/ and judge outputs to artifacts/scores/. These runtime artifacts are not tracked in Git; the released trace and score files are available from the Hugging Face dataset. Supported --dataset_name values: extended, lit, robustness, abstracts, pragmatics, informal.
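
When sweeping over several models or datasets, it can help to assemble the command lines programmatically. The following is a minimal sketch, assuming it is run from the repository root. `build_command` and `run` are hypothetical helpers. They use only the flags shown in the example run and reject dataset names outside the supported list:

```python
import shlex
import subprocess
import sys

# Dataset names the runtime supports, as listed in this README.
SUPPORTED_DATASETS = {"extended", "lit", "robustness", "abstracts", "pragmatics", "informal"}


def build_command(model, dataset_name, language_sequence, judge_model, exp_name):
    """Assemble an argv list for src/main.py using only the README's example flags."""
    if dataset_name not in SUPPORTED_DATASETS:
        raise ValueError(f"unsupported dataset_name: {dataset_name!r}")
    return [
        sys.executable, "src/main.py",  # same interpreter as the active venv
        "--model", model,
        "--dataset_name", dataset_name,
        "--language_sequence", *language_sequence,
        "--judge_model", judge_model,
        "--exp_name", exp_name,
    ]


def run(cmd):
    """Launch one run; requires OPENROUTER_API_KEY and a checkout of the repository."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_command(
        model="deepseek/deepseek-v3.2-exp",
        dataset_name="extended",
        language_sequence=["japanese", "korean", "chinese", "russian"],
        judge_model="x-ai/grok-4.1-fast",
        exp_name="seq",
    )
    print(shlex.join(cmd))  # preview only; call run(cmd) to execute
```

Using `sys.executable` instead of a literal `python3` makes sure the subprocess runs in the same virtual environment as the calling script.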

For batch runs:

Data

Primary benchmark files:

  • data/lit.jsonl
  • data/extended.jsonl
  • data/robustness.jsonl

Convenience subset files:

  • data/abstracts.jsonl
  • data/pragmatics.jsonl
  • data/informal.jsonl

Raw annotation views:

  • annotations/abstracts.jsonl
  • annotations/pragmatics.jsonl
  • annotations/robustness.jsonl
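
The .jsonl extension indicates JSON Lines, meaning one JSON object per line. The following is a minimal standard-library loader sketch. `read_jsonl` is a hypothetical helper name, and skipping blank lines is a defensive assumption, not a documented property of the files:

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as err:
                # Report the file and line number so malformed rows are easy to find.
                raise ValueError(f"{path}:{line_no}: invalid JSON ({err.msg})") from err
    return records


# Example (run from the repository root):
# extended = read_jsonl("data/extended.jsonl")
```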

extended.jsonl is the full 260-example release:

  • 40 abstracts items
  • 120 pragmatics items
  • 40 informal items
  • 60 robustness items
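
These split sizes can be checked from the group field. The following sketch assumes the stored group values are the lowercase names abstracts, pragmatics, informal, and robustness. The README names the groups but does not show the exact stored strings. The sketch also shows that dropping the robustness rows leaves the 200 abstracts + pragmatics + informal examples that lit.jsonl contains. `check_group_counts` and `non_robustness` are hypothetical helpers:

```python
from collections import Counter

# Expected sizes from the README; the exact group strings are an assumption.
EXPECTED = {"abstracts": 40, "pragmatics": 120, "informal": 40, "robustness": 60}


def check_group_counts(records, expected=EXPECTED):
    """Compare per-group counts with the expected split sizes; raise on mismatch."""
    counts = Counter(r["group"] for r in records)
    if dict(counts) != expected:
        raise ValueError(f"group counts {dict(counts)} != expected {expected}")
    return counts


def non_robustness(records):
    """Keep the abstracts + pragmatics + informal rows (the lit.jsonl portion)."""
    return [r for r in records if r["group"] != "robustness"]


# Usage sketch (run from the repository root, with `import json`):
# with open("data/extended.jsonl", encoding="utf-8") as f:
#     records = [json.loads(line) for line in f if line.strip()]
# check_group_counts(records)              # 40 + 120 + 40 + 60 = 260
# assert len(non_robustness(records)) == 200
```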

lit.jsonl contains the 200 non-robustness evaluation examples (abstracts + pragmatics + informal).

Both lit.jsonl and extended.jsonl intentionally keep only the id, sentence, and group fields. Subset-specific metadata lives in the convenience files and raw annotation views:

  • data/abstracts.jsonl includes the abstracts category label
  • data/pragmatics.jsonl includes the pragmatics partition metadata
  • data/robustness.jsonl includes the robustness category metadata
  • annotations/*.jsonl preserve the released annotation views keyed by the same sample IDs
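
Since the annotation views are keyed by the same sample IDs, subset metadata can be attached to the core rows with a join on id. The following is a minimal sketch. It assumes the key field is named `id` in every file, as it is in lit.jsonl and extended.jsonl, and that the convenience files share those IDs. `attach_metadata` is hypothetical. Because the README does not name the metadata fields, the sketch copies every extra field without overwriting the core id, sentence, and group values:

```python
def attach_metadata(core_records, subset_records):
    """Left-join subset metadata onto core rows by `id`.

    Core fields (id, sentence, group) win on conflict; core rows without a
    match in the subset file are returned unchanged.
    """
    by_id = {}
    for rec in subset_records:
        if rec["id"] in by_id:
            raise ValueError(f"duplicate id in subset file: {rec['id']!r}")
        by_id[rec["id"]] = rec
    merged = []
    for core in core_records:
        extra = by_id.get(core["id"], {})
        # Start from the subset record, then overlay the core fields so they take precedence.
        merged.append({**extra, **core})
    return merged
```

A left join keeps every core example, even when a given subset file (for example, data/pragmatics.jsonl) covers only part of extended.jsonl.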

License

See LICENSE.

About

Codebase for the LiT Benchmark from "Round-Trip Translation Reveals What Frontier Multilingual Benchmarks Miss"
