Lost in Translation (LiT) Benchmark

This repository contains the Lost in Translation (LiT) Benchmark, introduced in the paper “Round-Trip Translation Reveals What Frontier Multilingual Benchmarks Miss.”

Repository Layout

lit-benchmark/
├── data/            # Benchmark splits and convenience subset files
├── annotations/     # Raw category annotation views
├── website/         # GitHub Pages website source
└── src/             # Runtime, judge pipeline, and table builders

Installation

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

To rerun translation or judging, set:

export OPENROUTER_API_KEY=...

Running The Benchmark

Example multi-hop run:

python3 src/main.py \
  --model deepseek/deepseek-v3.2-exp \
  --dataset_name extended \
  --language_sequence japanese korean chinese russian \
  --judge_model x-ai/grok-4.1-fast \
  --exp_name seq

By default, generated traces are written to artifacts/traces/ and judge outputs to artifacts/scores/. These runtime artifacts are not tracked in Git; the released trace and score files are available in the Hugging Face dataset.

The runtime supports these dataset names: extended, lit, robustness, abstracts, pragmatics, informal.
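As a rough illustration of how the example invocation could be assembled programmatically, the sketch below builds the same argument list in Python and rejects dataset names that are not in the supported list. The helper `build_command` and the constant `SUPPORTED_DATASETS` are hypothetical names introduced here, not part of the repository; only the flags shown in the example command above are used.

```python
import shlex
import sys

# Dataset names listed in this README as supported by the runtime.
SUPPORTED_DATASETS = {
    "extended", "lit", "robustness", "abstracts", "pragmatics", "informal",
}


def build_command(model, dataset_name, language_sequence, judge_model, exp_name):
    """Return the argv list for one src/main.py run (hypothetical helper)."""
    if dataset_name not in SUPPORTED_DATASETS:
        raise ValueError(f"unsupported dataset name: {dataset_name!r}")
    if not language_sequence:
        raise ValueError("language_sequence must contain at least one language")
    return [
        sys.executable, "src/main.py",
        "--model", model,
        "--dataset_name", dataset_name,
        "--language_sequence", *language_sequence,
        "--judge_model", judge_model,
        "--exp_name", exp_name,
    ]


cmd = build_command(
    model="deepseek/deepseek-v3.2-exp",
    dataset_name="extended",
    language_sequence=["japanese", "korean", "chinese", "russian"],
    judge_model="x-ai/grok-4.1-fast",
    exp_name="seq",
)
# Print the command for inspection; pass `cmd` to subprocess.run(cmd, check=True)
# from the repository root to actually launch the run.
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell-quoting issues when it is later handed to `subprocess.run`.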

For batch runs:

Data

Primary benchmark files:

data/lit.jsonl
data/extended.jsonl
data/robustness.jsonl

Convenience subset files:

data/abstracts.jsonl
data/pragmatics.jsonl
data/informal.jsonl

Raw annotation views:

annotations/abstracts.jsonl
annotations/pragmatics.jsonl
annotations/robustness.jsonl

extended.jsonl is the full 260-example release:

40 abstracts items
120 pragmatics items
40 informal items
60 robustness items
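The breakdown above (40 + 120 + 40 + 60 = 260) can be sanity-checked against a local copy of the file. The following is a minimal sketch, assuming each line of data/extended.jsonl is a JSON object with a `group` field; the exact label strings stored in `group` are not specified in this README, so the sketch just prints whatever values it finds. `count_groups` is a hypothetical helper name.

```python
import json
from collections import Counter
from pathlib import Path


def count_groups(path):
    """Count examples per `group` value in a JSONL file."""
    counts = Counter()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                counts[json.loads(line)["group"]] += 1
    return counts


data_file = Path("data/extended.jsonl")
if data_file.exists():
    counts = count_groups(data_file)
    print(dict(counts), "total:", sum(counts.values()))
```

Run from the repository root, the printed total should match the 260 examples described above if the local file is the released version.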

lit.jsonl contains the 200 non-robustness evaluation examples (40 abstracts + 120 pragmatics + 40 informal). Both lit.jsonl and extended.jsonl intentionally keep only three fields: id, sentence, and group. Subset-specific metadata stays in the convenience files and raw annotation views:

data/abstracts.jsonl includes the abstracts category label
data/pragmatics.jsonl includes the pragmatics partition metadata
data/robustness.jsonl includes the robustness category metadata
annotations/*.jsonl preserve the released annotation views keyed by the same sample IDs
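Because the primary files carry only id, sentence, and group, recovering subset metadata means joining on the sample ID. The sketch below shows one way this might look. It assumes the annotation records expose the ID under the same `id` key as the benchmark files, which this README does not confirm; `read_jsonl` and `attach_annotations` are hypothetical helpers, and the `label` field in the demo is a made-up placeholder.

```python
import json


def read_jsonl(path):
    """Load a JSONL file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


def attach_annotations(examples, annotations, key="id"):
    """Merge annotation fields into benchmark examples that share an ID."""
    by_id = {row[key]: row for row in annotations}
    merged = []
    for example in examples:
        extra = by_id.get(example[key], {})
        # Benchmark fields win on conflict so id, sentence, and group stay untouched.
        merged.append({**extra, **example})
    return merged


# Inline demo with placeholder records. With the real files this would be, e.g.,
# attach_annotations(read_jsonl("data/lit.jsonl"),
#                    read_jsonl("annotations/abstracts.jsonl"))
examples = [{"id": "ex-1", "sentence": "Hello.", "group": "abstracts"}]
annotations = [{"id": "ex-1", "label": "placeholder"}]
print(attach_annotations(examples, annotations))
```

Examples without a matching annotation pass through unchanged, which suits lit.jsonl since each annotation view covers only one subset.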

License

See LICENSE.
