DocPTBench is a benchmark designed specifically for real-world photographed documents, targeting both document parsing and document translation in challenging, realistic environments.
Unlike previous benchmarks built on clean, born-digital documents, DocPTBench exposes models to:
- perspective distortion
- lighting variations / shadows
- motion blur
- physical folds & wrinkles
- noise and camera artifacts
This benchmark enables rigorous evaluation of both Document Parsing models and Multimodal LLMs (MLLMs) under practical conditions.
(a): the results of MLLMs on English (En)-started parsing (P) and translation (T) tasks; (b): the counterpart on Chinese (Zh)-started tasks; (c): the results from document parsing expert models. Ori- refers to the original digital-born document, and Photographed- is its photographed version. Text- indicates that only the textual content of the document image is used as the source-language input. A lower Edit distance indicates higher parsing quality, and a higher BLEU score reflects better translation fidelity.
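
To make the "lower Edit distance is better" convention concrete, here is a minimal sketch of a normalized edit distance between a predicted parse and its reference. It is an illustration only: the function names are hypothetical, plain Levenshtein distance divided by the longer string's length is an assumption, and the benchmark's actual normalization and text preprocessing are defined in its evaluation code, not here.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level Levenshtein distance using a rolling row."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred: str, ref: str) -> float:
    """Edit distance scaled to [0, 1]; 0 means identical strings.

    Assumption: normalize by the longer of the two strings.
    """
    if not pred and not ref:
        return 0.0
    return levenshtein(pred, ref) / max(len(pred), len(ref))
```

Under this sketch a perfect parse scores 0. The Edit columns in the results tables look like the same kind of quantity on a 0 to 100 scale, though that scaling is an inference from the numbers rather than something stated here.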
- 📉 MLLMs' parsing performance drops by 18% on average on photographed docs
- 📉 Expert models drop 25%
- 📉 Translation BLEU drops by 12%
- 🔧 Unwarping helps, but does not fully restore original quality
- 💡 CoT prompting greatly reduces instruction-following failures
The benchmark includes both simulated and real-camera captures.
En ↔ Zh / De / Fr / Ru and Zh ↔ En / De / Fr / Ru, all human-verified.
Digital-Born (Original) → Photographed → Unwarping
Supports both:
- Parsing-only models
- Unified end-to-end MLLMs
| Type | Model | Scene | Overall Edit↓ (En) | Overall Edit↓ (Zh) | Text Edit↓ (En) | Text Edit↓ (Zh) | Formula Edit↓ (En) | Formula Edit↓ (Zh) | Table TEDS↑ (En) | Table TEDS↑ (Zh) | Table Edit↓ (En) | Table Edit↓ (Zh) | Reading Order Edit↓ (En) | Reading Order Edit↓ (Zh) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Expert Models | HunyuanOCR | Original | 9.6 | 10.9 | 3.1 | 6.6 | 26.4 | 28.8 | 94.0 | 95.9 | 4.7 | 3.3 | 4.3 | 4.9 |
| | | Photographed | 22.4↓12.8 | 30.4↓19.5 | 17.0↓13.9 | 31.6↓25.0 | 34.3↓7.9 | 44.3↓15.5 | 74.0↓20.0 | 75.2↓20.7 | 22.9↓18.2 | 20.6↓17.3 | 15.4↓11.1 | 25.3↓20.4 |
| | | Unwarping | 15.8↑6.6 | 22.8↑7.6 | 6.6↑10.4 | 19.3↑12.3 | 31.4↑2.9 | 44.0↑0.3 | 80.0↑6.1 | 84.9↑9.7 | 17.0↑5.9 | 12.5↑8.1 | 8.1↑7.3 | 15.4↑9.9 |
| | PaddleOCR-VL | Original | 10.5 | 12.6 | 4.1 | 6.2 | 24.1 | 31.6 | 88.0 | 92.1 | 9.3 | 6.2 | 4.5 | 6.3 |
| | | Photographed | 37.5↓27.0 | 39.6↓27.0 | 29.4↓25.3 | 37.7↓31.5 | 46.5↓22.4 | 52.6↓21.0 | 54.2↓33.8 | 65.3↓26.8 | 44.4↓35.1 | 31.4↓25.2 | 28.8↓24.3 | 37.9↓31.6 |
| | | Unwarping | 15.7↑21.8 | 22.0↑17.6 | 9.4↑20.0 | 17.6↑20.1 | 30.8↑15.7 | 41.5↑11.1 | 82.9↑28.7 | 83.2↑17.9 | 13.9↑30.5 | 13.5↑17.9 | 8.7↑20.1 | 15.4↑22.5 |
| | MinerU2.5 | Original | 11.1 | 17.4 | 5.0 | 7.4 | 25.8 | 47.3 | 88.3 | 89.2 | 8.9 | 8.3 | 4.5 | 6.8 |
| | | Photographed | 37.3↓26.2 | 47.4↓30.0 | 37.0↓32.0 | 53.6↓46.2 | 44.3↓18.5 | 62.0↓14.7 | 54.9↓33.4 | 59.8↓29.4 | 38.9↓30.0 | 33.5↓25.2 | 29.0↓24.5 | 40.3↓33.5 |
| | | Unwarping | 17.3↑20.0 | 25.2↑22.2 | 13.1↑23.9 | 19.1↑34.5 | 31.9↑12.4 | 52.2↑9.8 | 79.2↑24.3 | 81.1↑21.3 | 15.7↑23.2 | 14.6↑18.9 | 8.3↑20.7 | 15.0↑25.3 |
| | dots.ocr | Original | 12.5 | 16.0 | 3.2 | 6.6 | 32.9 | 41.6 | 88.6 | 89.0 | 9.9 | 9.2 | 4.0 | 6.7 |
| | | Photographed | 33.7↓21.2 | 37.3↓21.3 | 29.8↓26.6 | 35.8↓29.2 | 39.2↓6.3 | 54.4↓12.8 | 63.7↓24.9 | 67.6↓21.4 | 33.0↓23.1 | 27.1↓17.9 | 32.8↓28.8 | 31.8↓25.1 |
| | | Unwarping | 16.3↑17.4 | 24.1↑13.2 | 8.3↑21.5 | 20.9↑14.9 | 32.2↑7.0 | 42.0↑12.4 | 80.2↑16.5 | 82.3↑14.7 | 16.9↑16.1 | 14.6↑12.5 | 7.9↑24.9 | 18.9↑12.9 |
| | MonkeyOCR | Original | 14.6 | 22.1 | 6.8 | 11.8 | 27.2 | 45.2 | 81.3 | 85.5 | 14.9 | 13.4 | 9.3 | 17.9 |
| | | Photographed | 46.4↓31.8 | 52.8↓30.7 | 34.5↓27.7 | 43.9↓32.1 | 48.7↓21.5 | 61.6↓16.4 | 33.1↓48.2 | 37.4↓48.1 | 64.5↓49.6 | 61.5↓48.1 | 37.9↓28.6 | 44.1↓26.2 |
| | | Unwarping | 18.8↑27.6 | 31.9↑20.9 | 12.5↑22.0 | 23.6↑20.3 | 32.1↑16.6 | 55.8↑5.8 | 77.2↑44.1 | 77.1↑39.7 | 17.2↑47.3 | 19.5↑42.0 | 13.5↑24.4 | 28.7↑15.4 |
| | Dolphin | Original | 20.5 | 31.3 | 9.2 | 20.4 | 44.7 | 60.6 | 76.1 | 66.9 | 19.3 | 28.2 | 8.8 | 11.6 |
| | | Photographed | 57.5↓37.0 | 71.5↓40.2 | 54.9↓45.7 | 71.5↓51.1 | 65.6↓20.9 | 82.8↓22.2 | 33.0↓43.1 | 19.3↓47.6 | 67.9↓48.6 | 73.9↓45.7 | 46.2↓37.4 | 57.7↓46.1 |
| | | Unwarping | 27.3↑30.2 | 45.5↑26.0 | 17.9↑37.0 | 36.9↑34.6 | 48.3↑17.3 | 75.1↑7.7 | 63.8↑30.8 | 48.6↑29.3 | 29.2↑38.7 | 42.5↑31.4 | 13.9↑32.3 | 27.3↑30.4 |
| | olmOCR | Original | 32.6 | 46.9 | 9.7 | 29.3 | 45.5 | 65.5 | 68.1 | 61.3 | 60.8 | 65.2 | 14.5 | 27.7 |
| | | Photographed | 39.1↓6.5 | 46.1↑0.8 | 19.3↓9.6 | 27.2↑2.1 | 50.7↓5.2 | 66.9↓1.4 | 56.5↓11.6 | 56.9↓4.4 | 65.6↓4.8 | 66.0↓0.8 | 20.7↓6.2 | 24.4↑3.3 |
| | | Unwarping | 31.4↑7.7 | 43.1↑3.0 | 9.6↑9.7 | 23.7↑3.5 | 40.0↑10.7 | 61.3↑5.6 | 65.8↑9.3 | 63.7↑6.8 | 62.7↑2.9 | 63.3↑2.7 | 13.4↑7.3 | 23.9↑0.5 |
| | OCRFlux | Original | 23.8 | 34.9 | 11.2 | 25.6 | 44.7 | 71.6 | 69.0 | 80.0 | 26.9 | 16.2 | 12.6 | 26.3 |
| | | Photographed | 36.2↓12.4 | 45.8↓10.9 | 30.4↓19.2 | 40.4↓14.8 | 48.4↓3.7 | 81.1↓9.5 | 49.5↓19.5 | 54.3↓25.7 | 29.7↓2.8 | 29.7↓13.5 | 22.5↓9.9 | 32.1↓5.8 |
| | | Unwarping | 23.6↑12.6 | 37.9↑7.9 | 11.8↑18.6 | 29.7↑10.7 | 42.5↑5.9 | 73.7↑7.4 | 68.1↑18.6 | 72.7↑18.4 | 27.6↑2.1 | 20.8↑8.9 | 12.7↑9.8 | 27.3↑4.8 |
| | SmolDocling | Original | 49.3 | 81.6 | 26.2 | 82.8 | 75.3 | 99.7 | 16.5 | 7.3 | 90.8 | 92.7 | 22.7 | 52.2 |
| | | Photographed | 90.1↓40.8 | 93.7↓12.1 | 89.8↓63.6 | 99.2↓16.4 | 99.6↓24.3 | 99.9↓0.2 | 4.4↓12.1 | 2.4↓4.9 | 98.4↓7.6 | 98.8↓6.1 | 72.7↓50.0 | 75.9↓23.7 |
| | | Unwarping | 65.2↑24.9 | 92.8↑0.9 | 45.6↑44.2 | 97.9↑1.3 | 92.8↑6.8 | 99.7↑0.2 | 25.9↑21.5 | 1.7↓0.7 | 90.0↑8.4 | 100.0↓1.2 | 38.6↑34.1 | 74.6↑1.3 |
| | Nanonets-OCR | Original | 28.3 | 29.5 | 13.4 | 23.1 | 51.8 | 54.6 | 76.8 | 79.4 | 34.3 | 20.1 | 13.5 | 20.0 |
| | | Photographed | 38.6↓10.3 | 52.1↓22.6 | 21.0↓7.6 | 42.0↓18.9 | 48.1↑3.7 | 67.0↓12.4 | 58.5↓18.3 | 50.6↓28.8 | 64.1↓29.8 | 66.7↓46.6 | 21.4↓7.9 | 32.7↓12.7 |
| | | Unwarping | 32.0↑6.6 | 44.4↑7.7 | 13.2↑7.8 | 30.2↑11.8 | 42.6↑5.5 | 65.6↑1.4 | 59.9↑1.4 | 59.8↑9.2 | 56.1↑8.0 | 56.1↑10.6 | 14.4↑7.0 | 25.6↑7.1 |
| | DeepSeek-OCR | Original | 13.4 | 18.1 | 4.6 | 9.7 | 28.5 | 43.3 | 82.6 | 89.0 | 13.8 | 8.8 | 6.7 | 10.5 |
| | | Photographed | 54.4↓41.0 | 57.8↓39.7 | 56.7↓52.1 | 57.6↓47.9 | 54.4↓25.9 | 74.1↓30.8 | 28.0↓54.6 | 35.4↓53.6 | 64.7↓50.9 | 59.2↓50.4 | 41.7↓35.0 | 40.4↓29.9 |
| | | Unwarping | 22.1↑32.3 | 33.5↑24.3 | 14.9↑41.8 | 29.4↑28.2 | 32.1↑22.3 | 58.8↑15.3 | 67.0↑39.0 | 75.8↑40.4 | 26.7↑38.0 | 20.9↑38.3 | 14.8↑26.9 | 24.9↑15.5 |
| | olmOCR2 | Original | 16.1 | 26.7 | 4.8 | 18.5 | 39.2 | 54.3 | 83.7 | 78.5 | 12.3 | 16.5 | 8.1 | 17.4 |
| | | Photographed | 27.8↓11.7 | 44.6↓17.9 | 22.0↓17.2 | 39.9↓21.4 | 44.6↓5.4 | 74.1↓19.8 | 67.6↓16.1 | 65.4↓13.1 | 24.6↓12.3 | 28.5↓12.0 | 19.9↓11.8 | 36.0↓18.6 |
| | | Unwarping | 17.5↑10.3 | 37.2↑7.4 | 7.3↑14.7 | 32.9↑7.0 | 37.5↑7.1 | 66.7↑7.4 | 81.9↑14.3 | 77.2↑11.8 | 14.3↑10.3 | 19.1↑9.4 | 11.0↑8.9 | 30.2↑5.8 |
| | Nanonets-OCR2 | Original | 26.6 | 34.9 | 19.4 | 34.3 | 60.0 | 68.0 | 81.5 | 82.5 | 15.5 | 17.9 | 11.6 | 19.4 |
| | | Photographed | 34.2↓7.6 | 46.1↓11.2 | 25.5↓6.1 | 44.6↓10.3 | 69.0↓9.0 | 76.4↓8.4 | 70.7↓10.8 | 66.0↓16.5 | 22.8↓7.3 | 31.9↓14.0 | 19.5↓7.9 | 31.4↓12.0 |
| | | Unwarping | 30.6↑3.6 | 40.0↑6.1 | 21.1↑4.4 | 32.6↑12.0 | 65.3↑3.7 | 77.3↓0.9 | 71.9↑1.2 | 73.1↑7.1 | 24.8↓2.0 | 18.5↑13.4 | 17.5↑2.0 | 25.2↑6.2 |
| General MLLMs | Qwen2.5-VL-72B | Original | 21.4 | 26.1 | 9.2 | 18.0 | 31.5 | 43.4 | 82.9 | 83.9 | 34.1 | 26.2 | 10.6 | 16.8 |
| | | Photographed | 41.5↓20.1 | 57.0↓30.9 | 36.2↓27.0 | 56.6↓38.6 | 42.2↓10.7 | 61.8↓18.4 | 57.0↓25.9 | 55.5↓28.4 | 59.6↓25.5 | 58.2↓32.0 | 28.1↓17.5 | 51.3↓34.5 |
| | | Unwarping | 24.0↑17.5 | 41.4↑15.6 | 11.1↑25.1 | 42.7↑13.9 | 29.9↑12.3 | 48.4↑13.4 | 77.4↑20.4 | 76.1↑20.6 | 42.7↑16.9 | 34.9↑23.3 | 12.3↑15.8 | 39.7↑11.6 |
| | Gemini2.5-Pro | Original | 14.8 | 21.2 | 5.5 | 16.8 | 35.6 | 43.9 | 85.8 | 86.4 | 13.0 | 11.9 | 4.9 | 12.1 |
| | | Photographed | 18.2↓3.4 | 30.4↓9.2 | 9.8↓4.3 | 27.7↓10.9 | 37.1↓1.5 | 56.8↓12.9 | 81.3↓4.5 | 82.9↓3.5 | 14.6↓1.6 | 13.7↓1.8 | 11.2↓6.3 | 23.6↓11.5 |
| | | Unwarping | 16.9↑1.3 | 27.3↑3.1 | 9.2↑0.6 | 20.8↑6.9 | 35.3↑1.8 | 57.0↓0.2 | 83.4↑2.1 | 85.9↑3.0 | 13.1↑1.5 | 11.8↑1.9 | 10.0↑1.2 | 19.8↑3.8 |
| | Doubao-1.6-v | Original | 22.5 | 29.3 | 16.2 | 27.6 | 31.2 | 47.2 | 66.6 | 76.3 | 31.9 | 24.5 | 10.8 | 17.9 |
| | | Photographed | 54.7↓32.2 | 55.4↓26.1 | 60.6↓44.4 | 58.2↓30.6 | 51.5↓20.3 | 61.1↓13.9 | 27.6↓39.0 | 37.9↓38.4 | 67.0↓35.1 | 61.9↓37.4 | 39.7↓28.9 | 40.2↓22.3 |
| | | Unwarping | 30.0↑24.7 | 42.5↑12.9 | 23.8↑36.8 | 41.8↑16.4 | 34.5↑17.0 | 56.4↑4.7 | 55.7↑28.1 | 60.8↑22.9 | 44.9↑22.1 | 42.4↑19.5 | 16.7↑23.0 | 29.5↑10.7 |
| | Qwen-VL-Max | Original | 16.6 | 26.5 | 5.2 | 20.5 | 32.9 | 44.0 | 84.2 | 86.7 | 22.0 | 23.7 | 6.5 | 17.7 |
| | | Photographed | 27.7↓11.1 | 42.7↓16.2 | 15.9↓10.7 | 41.5↓21.0 | 41.8↓8.9 | 57.2↓13.2 | 71.1↓13.1 | 71.6↓15.1 | 36.3↓14.3 | 38.0↓14.3 | 16.8↓10.3 | 34.4↓16.7 |
| | | Unwarping | 19.0↑8.7 | 32.6↑10.1 | 6.8↑9.1 | 32.1↑9.4 | 33.8↑8.0 | 48.5↑8.7 | 81.3↑10.2 | 83.3↑11.7 | 26.5↑9.8 | 22.0↑16.0 | 9.0↑7.8 | 27.8↑6.6 |
| | GLM-4.5v | Original | 25.5 | 32.0 | 16.1 | 27.7 | 43.8 | 51.8 | 74.0 | 77.4 | 26.9 | 30.5 | 15.4 | 17.9 |
| | | Photographed | 36.7↓11.2 | 49.6↓17.6 | 26.2↓10.1 | 47.7↓20.0 | 49.9↓6.1 | 66.2↓14.4 | 58.9↓15.1 | 54.0↓23.4 | 43.5↓16.6 | 49.0↓18.5 | 27.3↓11.9 | 35.7↓17.8 |
| | | Unwarping | 23.9↑12.8 | 36.9↑12.7 | 13.1↑13.1 | 37.7↑10.0 | 39.0↑10.9 | 53.5↑12.7 | 73.8↑14.9 | 75.6↑21.6 | 26.9↑16.6 | 28.7↑20.3 | 16.5↑10.8 | 27.7↑8.0 |
| | Kimi-VL | Original | 36.5 | 38.7 | 17.2 | 22.0 | 48.6 | 52.2 | 57.1 | 67.8 | 65.9 | 62.5 | 14.3 | 18.1 |
| | | Photographed | 69.6↓33.1 | 68.7↓30.0 | 66.0↓48.8 | 63.5↓41.5 | 75.5↓26.9 | 82.6↓30.4 | 16.4↓40.7 | 22.9↓44.9 | 85.4↓19.5 | 82.2↓19.7 | 51.6↓37.3 | 46.7↓28.6 |
| | | Unwarping | 41.1↑28.5 | 50.7↑18.0 | 26.3↑39.7 | 38.5↑25.0 | 50.4↑25.1 | 68.8↑13.8 | 55.4↑39.0 | 62.3↑39.4 | 65.4↑20.0 | 65.0↑17.2 | 22.1↑29.5 | 30.7↑16.0 |
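
The arrow annotations in the parsing table can be reproduced from the raw scores. The sketch below assumes, based on the numbers themselves, that a Photographed delta is measured against the Original row and an Unwarping delta against the Photographed row; the function name is hypothetical, and it applies to the Edit columns only, where lower is better.

```python
def edit_deltas(original: float, photographed: float, unwarped: float):
    """Return (degradation, recovery, residual_gap) for one Edit-distance column.

    degradation:  how much worse the photographed input is than the original
    recovery:     how much of that loss unwarping wins back
    residual_gap: what still separates the unwarped result from the original
    """
    degradation = round(photographed - original, 1)
    recovery = round(photographed - unwarped, 1)
    residual_gap = round(unwarped - original, 1)
    return degradation, recovery, residual_gap


# HunyuanOCR, Overall Edit (En): Original 9.6, Photographed 22.4, Unwarping 15.8
print(edit_deltas(9.6, 22.4, 15.8))  # (12.8, 6.6, 6.2)
```

The first two values match the table's `22.4↓12.8` and `15.8↑6.6` annotations. The third, which the table does not print, is the gap that remains after unwarping. Some cells differ from this arithmetic by 0.1, presumably because the published deltas were computed before rounding.
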
| Type | Model | Input | En-Zh BLEU | En-Zh chrF | En-Zh METEOR | En-Zh STEDS | Zh-En BLEU | Zh-En chrF | Zh-En METEOR | Zh-En STEDS |
|---|---|---|---|---|---|---|---|---|---|---|
| Open Source | Qwen3-VL-4B | Text | 49.61 | 56.87 | 66.74 | 94.35 | 50.20 | 72.82 | 64.91 | 94.24 |
| | | Original-Simple | 32.11↓17.50 | 40.22↓16.65 | 47.49↓19.25 | 64.55↓29.80 | 28.31↓21.89 | 48.72↓24.10 | 40.44↓24.47 | 68.41↓25.83 |
| | | Original-CoT | 36.86↑4.75 | 45.17↑4.95 | 53.97↑6.48 | 68.83↑4.28 | 34.84↑6.53 | 57.29↑8.57 | 48.75↑8.31 | 66.14↓2.27 |
| | Qwen2.5-VL-3B | Text | 48.60 | 55.39 | 63.91 | 81.59 | 45.29 | 66.13 | 57.55 | 87.35 |
| | | Original-Simple | 18.18↓30.42 | 25.65↓29.74 | 27.42↓36.49 | 59.02↓22.57 | 15.20↓30.09 | 23.73↓42.40 | 20.78↓36.77 | 60.87↓26.48 |
| | | Original-CoT | 19.37↑1.19 | 28.85↑3.20 | 32.09↑4.67 | 49.57↓9.45 | 18.50↑3.30 | 35.56↑11.83 | 28.98↑8.20 | 48.24↓12.63 |
| | InternVL3-2B | Text | 48.25 | 54.29 | 62.48 | 89.42 | 33.54 | 50.01 | 43.78 | 84.94 |
| | | Original-Simple | 10.87↓37.38 | 17.33↓36.96 | 18.91↓43.57 | 55.90↓33.52 | 7.27↓26.27 | 11.63↓38.38 | 10.38↓33.40 | 57.83↓27.11 |
| | | Original-CoT | 19.21↑8.34 | 28.07↑10.74 | 32.91↑14.00 | 55.16↓0.74 | 22.07↑14.80 | 46.01↑34.38 | 36.06↑25.68 | 59.16↑1.33 |
| | InternVL3.5-2B | Text | 57.49 | 63.14 | 72.23 | 94.29 | 48.46 | 69.48 | 61.02 | 92.18 |
| | | Original-Simple | 25.43↓32.06 | 34.62↓28.52 | 40.15↓32.08 | 64.44↓29.85 | 8.42↓40.04 | 11.04↓58.44 | 10.52↓50.50 | 65.03↓27.15 |
| | | Original-CoT | 31.42↑5.99 | 41.25↑6.63 | 48.69↑8.54 | 65.14↑0.70 | 28.28↑19.86 | 50.16↑39.12 | 41.75↑31.23 | 61.86↓3.17 |
| | HunyuanOCR | Original-self | 16.77 | 24.91 | 29.17 | 58.88 | 12.49 | 26.25 | 19.04 | 56.09 |
| | | Photographed-self | 13.49↓3.28 | 21.86↓3.05 | 25.68↓3.49 | 53.22↓5.66 | 10.50↓1.99 | 25.02↓1.23 | 18.63↓0.41 | 50.72↓5.27 |
| Closed Source | Gemini2.5-Pro | Text | 60.07 | 66.54 | 76.39 | 92.90 | 53.62 | 76.01 | 70.06 | 91.23 |
| | | Original-Simple | 44.34↓15.73 | 53.83↓12.71 | 64.97↓11.42 | 71.77↓21.13 | 37.96↓15.66 | 67.45↓8.56 | 58.04↓12.02 | 65.75↓25.48 |
| | | Original-CoT | 44.41↑0.07 | 53.94↑0.11 | 65.68↑0.71 | 75.05↑3.28 | 42.81↑4.85 | 69.62↑2.17 | 61.67↑3.63 | 75.37↑9.62 |
| | | Photographed-Simple | 43.72↓0.62 | 53.77↓0.06 | 63.68↓1.29 | 71.82↑0.05 | 32.88↓5.08 | 62.95↓4.50 | 52.24↓5.80 | 63.42↓2.33 |
| | | Photographed-CoT | 43.88↓0.53 | 53.88↓0.06 | 64.06↓1.62 | 75.18↑0.13 | 34.89↓7.92 | 61.59↓8.03 | 51.88↓9.79 | 70.26↓5.11 |
| | Qwen-VL-Max | Text | 69.41 | 74.05 | 82.81 | 96.91 | 54.33 | 75.19 | 67.35 | 92.19 |
| | | Original-Simple | 41.04↓28.37 | 50.81↓23.24 | 59.77↓23.04 | 72.76↓24.15 | 36.29↓18.04 | 61.03↓14.16 | 50.40↓16.95 | 71.68↓20.51 |
| | | Original-CoT | 47.60↑6.56 | 55.70↑4.89 | 64.10↑4.33 | 72.67↓0.09 | 42.28↑5.99 | 66.05↑5.02 | 56.44↑6.04 | 69.68↓2.00 |
| | | Photographed-Simple | 27.53↓13.51 | 37.25↓13.56 | 43.81↓15.96 | 69.02↓3.74 | 21.81↓14.48 | 45.93↓15.10 | 34.44↓15.96 | 64.96↓6.72 |
| | | Photographed-CoT | 37.44↓10.16 | 46.76↓8.94 | 54.99↓9.11 | 68.24↓4.43 | 30.64↓11.64 | 54.88↓11.17 | 44.43↓12.01 | 64.16↓5.52 |
| | GLM-4.5v | Text | 62.53 | 68.38 | 77.84 | 95.57 | 55.51 | 75.62 | 68.56 | 92.84 |
| | | Original-Simple | 42.14↓20.39 | 51.20↓17.18 | 60.82↓17.02 | 73.72↓21.85 | 39.02↓16.49 | 62.67↓12.95 | 53.10↓15.46 | 74.34↓18.50 |
| | | Original-CoT | 45.90↑3.76 | 55.09↑3.89 | 64.91↑4.09 | 73.14↓0.58 | 42.34↑3.32 | 66.92↑4.25 | 57.48↑4.38 | 72.43↓1.91 |
| | | Photographed-Simple | 31.03↓11.11 | 41.02↓10.18 | 47.41↓13.41 | 71.21↓2.51 | 24.82↓14.20 | 46.42↓16.25 | 37.45↓15.65 | 60.44↓13.90 |
| | | Photographed-CoT | 37.48↓8.42 | 46.72↓8.37 | 54.39↓10.52 | 70.94↓2.20 | 29.88↓12.46 | 53.71↓13.21 | 44.15↓13.33 | 62.60↓9.83 |
| | Kimi-VL | Text | 67.95 | 72.45 | 81.78 | 97.34 | 60.76 | 78.64 | 73.47 | 95.61 |
| | | Original-Simple | 38.20↓29.75 | 47.17↓25.28 | 55.14↓26.64 | 70.38↓26.96 | 32.07↓28.69 | 54.72↓23.92 | 44.93↓28.54 | 69.85↓25.76 |
| | | Original-CoT | 42.36↑4.16 | 50.94↑3.77 | 58.68↑3.54 | 68.66↓1.72 | 42.63↑10.56 | 64.24↑9.52 | 55.75↑10.82 | 69.03↓0.82 |
| | | Photographed-Simple | 9.16↓29.04 | 15.97↓31.20 | 20.51↓34.63 | 49.05↓21.33 | 9.15↓22.92 | 27.77↓26.95 | 18.52↓26.41 | 50.99↓18.86 |
| | | Photographed-CoT | 12.07↓30.29 | 19.17↓31.77 | 23.46↓35.22 | 52.42↓16.24 | 15.78↓26.85 | 34.88↓29.36 | 26.49↓29.26 | 49.07↓19.96 |
| | Doubao-1.6-v | Text | 54.92 | 62.59 | 72.26 | 87.26 | 46.15 | 71.22 | 62.51 | 83.70 |
| | | Original-Simple | 39.29↓15.63 | 49.73↓12.86 | 59.29↓12.97 | 69.80↓17.46 | 34.31↓11.84 | 61.94↓9.28 | 51.50↓11.01 | 70.99↓12.71 |
| | | Original-CoT | 41.61↑2.32 | 51.09↑1.36 | 61.32↑2.03 | 71.52↑1.72 | 36.98↑2.67 | 64.47↑2.53 | 54.26↑2.76 | 71.98↑0.99 |
| | | Photographed-Simple | 35.36↓3.93 | 46.47↓3.26 | 53.60↓5.69 | 66.46↓3.34 | 26.88↓7.43 | 53.62↓8.32 | 42.58↓8.92 | 63.27↓7.72 |
| | | Photographed-CoT | 39.61↓2.00 | 49.61↓1.48 | 57.88↓3.44 | 66.70↓4.82 | 29.91↓7.07 | 56.52↓7.95 | 45.97↓8.29 | 63.53↓8.45 |
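
For readers who want to post-process the translation table, the following sketch splits an annotated cell such as `36.86↑4.75` into its score and a signed delta. The helper name and regular expression are hypothetical. The baselines noted in the comments are inferred from the numbers: Original-Simple appears to be compared with Text, Original-CoT and Photographed-Simple with Original-Simple, and Photographed-CoT with Original-CoT.

```python
import re

# A score, optionally followed by an arrow and an unsigned delta.
CELL = re.compile(
    r"^(?P<value>\d+(?:\.\d+)?)(?:(?P<arrow>[↑↓])(?P<delta>\d+(?:\.\d+)?))?$"
)


def parse_cell(cell: str):
    """Return (score, signed_delta); signed_delta is None for baseline cells.

    All metrics in the translation table are higher-is-better, so an up
    arrow maps to a positive delta and a down arrow to a negative one.
    """
    match = CELL.match(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    value = float(match["value"])
    if match["arrow"] is None:
        return value, None
    sign = 1.0 if match["arrow"] == "↑" else -1.0
    return value, sign * float(match["delta"])


# Qwen3-VL-4B, En-Zh BLEU
text, _ = parse_cell("49.61")
simple, simple_delta = parse_cell("32.11↓17.50")  # inferred baseline: Text
cot, cot_delta = parse_cell("36.86↑4.75")         # inferred baseline: Original-Simple

assert round(simple - text, 2) == simple_delta
assert round(cot - simple, 2) == cot_delta
```

Because the parsing table uses the same arrows on lower-is-better Edit columns, this sign convention should not be reused there without adjustment.
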
Refer to the appendix of the paper.
Refer to parsing.md for parsing evaluation details.
Refer to translation.md for translation evaluation details.
- PaddleOCR-VL
- MinerU2.5
- dots.ocr
- MonkeyOCR
- DeepSeek-OCR
- olmOCR and olmOCR2
- Dolphin
- OCRFlux
- SmolDocling
- Nanonets-OCR and Nanonets-OCR2
- HunyuanOCR
- Gemini2.5-Pro
- Qwen-VL-Max
- Kimi-VL
- GLM-4.5v
- Doubao-1.6-v
- Gemini3 Pro
- Qwen3-VL-4B
- Qwen2.5-VL-3B
- InternVL3-2B
- InternVL3.5-2B
- Qwen3-VL-235B
If you use DocPTBench, please cite:
@misc{docptbench2025,
  title={DocPTBench: Benchmarking End-to-End Photographed Document Parsing and Translation},
  author={Yongkun Du and Pinxuan Chen and Xuye Ying and Zhineng Chen},
  year={2025},
  eprint={2511.18434},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.18434}
}

Additionally, we encourage you to cite the following papers:
@misc{ouyang2024omnidocbenchbenchmarkingdiversepdf,
  title={OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations},
  author={Linke Ouyang and Yuan Qu and Hongbin Zhou and Jiawei Zhu and Rui Zhang and Qunshu Lin and Bin Wang and Zhiyuan Zhao and Man Jiang and Xiaomeng Zhao and Jin Shi and Fan Wu and Pei Chu and Minghao Liu and Zhenxiang Li and Chao Xu and Bo Zhang and Botian Shi and Zhongying Tu and Conghui He},
  year={2024},
  eprint={2412.07626},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2412.07626},
}

DocPTBench is developed based on OmniDocBench. Thanks for their awesome work!

