Open
Description
Issue encountered
While working on several Japanese benchmark tasks, I observed that standard BLEU, CHRF, and TER metrics are suboptimal for Asian languages.
To address this, I propose adding a parameter to CorpusLevelTranslationMetric that allows integration with tokenizers tailored for Asian languages.
Solution/Feature
SacreBLEU
already includes tokenizers designed for Asian languages, which lack space-separated words. By modifying the implementation slightly, we can extend CorpusLevelTranslationMetric to better handle these languages.
Possible alternatives
A clear and concise description of any alternative solutions or features you've considered.