@jon-tow jon-tow commented Jun 1, 2022

  • Adds support for multilingual ROUGE scoring by providing language-specific tokenization via nltk.

  • Adds a code_to_pycountry_lang utility that maps ISO language codes to pycountry.db.Language objects for robust language-name parsing.

  • Removes rougeLsum from the default rouge_types arg: sentences are not separated by newlines, which breaks the assumption rouge_scorer makes when computing that score.
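
To illustrate why language-specific tokenization matters for ROUGE, here is a minimal standard-library sketch of a ROUGE-1 F-measure with a pluggable tokenizer. It is not the rouge_score implementation and not the code in this PR; the names `rouge1_f`, `whitespace_tokenize`, and `char_tokenize` are hypothetical, and the character-level tokenizer is only a crude stand-in for a real word segmenter such as the nltk-based one added here.

```python
from collections import Counter
from typing import Callable, List

# A tokenizer is any callable that turns a string into a list of tokens.
Tokenizer = Callable[[str], List[str]]


def whitespace_tokenize(text: str) -> List[str]:
    """Split on whitespace only; adequate for space-delimited languages."""
    return text.split()


def char_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in for a real segmenter: one token per character."""
    return [ch for ch in text if not ch.isspace()]


def rouge1_f(reference: str, prediction: str, tokenize: Tokenizer) -> float:
    """Unigram-overlap F-measure between reference and prediction."""
    ref_counts = Counter(tokenize(reference))
    pred_counts = Counter(tokenize(prediction))
    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((ref_counts & pred_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


reference = "我喜欢猫"
prediction = "我喜欢狗"
# Whitespace splitting treats each sentence as a single token, so nothing overlaps.
print(rouge1_f(reference, prediction, whitespace_tokenize))
# Character-level tokens expose the three shared characters.
print(rouge1_f(reference, prediction, char_tokenize))
```

With whitespace splitting the two strings share no tokens, while the character-level tokenizer finds three of four tokens in common. The same effect is what a language-aware tokenizer is meant to address.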

TODO

  • Add sentence-level tokenization (possibly with nltk.sent_tokenize?). As mentioned above, rouge-score==0.0.4 (the latest package release) expects sentences to be split by newlines when computing the rougeLsum score. The latest version on their master branch supports automatic sentence splitting. Unfortunately, that repo is not pip-installable: a module named tokenize.py at the project root overrides a module of the same name in pip's setuptools dependency, which breaks the installation.

  • Find a clean abstraction for tagging non-English PromptSourceTasks with their language. This tag could then be used to construct the multilingual NltkWordTokenizer that gets passed into rouge and any other metrics that need multilingual support in the future. Possibly use promptsource's language tagging (see "Language tags", promptsource#771).
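
The preprocessing described in the first TODO item could look roughly like the sketch below: split the text into sentences and join them with newlines, so that a scorer expecting one sentence per line can compute rougeLsum. This is an assumption-laden illustration, not code from this PR. The regex splitter `naive_sent_tokenize` and the helper `to_rouge_lsum_format` are hypothetical names, and the regex is a simplistic stand-in for nltk.sent_tokenize that only handles `.`, `!`, and `?` followed by whitespace.

```python
import re
from typing import List

# Split after sentence-ending punctuation that is followed by whitespace.
# Assumption: a naive rule like this is good enough for illustration only.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def naive_sent_tokenize(text: str) -> List[str]:
    """Hypothetical stand-in for a real sentence tokenizer."""
    return [s for s in _SENTENCE_BOUNDARY.split(text.strip()) if s]


def to_rouge_lsum_format(text: str) -> str:
    """Put one sentence per line, the layout rougeLsum expects."""
    return "\n".join(naive_sent_tokenize(text))


summary = "The cat sat. The dog ran!  Did it rain?"
print(to_rouge_lsum_format(summary))
```

Both the reference and the prediction would need the same treatment before scoring. A real implementation would swap in a language-aware sentence tokenizer, since the punctuation rule above fails on abbreviations and on scripts with different sentence terminators.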

@Muennighoff
Can we still use the current ROUGE score in LMEVAL for languages that do not separate words with spaces?
It seems to me that PaLM (https://arxiv.org/pdf/2204.02311.pdf) used it for many languages other than English.

Also related: ROUGE scores are on a 0-1 scale and BLEU on a 0-100 scale in LMEVAL, right?
