@cjlovering cjlovering commented May 7, 2022

gsarti/flores_101 is a dataset with 102 languages used for language modeling.

  • Added the 102 language subsets
  • The prompt is empty -- just the sentence itself -- as this dataset will be used for language modeling (LM).
  • The metric is set to Other; downstream applications (eval harness) will select the LM metric.
  • It was necessary to add gsarti to the user list because the dataset is listed under that user on Hugging Face Datasets.
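
For illustration, the null prompt described above could be sketched roughly as follows. This is a minimal stand-in and not promptsource's actual template class: `NullPromptTemplate`, its `apply` method, and the `sentence` field name are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class NullPromptTemplate:
    """Hypothetical stand-in for a prompt template (not the promptsource API)."""

    name: str = "null_prompt"
    # The metric is left as "Other"; a downstream eval harness picks the LM metric.
    metrics: list = field(default_factory=lambda: ["Other"])

    def apply(self, example: dict) -> str:
        # A null prompt adds no instruction text: the output is the sentence itself.
        # The "sentence" key is an assumed field name for this sketch.
        return example["sentence"]


# Toy record standing in for one row of a language subset.
example = {"sentence": "The quick brown fox jumps over the lazy dog."}
template = NullPromptTemplate()
rendered = template.apply(example)
print(rendered)
```

Because the rendered text is identical to the input sentence, a template like this adds nothing beyond the raw dataset.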

@cjlovering cjlovering marked this pull request as ready for review May 7, 2022 18:09
@cjlovering
Author

Given that this is a null prompt and we're currently using it as a language modeling dataset, we can skip this for now. Using promptsource isn't necessary.

@stephenbach
Member

Can this PR be closed if we're not prompting it?

@stephenbach stephenbach self-assigned this May 17, 2022
@rbawden
Contributor

rbawden commented May 23, 2022

Hi there! For information, I have just prompted this dataset (automatically for MT). Linking the PR here: #779

@cjlovering cjlovering closed this May 23, 2022