XPersona Dataset

The XPersona dataset is an extension of the persona-chat dataset. Specifically, we extend ConvAI2 to six other languages: Chinese, French, Indonesian, Italian, Korean, and Japanese. Since the test set of ConvAI2 is hidden, we split the original validation set into new validation and test sets.

Dataset Format

The data is a list of dialogues formatted as follows:

data = [dialogue1, dialogue2, dialogue3, ...]
dialogue = {"persona": [sentence1, sentence2, ...], "dialogue": [[user_utterance1, response1], [user_utterance2, response2], ...]}
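As a minimal sketch of how this structure might be consumed, the Python snippet below walks the list and yields one persona/utterance/response triple per turn. It assumes the data is stored as a JSON file; the helper names `load_dialogues` and `iter_turns` and the sample sentences are hypothetical and only illustrate the format described above.

```python
import json


def load_dialogues(path):
    """Read a file holding the list of dialogue records (assumed to be JSON)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_turns(data):
    """Yield (persona, user_utterance, response) for every turn in the data."""
    for record in data:
        persona = record["persona"]  # list of persona sentences
        for user_utterance, response in record["dialogue"]:
            yield persona, user_utterance, response


# Made-up sample in the documented shape (not taken from the dataset).
sample = [
    {
        "persona": ["I like dogs.", "I live in Paris."],
        "dialogue": [
            ["Hi, how are you?", "Great, I just walked my dog."],
            ["Where do you live?", "In Paris."],
        ],
    }
]

turns = list(iter_turns(sample))
for persona, user_utterance, response in turns:
    print(persona, "|", user_utterance, "->", response)
```

With a real file, `iter_turns(load_dialogues(path))` would iterate the same way, where `path` points to one of the dataset files.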