The XPersona dataset is an extension of the Persona-Chat dataset. Specifically, we extend ConvAI2 to six other languages: Chinese, French, Indonesian, Italian, Korean, and Japanese. Since the test set of ConvAI2 is hidden, we split the original validation set into new validation and test sets.
The data is a list of dialogues formatted as follows:
data = [dialogue1, dialogue2, dialogue3...]
dialogue = {"persona": [sentence1, sentence2, ...], "dialogue": [[user_utterance1, response1], [user_utterance2, response2], ...]}
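
The format above can be walked turn by turn. The following is a minimal sketch, assuming the data is stored as JSON; the sample content and the helper name `iter_turns` are made up for illustration and are not part of the dataset or its tooling.

```python
import json

# Hypothetical sample in the format described above; not taken from the dataset.
sample_json = """
[
  {
    "persona": ["i like to ski.", "i have two dogs."],
    "dialogue": [
      ["hello, how are you?", "great, i just came back from skiing."],
      ["do you have any pets?", "yes, two dogs."]
    ]
  }
]
"""


def iter_turns(data):
    """Yield (persona, history, user_utterance, response) for every turn."""
    for dialogue in data:
        persona = dialogue["persona"]
        history = []  # utterances seen so far in this dialogue
        for user_utterance, response in dialogue["dialogue"]:
            # Copy the history so later turns do not mutate earlier results.
            yield persona, list(history), user_utterance, response
            history.extend([user_utterance, response])


data = json.loads(sample_json)
turns = list(iter_turns(data))

for persona, history, user_utterance, response in turns:
    print(len(persona), len(history), user_utterance, "->", response)
```

Each yielded item pairs the persona sentences with the dialogue history up to that point, which is one common way to build training examples for a response generation model; other setups may need a different grouping.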