Xpersona/dataset at master · HLTCHKUST/Xpersona

XPersona Dataset

The XPersona dataset is an extension of the persona-chat dataset. Specifically, we extend ConvAI2 to six other languages: Chinese, French, Indonesian, Italian, Korean, and Japanese. Since the test set of ConvAI2 is hidden, we split the original validation set into a new validation set and a new test set.

Dataset Format

The data is a list of dialogues formatted as follows:

data = [dialogue1, dialogue2, dialogue3...]

dialogue = {"persona": [sentence1, sentence2...], "dialogue": [[user_utterance1, response1], [user_utterance2, response2]...]}
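As a minimal sketch of how this structure might be consumed, the Python below loads one JSON file and walks each dialogue turn by turn. The helper names `load_dialogues` and `iter_turns` are hypothetical and not part of the repository, the sample content is made up for illustration, and UTF-8 encoding is an assumption.

```python
import json
from pathlib import Path


def load_dialogues(path):
    """Read one XPersona JSON file and return its list of dialogues.

    Hypothetical helper; assumes the file is UTF-8 encoded JSON.
    """
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def iter_turns(dialogue):
    """Yield (persona, history, user_utterance, response) for every turn.

    `history` holds all utterances that precede the current turn.
    """
    persona = dialogue["persona"]
    history = []
    for user_utterance, response in dialogue["dialogue"]:
        # Copy the history so later turns do not mutate earlier results.
        yield persona, list(history), user_utterance, response
        history.extend([user_utterance, response])


# Tiny in-memory sample with the documented structure (made-up content).
sample = [
    {
        "persona": ["i like tea.", "i have two cats."],
        "dialogue": [
            ["hi , how are you ?", "good , just had some tea ."],
            ["any pets ?", "yes , two cats ."],
        ],
    }
]

turns = list(iter_turns(sample[0]))
```

With a real file, `load_dialogues("En_persona_valid.json")` would take the place of `sample`, assuming the file sits in the working directory.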


The dataset directory contains the following files:

En_persona_test.json
En_persona_train.json
En_persona_valid.json
Fr_persona_split_test_human_annotated.json
Fr_persona_split_valid_human_annotated.json
Fr_persona_train_corrected.json
Id_persona_split_test_human_annotated.json
Id_persona_split_valid_human_annotated.json
Id_persona_train_corrected.json
It_persona_split_test_human_annotated.json
It_persona_split_valid_human_annotated.json
It_persona_train_corrected.json
Jp_persona_split_test_human_annotated.json
Jp_persona_split_valid_human_annotated.json
Jp_persona_train_corrected.json
Ko_persona_split_test_human_annotated.json
Ko_persona_split_valid_human_annotated.json
Ko_persona_train_corrected.json
README.md
Zh_persona_split_test_human_annotated.json
Zh_persona_split_valid_human_annotated.json
Zh_persona_train_corrected.json
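
The file names in the listing follow two patterns: one for English and one for the six other languages. A small sketch of a lookup built on those patterns is shown here. The function name `xpersona_filename` is hypothetical, and the patterns are inferred only from the names in the listing.

```python
LANGS = {"En", "Fr", "Id", "It", "Jp", "Ko", "Zh"}
SPLITS = {"train", "valid", "test"}


def xpersona_filename(lang, split):
    """Return the JSON file name for a language prefix and split.

    Hypothetical helper; patterns are inferred from the directory listing.
    """
    if lang not in LANGS:
        raise ValueError(f"unknown language prefix: {lang!r}")
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    if lang == "En":
        # English files use the plain ConvAI2-style naming.
        return f"En_persona_{split}.json"
    if split == "train":
        return f"{lang}_persona_train_corrected.json"
    # Validation and test files of the other languages share this suffix.
    return f"{lang}_persona_split_{split}_human_annotated.json"


zh_test_file = xpersona_filename("Zh", "test")
```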
