This repository is the official site of MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation
A Dialogue Case of MMDialog:
Statistics:
Dataset Folder Format:
File: conversations.json
Note:
- The train set does not contain "negative_candidate_media_keys" and "negative_candidate_texts"; these fields exist only in the test and valid sets. Each "negative_candidate_xxx" field contains 999 negative candidates for the retrieval task.
- Words like :smiling_face_with_smiling_eyes: and :raising_hands: are emotion tokens. We will share the mapping between these tokens and the real emotions with you.
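
The two notes above can be checked programmatically. The following is a minimal sketch, not an official loader. It uses only the two field names quoted in the note and makes no assumption about where they sit inside the JSON, so it searches the parsed structure recursively. The function names, the token pattern (a colon, lowercase letters, digits or underscores, a colon), and the sample record are all hypothetical.

```python
import re

# Field names quoted in the note; their position in the JSON is not assumed.
NEGATIVE_KEYS = ("negative_candidate_media_keys", "negative_candidate_texts")

# Assumption: an emotion token is ":" + lowercase letters/digits/underscores + ":".
TOKEN_PATTERN = re.compile(r":[a-z0-9_]+:")


def find_negative_candidates(node):
    """Recursively collect (key, value) pairs for negative-candidate fields."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key in NEGATIVE_KEYS:
                found.append((key, value))
            else:
                found.extend(find_negative_candidates(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_negative_candidates(item))
    return found


def find_emotion_tokens(text):
    """Return every emotion token in a piece of text, in order of appearance."""
    return TOKEN_PATTERN.findall(text)


# Hypothetical record; the real layout of conversations.json may differ.
sample = {
    "turns": [
        {
            "text": "Great job :raising_hands:",
            "negative_candidate_texts": ["placeholder"] * 999,
        }
    ]
}

negatives = find_negative_candidates(sample)
tokens = find_emotion_tokens(sample["turns"][0]["text"])
```

On a parsed train-set record, `find_negative_candidates` would be expected to return an empty list, while each field it finds in a test or valid record should hold 999 entries. The simple pattern can also match colon-delimited digits such as timestamps, so it may need tightening for real data.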
- Who it's for: You are either a master’s student, doctoral candidate, post-doc, faculty, or research-focused employee at an academic institution or university.
- Non-commercial use: You should only use this access for non-commercial purposes.
- Clear plan: You have a clearly defined research objective and specific plans for how you intend to use and analyze this data in your research.
- Sharing limitation: You promise that you will not share this dataset without our qualification review and permission.
If you do not meet all of the requirements above, we will not share the dataset with you.
Item | Description |
---|---|
Your Role | [master’s student / doctoral candidate / post-doc / faculty / research-focused employee / others] |
Your study or work organization | e.g. Microsoft Research, DeepMind, Cornell University, ... |
Your Academic Homepage | Your [Google Scholar] or [Homepage_URL hosted on your organization's website (e.g. yourname.people.xxx.edu / yourname.xxx.people.msr.microsoft)] |
Non-commercial Use | You [promise / cannot promise] that you will not apply the data to commercial scenarios or products. |
Sharing Limitation | You [promise / cannot promise] that you will not share this dataset without our qualification review and permission. |
Your Plan | (Describe your research plan and how you intend to use and analyze this data in your research. >= 50 words) |