IndoToD: A Multi-Domain Indonesian Benchmark For End-to-End Task-Oriented Dialogue Systems

This is the dataset repository for IndoToD, presented at SEALP 2023, co-located with AACL 2023, where our paper received the Best Paper award 🏆 [ACL Anthology].

The code is written in PyTorch. If you use the source code or datasets in this repository in your work, please cite the following paper:

@inproceedings{kautsar2023indotod,
  title={IndoToD: A Multi-Domain Indonesian Benchmark For End-to-End Task-Oriented Dialogue Systems},
  author={Kautsar, Muhammad and Nurdini, Rahmah and Cahyawijaya, Samuel and Winata, Genta and Purwarianti, Ayu},
  booktitle={Proceedings of the First Workshop in South East Asian Language Processing},
  pages={85--99},
  year={2023}
}

Summary

We introduce IndoToD, a high-quality bilingual, multi-domain task-oriented dialogue (ToD) dataset for Indonesian and English. It comprises two datasets: IndoCamRest and IndoSMD.

Overall, the benchmark covers four domains. We use delexicalization to reduce the amount of annotation required, and to ensure high-quality data collection we hired native speakers to annotate the dialogues manually. The data are derived from two existing English ToD datasets, CamRest and SMD. Together with the original English datasets, the new Indonesian datasets serve as a benchmark for evaluating Indonesian and English ToD systems and for exploring the potential benefits of cross-lingual and bilingual transfer learning.
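To illustrate the delexicalization mentioned above, here is a minimal sketch, not the pipeline used to build IndoToD. It assumes slot values are known per utterance and swaps them for bracketed placeholders, so that one annotated template can be refilled with many different values. The function names, the placeholder format, the slot names, and the example sentences are all hypothetical.

```python
import re


def delexicalize(utterance, slot_values):
    """Replace each known slot value in the utterance with a [slot] placeholder."""
    # Handle longer values first so a short value never overwrites part of a longer one.
    items = sorted(slot_values.items(), key=lambda kv: len(kv[1]), reverse=True)
    for slot, value in items:
        pattern = re.compile(r"\b" + re.escape(value) + r"\b", flags=re.IGNORECASE)
        utterance = pattern.sub(f"[{slot}]", utterance)
    return utterance


def relexicalize(template, slot_values):
    """Fill the [slot] placeholders of a template with concrete values."""
    for slot, value in slot_values.items():
        template = template.replace(f"[{slot}]", value)
    return template


# Hypothetical English utterance and slot values (not taken from the dataset).
english = "Curry Garden is an expensive restaurant in the east."
english_slots = {"name": "Curry Garden", "pricerange": "expensive", "area": "east"}
template = delexicalize(english, english_slots)
print(template)

# A hypothetical Indonesian rendering of the template, refilled with Indonesian values.
indonesian_template = "[name] adalah restoran [pricerange] di daerah [area]."
indonesian_slots = {"name": "Curry Garden", "pricerange": "mahal", "area": "timur"}
print(relexicalize(indonesian_template, indonesian_slots))
```

Because the template is annotated once and the slot values are filled in afterwards, dialogues that differ only in their entities do not each need separate manual work.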

IndoCamRest

IndoCamRest is a task-oriented dialogue dataset translated from the Cambridge Restaurant 676 (CamRest) dataset.

IndoSMD

IndoSMD is a task-oriented dialogue dataset translated from the In-Car Assistant (SMD) dataset.

Results

We set up a benchmark for both Indonesian and English ToD to evaluate the performance of current ToD systems on monolingual, cross-lingual, and bilingual tasks.

Indonesian test set

English test set

License

The datasets are released under CC-BY-SA 4.0, and the code is licensed under Apache 2.0.
