SALTIK is a dataset for benchmarking the accuracy of non-word error correction methods on Indonesian words. It consists of 58,532 non-word errors generated from 3,000 of the most popular Indonesian words.
The dataset is not divided into splits.
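Because the dataset has no split, a correction method can be scored against all of its entries at once. The following is a minimal sketch of such an accuracy benchmark, not the authors' evaluation code. It assumes each entry can be read as an (error, intended word) pair; the sample pairs and vocabulary are made up for illustration and are not taken from SALTIK. The helper names `osa_distance`, `make_corrector`, and `accuracy` are hypothetical, and the toy corrector uses the optimal string alignment variant of Damerau-Levenshtein distance with a linear scan over the vocabulary.

```python
from typing import Callable, Iterable, List, Tuple


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all characters of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all characters of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def make_corrector(vocabulary: Iterable[str]) -> Callable[[str], str]:
    """Return a toy corrector that picks the closest vocabulary word."""
    vocab: List[str] = sorted(vocabulary)  # sorted for deterministic tie-breaking

    def correct(word: str) -> str:
        return min(vocab, key=lambda candidate: osa_distance(word, candidate))

    return correct


def accuracy(pairs: Iterable[Tuple[str, str]], correct: Callable[[str], str]) -> float:
    """Fraction of errors for which the corrector returns the intended word."""
    pairs = list(pairs)
    hits = sum(1 for error, target in pairs if correct(error) == target)
    return hits / len(pairs)


# Illustrative data only (assumption): not actual SALTIK entries.
SAMPLE_VOCAB = ["makan", "minum", "tidur"]
SAMPLE_PAIRS = [("makn", "makan"), ("mnium", "minum"), ("tidru", "tidur")]

if __name__ == "__main__":
    corrector = make_corrector(SAMPLE_VOCAB)
    print(accuracy(SAMPLE_PAIRS, corrector))
```

Any other correction method can be scored the same way by passing a different `correct` callable to `accuracy`.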
- 2023-09-01 v1.0
- Initial dataset
- SALTIK v1.0 was built by Hanif Arkan Audah for his undergraduate thesis at the Faculty of Computer Science, Universitas Indonesia, in 2023.
Please cite the following paper if you use this dataset in your project or publication (status: accepted):
@inproceedings{audah2023,
author = {Audah, Hanif Arkan and Yuliawati, Arlisa and Alfina, Ika},
booktitle = "Proceedings of the ICAICTA 2023",
month = "October",
year = "2023",
address = "Lombok, Indonesia",
publisher = "IEEE",
keywords = {spell checker,non-word error,isolated-word error correction,symspell,edit distance,damerau-levenshtein},
title = {{A Comparison Between SymSpell and a Combination of Damerau-Levenshtein Distance With the Trie Data Structure}}
}
You can use this dataset for free, and you don't need our permission to do so. If your work uses our data, please cite our paper in your publication. Please note that you are not allowed to create a copy of this dataset and share it publicly in your own repository without our permission.
ika.alfina [at] cs.ui.ac.id