SALTIK: An Indonesian Non-Word Error Spelling Correction Dataset

Summary

SALTIK is a dataset for benchmarking the accuracy of non-word error correction methods on Indonesian words. It consists of 58,532 non-word errors generated from 3,000 of the most popular Indonesian words.
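As a rough illustration of what benchmarking against such a dataset involves, the sketch below scores a simple nearest-word corrector on a few (error, correct word) pairs. It is a minimal sketch under stated assumptions, not the method from the paper: the pairs and the lexicon are made up for the example and are not taken from SALTIK, the names `osa_distance`, `nearest_word`, and `accuracy` are hypothetical, and the layout of the dataset file is not described here, so loading it is left out. The distance used is the optimal string alignment variant of Damerau-Levenshtein distance.

```python
from typing import Callable, Iterable, List, Tuple


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def nearest_word(error: str, lexicon: List[str]) -> str:
    """Return the lexicon word closest to `error` (ties go to the earlier word)."""
    return min(lexicon, key=lambda word: osa_distance(error, word))


def accuracy(pairs: Iterable[Tuple[str, str]], corrector: Callable[[str], str]) -> float:
    """Fraction of pairs for which the corrector returns the expected word."""
    pairs = list(pairs)
    hits = sum(1 for error, target in pairs if corrector(error) == target)
    return hits / len(pairs)


# Assumption: illustrative lexicon and error pairs, NOT drawn from SALTIK.
LEXICON = ["makan", "minum", "rumah", "buku"]
SAMPLE_PAIRS = [
    ("mkan", "makan"),
    ("minmu", "minum"),
    ("rumha", "rumah"),
    ("bkuu", "buku"),
]

score = accuracy(SAMPLE_PAIRS, lambda error: nearest_word(error, LEXICON))
print(f"accuracy: {score:.2f}")
```

To evaluate a different method, pass any function that maps a misspelled string to a suggested word as the `corrector` argument.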

Dataset Split

The dataset is provided as a single set, with no train/validation/test split.

Changelog

2023-09-01 v1.0
- Initial dataset

Acknowledgments

SALTIK v1.0 was built by Hanif Arkan Audah for his undergraduate thesis at the Faculty of Computer Science, Universitas Indonesia, in 2023.

References

Please cite the following paper if you use this dataset in your project or publication (status: accepted):

@inproceedings{audah2023,
author = {Audah, Hanif Arkan and Yuliawati, Arlisa and Alfina, Ika},
title = {{A Comparison Between SymSpell and a Combination of Damerau-Levenshtein Distance With the Trie Data Structure}},
booktitle = {Proceedings of the ICAICTA 2023},
month = {October},
year = {2023},
address = {Lombok, Indonesia},
publisher = {IEEE},
keywords = {spell checker, non-word error, isolated-word error correction, symspell, edit distance, damerau-levenshtein}
}

Licence

This dataset is free to use, and you do not need our permission to use it. If your publication uses our data, please cite our paper. You may not create a copy of this dataset and share it publicly in your own repository without our permission.

Contact

ika.alfina [at] cs.ui.ac.id
