A Universal Dependencies (UD) treebank for Sindhi, drawn from newswire (primarily Kawish), folk stories from the Adabi forums, sentences written by hand to demonstrate specific linguistic features, and a reparsing of the unfinished MazharDootio dataset.
Data in this treebank is split into three sections:
- Test section: some Kawish articles and folk stories. The reparsing of the MazharDootio dataset will also be placed here.
- Dev section: another set of Kawish articles and folk stories
- Train section: everything else
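
To check how much data ends up in each section, a small script can count sentences and words in the CoNLL-U files. The following is a minimal sketch, not part of the treebank tooling: the function name `conllu_stats` is hypothetical, and it assumes the standard 10-column, tab-separated CoNLL-U layout used across UD.

```python
from collections import Counter


def conllu_stats(text):
    """Return (sentence count, word count, UPOS Counter) for CoNLL-U text.

    Hypothetical helper for illustration; assumes standard CoNLL-U:
    10 tab-separated columns, '#' comment lines, blank line between sentences.
    """
    sentences = 0
    words = 0
    upos = Counter()
    in_sentence = False
    for line in text.splitlines():
        if not line.strip():
            in_sentence = False  # a blank line ends the current sentence
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # ignore malformed lines in this sketch
        token_id = cols[0]
        if "-" in token_id or "." in token_id:
            continue  # skip multiword token ranges and empty nodes
        if not in_sentence:
            sentences += 1
            in_sentence = True
        words += 1
        upos[cols[3]] += 1  # column 4 holds the UPOS tag
    return sentences, words, upos
```

To use it, read each section's `.conllu` file as UTF-8 text and pass the contents to `conllu_stats`; the actual file names depend on the treebank's naming in the UD release and are not assumed here.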
Annotation done by:
- Mutee U Rahman
- Sarwat Qureshi
- Shafi Pirzada
- Sakeena Shah
- Muhammad Shaheer
- Mir Afzal Ahmed Talpur
- Zubair Sanjrani
- John Bauer
A publication describing this treebank is currently under review.
- 2025-05-15 v2.16
  - Initial release in Universal Dependencies.
=== Machine-readable metadata (DO NOT REMOVE!) ================================
Data available since: UD v2.16
License: CC BY-SA 4.0
Includes text: yes
Genre: grammar-examples
Lemmas: manual native
UPOS: manual native
XPOS: manual native
Features: manual native
Relations: manual native
Contributors: Rahman, Mutee-u
Contributing: here
Contact: muteeurahman@gmail.com
===============================================================================