Machine learning models for schwa deletion in Hindi and Punjabi.
Pre-generated models, which achieve state-of-the-art performance, using scikit-learn's MLPClassifier and LogisticRegression, as well as XGBoost's XGBClassifier, are included in the models subfolder in each language's directory.
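To see which pre-generated models ship with a checkout, a small helper along these lines may be useful. It is only a sketch: the function name list_pretrained_models is hypothetical and not part of this repository, and it assumes nothing beyond the layout described above (a models subfolder inside each language's directory). It makes no assumption about the model file names or formats.

```python
from pathlib import Path


def list_pretrained_models(repo_root="."):
    """Map each language directory name to the sorted file names in its models subfolder.

    Hypothetical helper: it only relies on the <language>/models layout
    described in this README.
    """
    found = {}
    for models_dir in sorted(Path(repo_root).glob("*/models")):
        if models_dir.is_dir():
            found[models_dir.parent.name] = sorted(
                p.name for p in models_dir.iterdir() if p.is_file()
            )
    return found


if __name__ == "__main__":
    # Run from the repository root after cloning.
    for language, files in list_pretrained_models().items():
        print(language, files)
```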
The results of this research are presented in the paper below:
"Supervised Grapheme-to-Phoneme Conversion of Orthographic Schwas in Hindi and Punjabi", Aryaman Arora, Luke Gessler, and Nathan Schneider (2020). In Proceedings of ACL. Preprint: https://arxiv.org/abs/2004.10353
Ensure that you are using the most recent Python 3 version.
Clone the repo and install the requirements:
git clone https://github.com/aryamanarora/schwa-deletion.git
cd schwa-deletion
pip install -r requirements.txt
Testing the pretrained Hindi XGBoost model:
cd hindi
python test.py
See test.py for an idea of how to use the main.py script as a module.
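Before importing main.py, it can help to check which functions and classes it defines. The sketch below does this with Python's standard ast module, so main.py is parsed but never executed. The helper name top_level_definitions is hypothetical, and nothing here assumes what main.py actually contains; test.py remains the reference for real usage.

```python
import ast
from pathlib import Path


def top_level_definitions(source):
    """Return the names of functions and classes defined at the top level of Python source."""
    tree = ast.parse(source)
    return [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]


if __name__ == "__main__":
    # Run from inside a language directory such as hindi/.
    script = Path("main.py")
    if script.exists():
        print(top_level_definitions(script.read_text(encoding="utf-8")))
```

Parsing instead of importing avoids triggering any work the script might do at import time.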