This is a Python implementation of the work published at ACLing 2021 in the paper entitled "Feature-Rich Classifiers for Recognizing Textual Entailment in Indonesian". If you find it helpful, please cite the paper.
Recognizing Textual Entailment (RTE) is a Natural Language Processing task that determines whether the meaning of one sentence can be inferred from another sentence. In this work, we extract 35 features from each pair of text (T) and hypothesis (H) in Indonesian. We aim to solve the RTE task in Indonesian using a feature-rich classifier.
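To give a feel for what a feature extracted from a (T, H) pair can look like, here is a minimal sketch. The three features, the function names `tokenize` and `extract_features`, and the whitespace tokenizer are illustrative assumptions; they are not the 35 features used in the paper.

```python
def tokenize(sentence):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return sentence.lower().split()


def extract_features(text, hypothesis):
    """Return a few illustrative lexical features for a (T, H) pair.

    These features are hypothetical examples, not the paper's feature set.
    """
    t_tokens, h_tokens = tokenize(text), tokenize(hypothesis)
    t_set, h_set = set(t_tokens), set(h_tokens)
    shared = t_set & h_set
    union = t_set | h_set
    return {
        # Fraction of distinct hypothesis words that also occur in the text.
        "hypothesis_coverage": len(shared) / len(h_set) if h_set else 0.0,
        # Jaccard similarity between the two vocabularies.
        "jaccard": len(shared) / len(union) if union else 0.0,
        # Token-count difference between text and hypothesis.
        "length_difference": len(t_tokens) - len(h_tokens),
    }


features = extract_features(
    "Jakarta adalah ibu kota Indonesia",
    "Jakarta ibu kota Indonesia",
)
print(features)
```

A dictionary like this could be turned into one row of a feature matrix, with one row per (T, H) pair.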
The dataset used in this paper is the WRETE dataset. We used only the train data to train the model and the test data to evaluate it.
Word2vec Model: idwiki_word2vec_300.model
Pretrained POS Tagger Model: CRF Tagger
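One common way word embeddings feed into entailment features is the cosine similarity between the averaged word vectors of T and H. The sketch below shows that idea with toy 3-dimensional vectors in place of the 300-dimensional Word2vec model. Whether the paper uses this exact feature is an assumption, and `TOY_VECTORS`, `average_vector`, and `embedding_similarity` are hypothetical names.

```python
import math

# Toy vectors standing in for the real Word2vec model (assumption for illustration).
TOY_VECTORS = {
    "kucing": [1.0, 0.0, 0.0],
    "anjing": [0.0, 1.0, 0.0],
    "tidur": [0.0, 0.0, 1.0],
}


def average_vector(tokens, vectors):
    """Average the vectors of all tokens found in the vocabulary."""
    known = [vectors[token] for token in tokens if token in vectors]
    if not known:
        return None  # No token is covered by the embedding vocabulary.
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def embedding_similarity(text, hypothesis, vectors=TOY_VECTORS):
    """Similarity between the averaged word vectors of T and H."""
    t_vec = average_vector(text.lower().split(), vectors)
    h_vec = average_vector(hypothesis.lower().split(), vectors)
    if t_vec is None or h_vec is None:
        return 0.0
    return cosine_similarity(t_vec, h_vec)


print(embedding_similarity("kucing tidur", "kucing"))
```

With a real model, the dictionary lookup would be replaced by a lookup into the loaded embedding vocabulary.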
This work was implemented on Google Colaboratory
pip install sastrawi
pip install nltk
pip install sklearn_crfsuite
After performing an ablation study, the best performance in this work was obtained with an SVM, which reached an F1-score of 79.65%.
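For readers unfamiliar with the metric, the F1-score is the harmonic mean of precision and recall. The sketch below computes it for a single positive class on made-up labels. The label names and the helper `f1_score` are assumptions for illustration, and the averaging scheme used for the reported score may differ.

```python
def f1_score(y_true, y_pred, positive_label):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive_label and p == positive_label)
    fp = sum(1 for t, p in pairs if t != positive_label and p == positive_label)
    fn = sum(1 for t, p in pairs if t == positive_label and p != positive_label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up gold labels and predictions (hypothetical label names).
y_true = ["Entail", "Entail", "NotEntail", "NotEntail", "Entail"]
y_pred = ["Entail", "NotEntail", "NotEntail", "Entail", "Entail"]

score = f1_score(y_true, y_pred, positive_label="Entail")
print(f"F1 = {score:.4f}")
```

In this toy example there are 2 true positives, 1 false positive, and 1 false negative, so precision and recall are both 2/3 and so is the F1-score.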
IPython Notebook/preprocessing.py
is the code that preprocesses both the train and test data.
IPython Notebook/model_processing.py
is the code that trains the model on all the data extracted in the preprocessing step. The testing scenario is also implemented in this code.
For further information, please email me at: rani.auila@ui.ac.id or raniaulia72@gmail.com