Feature Rich Indonesian RTE

This is the Python implementation of the work published at ACLing 2021 in the paper "Feature-Rich Classifiers for Recognizing Textual Entailment in Indonesian". If you find it helpful, please cite the paper.

Recognizing Textual Entailment (RTE) is a Natural Language Processing task that determines whether the meaning of one sentence, the hypothesis (H), can be inferred from another, the text (T). In this work, we extract 35 features from each text-hypothesis pair in Indonesian and use them to train feature-rich classifiers for the RTE task.
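To make the idea of pair-level features concrete, here is a minimal sketch that computes three simple lexical features for a (T, H) pair. The feature names (`h_coverage`, `jaccard`, `length_diff`), the whitespace tokenizer, and the example sentences are assumptions chosen for illustration; they are not necessarily among the 35 features used in the paper.

```python
def tokenize(sentence):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return sentence.lower().split()


def overlap_features(text, hypothesis):
    """Return a few illustrative lexical features for a (T, H) pair."""
    t_list = tokenize(text)
    h_list = tokenize(hypothesis)
    t_tokens = set(t_list)
    h_tokens = set(h_list)
    shared = t_tokens & h_tokens
    union = t_tokens | h_tokens
    return {
        # Fraction of distinct hypothesis words that also occur in the text.
        "h_coverage": len(shared) / len(h_tokens) if h_tokens else 0.0,
        # Jaccard similarity between the two word sets.
        "jaccard": len(shared) / len(union) if union else 0.0,
        # Length difference in tokens (text minus hypothesis).
        "length_diff": len(t_list) - len(h_list),
    }


# Hypothetical example pair, not taken from the WRETE dataset.
features = overlap_features(
    "Jakarta adalah ibu kota Indonesia",
    "Jakarta adalah kota",
)
print(features)
```

Each pair becomes one fixed-length row of numbers, which is the shape a classifier such as an SVM expects as input.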

Dataset

The dataset used in this paper is the WRETE dataset, which can be accessed here. We used only the train split for training and the test split for evaluating the model.

Resource

Word2vec Model: idwiki_word2vec_300.model

Pretrained POS Tagger Model: CRF Tagger -- read more
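The README does not say how the Word2vec embeddings enter the feature set. One common use, shown here only as a sketch, is to average the word vectors of each sentence and take the cosine similarity between T and H. The toy 3-dimensional vectors below are made up and stand in for the 300-dimensional model; with the real file, the lookup table would presumably come from loading it with a library such as gensim, which is an assumption about its format.

```python
import math

# Toy vectors standing in for the real embeddings; the values are made up.
toy_vectors = {
    "kucing": [1.0, 0.0, 0.0],
    "anjing": [0.0, 1.0, 0.0],
    "tidur": [0.0, 0.0, 1.0],
}


def sentence_vector(tokens, vectors):
    """Average the vectors of the tokens found in the vocabulary."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None  # no token is covered by the vocabulary
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


t_vec = sentence_vector(["kucing", "tidur"], toy_vectors)
h_vec = sentence_vector(["kucing"], toy_vectors)
similarity = cosine(t_vec, h_vec)
print(round(similarity, 4))
```

Out-of-vocabulary tokens are skipped rather than mapped to a zero vector, so a sentence with no known words yields `None` and has to be handled by the caller.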

Environment

This work was implemented on Google Colaboratory.

Sastrawi

pip install sastrawi -- read more

NLTK

pip install nltk -- read more

sklearn-crfsuite

pip install sklearn_crfsuite -- read more

Results

After performing an ablation study, the best performance in this work was obtained with an SVM classifier, which reached an F1-Score of 79.65%.
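For reference, the F1-Score is the harmonic mean of precision P and recall R, F1 = 2PR / (P + R). The sketch below computes it for one positive class in plain Python. The label names and predictions are invented for illustration, and the README does not state which averaging scheme (binary, macro, or weighted) produced the reported figure.

```python
def f1_score(y_true, y_pred, positive):
    """F1 for a single positive class, computed from raw label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold labels and predictions, not results from this work.
y_true = ["entail", "entail", "not_entail", "not_entail", "entail"]
y_pred = ["entail", "not_entail", "not_entail", "entail", "entail"]

score = f1_score(y_true, y_pred, positive="entail")
print(f"F1 = {score:.4f}")
```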

How To Run

IPython Notebook/preprocessing.py preprocesses both the train and test data.
IPython Notebook/model_processing.py trains the model using all the data extracted in the preprocessing step. The testing scenario is also implemented in this file.

For further information, please email me at: rani.auila@ui.ac.id or raniaulia72@gmail.com
