This repository contains an LSTM model implemented in PyTorch for sentiment classification on the Stanford Sentiment Treebank (SST-5) dataset. We train the model with and without pretrained embeddings and run several experiments with different hyperparameters. Because SST-5 also provides sentiment labels for the individual tokens within each sentence, we develop a modified model that utilizes this information.
We use the 300-dimensional GloVe embeddings trained on 6B tokens and provide a report describing the implementation details and evaluation results.
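As a rough sketch of this kind of model (illustrative only; the class and argument names below are ours, not necessarily those in model.py), the classifier embeds each token, optionally initializing the embedding matrix from GloVe, runs an LSTM over the sequence, and predicts one of the five sentiment classes from the final hidden state:

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    """Minimal sketch of an LSTM sentiment classifier; the real model.py may differ."""
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=256,
                 num_classes=5, pretrained=None):
        super().__init__()
        if pretrained is not None:
            # pretrained: FloatTensor of shape (vocab_size, embed_dim), e.g. GloVe vectors
            self.embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)
        else:
            self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer tensor
        embedded = self.embedding(token_ids)
        _, (h_n, _) = self.lstm(embedded)  # h_n: (1, batch, hidden_dim)
        return self.fc(h_n[-1])            # logits over the 5 sentiment classes
```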
The required packages are as follows.
- torch-1.11.0
- torchtext-0.12.0
- pytreebank-0.2.7: used to load the datasets.
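One possible way to install these packages (assuming pip; the exact command may differ depending on your platform and CUDA setup):

```bash
pip install torch==1.11.0 torchtext==0.12.0 pytreebank==0.2.7
```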
The structure of our project is as follows.
- sentiment analysis: the basic version.
  - /codes: contains all the code.
    - test.py: the entry point of our project.
    - train.py: defines the class used to train the model.
    - model.py: defines the model.
    - utils.py: loads the data for training and evaluation (see the data-loading sketch after this list).
    - config.yaml: stores the configurations for model training.
  - /weight: the directory for saved models, training logs, and results.
  - /data: the directory for the datasets and pretrained embeddings.
- improved: the improved version.
  - /codes: contains all the code.
    - test_improved.py: the entry point of the improved model.
    - train_improved.py: defines the class used to train the improved model.
    - model_improved.py: defines the improved model.
    - utils_improved.py: loads the data for training and evaluation.
    - config.yaml: stores the configurations for model training.
  - /weight: the directory for saved models and training logs.
  - /data: the directory for the corpus used in training and evaluation.
- report.pdf: a brief introduction to our implementation details, the improved model, and the evaluation results.
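As a rough sketch of how the data can be loaded with pytreebank (illustrative only, not necessarily the exact code in utils.py), each parsed tree exposes the sentence-level label as well as the labels of its sub-spans, which is the information the improved model relies on:

```python
import pytreebank

# Load SST-5; pytreebank downloads and caches the treebank if it is not found locally.
# A local path (e.g. the /data directory) can also be passed to load_sst.
dataset = pytreebank.load_sst()

example = dataset["train"][0]
# The first labeled line is the full sentence; the remaining lines are its
# sub-phrases, each with its own sentiment label in the range 0-4.
for label, text in example.to_labeled_lines():
    print(label, text)
```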
First, download the Stanford Sentiment Treebank (SST-5) dataset and the pretrained GloVe embeddings into the /data directory, then unzip them.
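For example, assuming the standard Stanford download locations (adjust the URLs and paths if you use a mirror):

```bash
cd data
wget https://nlp.stanford.edu/sentiment/trainDevTestTrees_PTB.zip
wget https://nlp.stanford.edu/data/glove.6B.zip
unzip trainDevTestTrees_PTB.zip
unzip glove.6B.zip
```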
After that, you can run our code from the /codes directory with the following command.
python test.py --config=config.yaml
The meaning of each configuration option can be found in our report.
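For reference, an entry point such as test.py can read the --config flag along the following lines (a minimal sketch assuming PyYAML is available; the keys shown are hypothetical, see the report for the real options):

```python
import argparse
import yaml

parser = argparse.ArgumentParser()
parser.add_argument("--config", default="config.yaml")
args = parser.parse_args()

with open(args.config) as f:
    config = yaml.safe_load(f)

# Hypothetical keys for illustration only.
print(config.get("hidden_dim"), config.get("learning_rate"))
```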