This project classifies movie reviews as positive or negative. Each review receives a score between 0 and 1: the closer the score is to 1, the more positive the review, and the closer it is to 0, the more negative. Model performance is evaluated with accuracy and F1 score.
- Project Overview
- Model
- Data Preprocessing
- Performance Metrics
- How to Use
- Dependencies
- Contributing
- License
The goal of this project is to classify movie reviews into positive or negative sentiment. The sentiment is determined by a score between 0 and 1:
- 0 to 0.5: Negative sentiment
- 0.5 to 1: Positive sentiment
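The mapping from score to label can be sketched as a small helper. This is a minimal illustration, not code from the notebooks: the function name `label_from_score` is hypothetical, and because the ranges above do not say which side a score of exactly 0.5 falls on, the sketch assumes it counts as positive.

```python
def label_from_score(score: float, threshold: float = 0.5) -> str:
    """Map a model score in [0, 1] to a sentiment label.

    Assumption: a score exactly equal to the threshold is treated as positive.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    return "positive" if score >= threshold else "negative"


if __name__ == "__main__":
    for s in (0.08, 0.5, 0.93):
        print(s, label_from_score(s))
```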
The project is implemented in Python using TensorFlow, with BERT as the underlying model architecture. The BERT model is fetched from TensorFlow Hub and is applied in two scenarios:
- On processed data.
- On raw (unprocessed) data.
The model used in this project is based on BERT (Bidirectional Encoder Representations from Transformers). The pre-trained BERT weights are loaded from TensorFlow Hub through two layers:
- Preprocessing Layer: `bert_preprocess = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3")`
- Encoder Layer: `bert_encoder = hub.KerasLayer("https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4")`
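One way the two layers might be wired into a binary classifier is sketched below. This follows the common TensorFlow Hub pattern rather than the notebooks' exact code: the function name `build_classifier`, the dropout rate, the single sigmoid output unit, and `trainable=True` are all assumptions. It also assumes `tensorflow`, `tensorflow_hub`, and `tensorflow_text` are installed in versions where `hub.KerasLayer` works with the Keras functional API.

```python
PREPROCESS_URL = "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3"
ENCODER_URL = "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4"


def build_classifier(dropout_rate: float = 0.1):
    """Build a Keras model mapping raw review strings to a score in [0, 1].

    Imports are done inside the function so that this module can be loaded
    even where TensorFlow is not installed.
    """
    import tensorflow as tf
    import tensorflow_hub as hub
    import tensorflow_text  # noqa: F401  (registers ops used by the preprocessing layer)

    # Raw strings go in; the preprocessing layer tokenizes and pads them.
    text_input = tf.keras.layers.Input(shape=(), dtype=tf.string, name="review")
    encoder_inputs = hub.KerasLayer(PREPROCESS_URL, name="bert_preprocess")(text_input)

    # The encoder returns a dict; "pooled_output" is one vector per review.
    encoder_outputs = hub.KerasLayer(ENCODER_URL, trainable=True, name="bert_encoder")(encoder_inputs)
    pooled = encoder_outputs["pooled_output"]

    # Hypothetical classification head: dropout plus one sigmoid unit.
    x = tf.keras.layers.Dropout(dropout_rate)(pooled)
    score = tf.keras.layers.Dense(1, activation="sigmoid", name="score")(x)
    return tf.keras.Model(inputs=text_input, outputs=score)
```

Calling `build_classifier()` downloads both Hub modules on first use, so it needs network access. The returned model could then be compiled with a binary cross-entropy loss before training.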
The data preprocessing includes:
- Tokenization
- Padding/Truncating sequences to a uniform length
- Applying the BERT preprocessing layer
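The padding and truncating step can be illustrated in plain Python. This is a toy sketch only: the helper name `pad_or_truncate` and the padding id of 0 are assumptions, and in the actual pipeline the BERT preprocessing layer performs its own WordPiece tokenization and pads to a fixed sequence length.

```python
from typing import List, Sequence


def pad_or_truncate(token_ids: Sequence[int], max_len: int, pad_id: int = 0) -> List[int]:
    """Return a list of exactly max_len ids.

    Longer sequences are cut at max_len; shorter ones are right-padded with pad_id.
    """
    clipped = list(token_ids[:max_len])
    return clipped + [pad_id] * (max_len - len(clipped))


if __name__ == "__main__":
    print(pad_or_truncate([7, 8, 9], 5))           # short sequence is padded
    print(pad_or_truncate([1, 2, 3, 4, 5, 6], 4))  # long sequence is truncated
```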
The model is trained once on processed data and once without any preprocessing to compare the impact of preprocessing on model performance.
The model's performance is evaluated using the following metrics:
- Accuracy Score: Measures the percentage of correctly classified reviews.
- F1 Score: The harmonic mean of precision and recall, especially useful for imbalanced datasets.
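Both metrics can be computed from the counts of true and false positives and negatives. The sketch below is a dependency-free illustration with made-up labels, treating 1 as the positive class. The function name `accuracy_and_f1` is hypothetical, and the notebooks may use a library implementation instead.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Compute accuracy and F1 for binary labels (1 = positive, 0 = negative)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, f1


if __name__ == "__main__":
    # Illustrative labels only, not results from this project.
    acc, f1 = accuracy_and_f1([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
    print(f"accuracy={acc:.3f} f1={f1:.3f}")
```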
- Download the `.ipynb` files from this repository.
- Upload the files to Google Colab or Jupyter Notebook.
- Run the notebooks in Colab or Jupyter to train the model and predict sentiments.
The dependencies are managed within the Colab or Jupyter Notebook environment, so no additional installation steps are needed.
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License.