Application Link: https://huggingface.co/faizack/Fine_Tune_Roberta-base-sentiment
This repository illustrates how to fine-tune the CardiffNLP Twitter RoBERTa Base model for sentiment analysis on your own dataset. The pre-trained model employed here is based on the RoBERTa architecture and was specialized for sentiment analysis on Twitter data.
Before proceeding, ensure you have the following dependencies installed:
- Python 3.12
Install Required Packages
pip install -r requirements.txt
Run analysis.ipynb
- Dataset Acquisition: Obtain the Amazon Reviews dataset via the Kaggle API, storing your credentials in a .env file as shown in .env.example.
- Data Extraction and Preparation: Extract the compressed data files (train.ft.txt.bz2 and test.ft.txt.bz2) and preprocess them into readable text files (train.ft.txt and test.ft.txt).
- Data Loading: Load the preprocessed data into Pandas DataFrames for further processing.
- Tokenization: Use the AutoTokenizer from Hugging Face's Transformers library to tokenize the text data.
- Model Initialization: Initialize the pre-trained sentiment analysis model (AutoModelForSequenceClassification) from CardiffNLP Twitter RoBERTa Base.
- Training: Fine-tune the initialized model on the preprocessed dataset, applying appropriate training parameters and evaluation metrics.
- Model Saving: Save the fine-tuned model and tokenizer for future use.
- Model Loading: Load the saved fine-tuned model and tokenizer.
- Prediction: Tokenize the input text and perform sentiment analysis using the loaded model.
- Result Interpretation: Interpret the model's output to determine the sentiment label and confidence score.
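The extraction and loading steps can be sketched as follows. This is a minimal sketch, not the notebook's exact code: the function names are illustrative, and the parsing assumes the fastText-style format of the Kaggle Amazon Reviews dataset, where each line starts with `__label__1` (negative) or `__label__2` (positive).

```python
import bz2
import pandas as pd

def extract_bz2(src_path: str, dst_path: str) -> None:
    """Decompress e.g. train.ft.txt.bz2 into a plain train.ft.txt file."""
    with bz2.open(src_path, "rt", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(line)

def load_fasttext_file(path: str) -> pd.DataFrame:
    """Parse fastText-style lines ('__label__2 great product ...') into a DataFrame."""
    labels, texts = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            label, _, text = line.partition(" ")
            labels.append(1 if label == "__label__2" else 0)  # 1 = positive, 0 = negative
            texts.append(text.strip())
    return pd.DataFrame({"label": labels, "text": texts})
```

Typical usage would be `extract_bz2("train.ft.txt.bz2", "train.ft.txt")` followed by `df = load_fasttext_file("train.ft.txt")`.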
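The tokenization, model-initialization, and training steps can be sketched with Hugging Face's `Trainer` API. The hyperparameters, column names, output directory, and the two-class head here are illustrative assumptions, not the repository's exact settings:

```python
import numpy as np
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "cardiffnlp/twitter-roberta-base-sentiment"

def compute_metrics(eval_pred):
    """Simple accuracy over the argmax of the logits."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": float((preds == labels).mean())}

def fine_tune(train_dataset, eval_dataset, output_dir="fine_tuned_model"):
    """Tokenize, initialize the pre-trained model, fine-tune, and save.

    Assumes `train_dataset`/`eval_dataset` are `datasets.Dataset` objects
    with "text" and "label" columns.
    """
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    # The base model's head has 3 classes (negative/neutral/positive); the
    # Amazon Reviews labels are binary, so a fresh 2-class head is attached.
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=2, ignore_mismatched_sizes=True)

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True,
                         padding="max_length", max_length=128)

    train_dataset = train_dataset.map(tokenize, batched=True)
    eval_dataset = eval_dataset.map(tokenize, batched=True)

    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=1,              # illustrative; tune for your data
        per_device_train_batch_size=16,  # illustrative
    )
    trainer = Trainer(model=model, args=args,
                      train_dataset=train_dataset, eval_dataset=eval_dataset,
                      compute_metrics=compute_metrics)
    trainer.train()
    trainer.save_model(output_dir)         # Model Saving step
    tokenizer.save_pretrained(output_dir)
    return trainer
```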
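Model loading, prediction, and result interpretation can be sketched as below. The saved-model directory and the two label names (matching the binary Amazon Reviews labels) are assumptions for illustration:

```python
import numpy as np
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

LABELS = ["negative", "positive"]  # assumed binary labels from the Amazon Reviews data

def interpret(logits):
    """Turn raw logits into a (label, confidence) pair via softmax."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    idx = int(np.argmax(probs))
    return LABELS[idx], float(probs[idx])

def predict_sentiment(text, model_dir="fine_tuned_model"):
    """Model Loading + Prediction + Result Interpretation in one helper."""
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].numpy()
    return interpret(logits)
```

For example, `predict_sentiment("This product exceeded my expectations!")` would return a label and its softmax confidence.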
For running the Flask API and the Streamlit app, follow these simplified steps:
Note: Before running the Flask API, make sure you have run all of the steps in analysis.ipynb.
- Flask API:
  Start the Flask API:
  cd server/; flask --app api.py run
- Streamlit App:
  Run the Streamlit app:
  streamlit run app.py