This project combines traditional machine learning with deep learning to detect sarcasm in text. By pairing a TF-IDF-based Random Forest model with a fine-tuned BERT model, it aims for robust performance across varied datasets.
- Custom `TextPreprocessor` Class: Handles text cleaning, lemmatization, and tokenization.
- Stopword Removal: Removes common words that don't contribute to meaning.
- Special Character Handling: Filters out unnecessary symbols and punctuations.
- Traditional ML Model: Uses TF-IDF features and a Random Forest classifier for sarcasm detection.
- Deep Learning Model: Implements a fine-tuned BERT model for contextual understanding.
- Ensemble Prediction: Combines predictions from both models for improved accuracy.
- Pre-Trained Model: Uses BERT for advanced feature extraction.
- Custom Dataset Class: Prepares input for BERT with attention masks and tokenization.
- Context Understanding: Captures nuanced meanings for accurate sarcasm detection.
- TF-IDF Representation: Replaces basic CountVectorizer for better feature weighting.
- N-Gram Support: Includes uni-grams, bi-grams, and tri-grams for richer features.
- BERT Word Embeddings: Utilizes BERT's contextual embeddings for deep learning.
- Train/Validation Split: Ensures reliable performance evaluation.
- Cross-Validation: Supports robust model tuning.
- Learning Rate Optimization: Implements dynamic learning rates for faster convergence.
- Dropout Regularization: Prevents overfitting in deep learning models.
- Device Support: Compatible with both CPU and GPU for training and inference.
- Comprehensive Evaluation Metrics: Includes accuracy, precision, recall, F1-score, and more.
- Flexible Predictions: Choose between traditional ML, BERT, or ensemble methods.
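The traditional branch described above (TF-IDF with uni-, bi-, and tri-grams feeding a Random Forest) can be sketched with scikit-learn. This is a minimal illustration with toy data, not the project's actual training code:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Toy examples for illustration only; real training uses the project's dataset.
texts = [
    "Oh great, another Monday.",
    "I love sunny days.",
    "Wow, what a fantastic traffic jam.",
    "The movie was genuinely good.",
]
labels = [1, 0, 1, 0]  # 1 = sarcastic, 0 = not sarcastic

# TF-IDF with n-gram support (uni/bi/tri-grams) feeding a Random Forest.
model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 3))),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
])
model.fit(texts, labels)
print(model.predict(["Oh great, more rain."]))
```

The `Pipeline` keeps vectorization and classification in one object, so cross-validation and train/validation splits apply to both stages consistently.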
- Python 3.12+
- Jupyter Notebook
- Libraries:
  - TensorFlow / PyTorch
  - NLTK
  - Transformers (Hugging Face)
  - scikit-learn
  - Matplotlib
  - NumPy
  - Pandas
  - `re` and `collections` (Python standard library)
- Clone the repository

  ```bash
  git clone https://github.com/ajitashwathr10/Sarcasm-Detection.git
  cd Sarcasm-Detection
  ```
- Set up a virtual environment (optional)

  ```bash
  python -m venv venv
  source venv/bin/activate   # On Linux/Mac
  venv\Scripts\activate      # On Windows
  ```
- Install dependencies

  ```bash
  pip install -r requirements.txt
  ```
- Download the pre-trained BERT model
  - This project uses the Hugging Face Transformers library. Ensure you have internet access when first running the model so the pre-trained BERT weights are downloaded automatically.
- Verify the installation

  ```bash
  python test_installation.py
  ```
- For GPU support, ensure CUDA is installed, then install a GPU-compatible build of PyTorch or TensorFlow:

  ```bash
  pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
  pip install tensorflow   # recent TensorFlow releases bundle GPU support; the separate tensorflow-gpu package is deprecated
  ```
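To confirm that PyTorch can actually see the GPU, a quick check like the following works; it falls back to CPU when CUDA (or PyTorch itself) is unavailable:

```python
# Quick device check: prefer the GPU if PyTorch reports CUDA support.
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    device = "cpu"  # PyTorch not installed; models will run on CPU only
print(f"Using device: {device}")
```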
Now you're ready to start detecting sarcasm!
- Preprocess Data
  - Prepare your dataset using the `TextPreprocessor` class to clean and tokenize text.
- Train Models
  - Train the traditional and deep learning models using the provided training pipelines.
- Make Predictions
  - Use the ensemble or individual models to predict sarcasm in new textual data.
- Example:

  ```python
  from sarcasm_detector import SarcasmDetector

  detector = SarcasmDetector()
  detector.train("path/to/dataset.csv")
  predictions = detector.predict(["This is such a great day!", "Oh, what a surprise..."])
  print(predictions)
  ```
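One common way to combine the two models' predictions, as the ensemble option does, is soft voting: average the sarcasm probabilities from the Random Forest and from BERT, then threshold. The sketch below is a hypothetical illustration of that idea (the function name, `weight` parameter, and threshold are assumptions, not the project's actual API):

```python
import numpy as np

# Hypothetical soft-voting ensemble: blend the sarcasm probability from the
# TF-IDF Random Forest with the one from the fine-tuned BERT model, then
# threshold at 0.5 to get a hard label.
def ensemble_predict(rf_probs, bert_probs, weight=0.5):
    combined = weight * np.asarray(rf_probs) + (1 - weight) * np.asarray(bert_probs)
    return (combined >= 0.5).astype(int)

print(ensemble_predict([0.9, 0.2], [0.7, 0.1]))  # [1 0]
```

Weighting lets you favor whichever model validates better on your dataset; `weight=0.5` is a plain average.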
We welcome contributions to enhance this project. Feel free to submit pull requests or raise issues!
This project is licensed under the MIT License. See the LICENSE file for details.