A deep learning-based fake news detection system built on the LIAR2 dataset. This project implements and evaluates multiple approaches to fake news classification, including neural baselines built on statement metadata and fine-tuned BERT-based models.
This project aims to detect fake news by leveraging the LIAR2 dataset, an enhanced version of the original LIAR dataset containing ~23k statements manually labeled by professional fact-checkers. The system classifies statements into six veracity categories:
- 👎 pants-fire: Completely false statements
- ❌ false: False statements
- 🤏 barely-true: Mostly false statements with a small element of truth
- ⚖️ half-true: Statements with mixed accuracy
- ✅ mostly-true: Mostly accurate statements with minor issues
- 💯 true: Completely accurate statements
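
For model training, these categories are typically encoded as integer ids ordered from least to most truthful. The mapping below is an illustrative sketch; the actual encoding produced by the preprocessing scripts may differ.

```python
# Illustrative label-to-id mapping (ordered from least to most truthful).
# The exact encoding used by src/data_prep.py may differ.
LABEL2ID = {
    "pants-fire": 0,
    "false": 1,
    "barely-true": 2,
    "half-true": 3,
    "mostly-true": 4,
    "true": 5,
}
ID2LABEL = {i: label for label, i in LABEL2ID.items()}
```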
- Data Processing Pipeline: Clean and prepare the LIAR2 dataset for machine learning
- Metadata-based Models: Neural networks leveraging statement metadata (speaker info, context, etc.)
- BERT-based Models: Fine-tuned BERT models for enhanced fake news classification
- GPU Acceleration: Full GPU support for model training and inference (see the device-check sketch after this list)
- Evaluation Framework: Comprehensive metrics for model performance analysis
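
Since training (and BERT fine-tuning in particular) benefits heavily from a GPU, a quick way to confirm that PyTorch can see your CUDA device before starting a run:

```python
import torch

# Pick the GPU when available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
if device.type == "cuda":
    print(f"GPU: {torch.cuda.get_device_name(0)}")
```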
```text
liar2-fake-news/
├── data/                          # Dataset files
│   └── liar2/                     # Original LIAR2 dataset
├── logs/                          # Training logs
├── models/                        # Saved models
│   ├── metadata_gpu_model.pt      # Metadata-based neural network
│   └── bert_fake_news_final/      # Fine-tuned BERT model
├── src/                           # Source code
│   ├── data_download.py           # Dataset download script
│   ├── data_prep.py               # Data preprocessing
│   ├── train_meta_baseline.py     # Metadata-based model training
│   ├── train_bert.py              # BERT model training
│   └── test_bert.py               # BERT model testing script
└── requirements.txt               # Python dependencies
```
- Clone the repository:

  ```bash
  git clone https://github.com/madboy482/FakeNewsDetection.git
  cd FakeNewsDetection
  ```

- Create a virtual environment and install dependencies:

  ```bash
  python -m venv venv
  source venv/bin/activate   # On Windows: venv\Scripts\activate
  pip install -r requirements.txt
  ```

- Download and prepare the dataset:

  ```bash
  python src/data_download.py
  python src/data_prep.py
  ```
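If you want to inspect the data directly, LIAR2 is also commonly distributed through the Hugging Face Hub. The snippet below is a sketch that assumes the `chengxuphd/liar2` dataset id and the `datasets` library; the repository's own download script may fetch and store the data differently.

```python
from datasets import load_dataset

# Assumed Hugging Face dataset id for LIAR2; src/data_download.py may use a different source.
dataset = load_dataset("chengxuphd/liar2")

print(dataset)              # available splits and their sizes
print(dataset["train"][0])  # one example: statement, label, speaker metadata, ...
```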
Train the metadata-based neural network model:
```bash
python src/train_meta_baseline.py
```
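The actual model lives in `src/train_meta_baseline.py`. As a rough illustration of the idea (not the repository's exact architecture), a minimal feed-forward network over encoded metadata features might look like this:

```python
import torch
import torch.nn as nn

class MetadataClassifier(nn.Module):
    """Illustrative MLP over encoded metadata features (not the repo's exact model)."""

    def __init__(self, num_features: int, num_classes: int = 6, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, hidden),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden, num_classes),  # logits for the six veracity classes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Hypothetical usage with 20 metadata features (speaker, party, credit counts, context, ...).
model = MetadataClassifier(num_features=20)
logits = model(torch.randn(4, 20))  # batch of 4 examples
print(logits.shape)                 # torch.Size([4, 6])
```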
Train the BERT-based model (requires GPU):
```bash
python src/train_bert.py
```
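`src/train_bert.py` contains the actual training setup. The sketch below shows the general Hugging Face `transformers` fine-tuning pattern for a six-class classifier; the base checkpoint, hyperparameters, and column names are assumptions, not the repository's exact configuration.

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "bert-base-uncased"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=6)

def tokenize(batch):
    # "statement" is the assumed text column in the prepared dataset.
    return tokenizer(batch["statement"], truncation=True, padding="max_length", max_length=256)

# train_ds = raw_train.map(tokenize, batched=True)  # raw_train: an assumed `datasets` split with a "label" column

args = TrainingArguments(
    output_dir="models/bert_fake_news_final",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    fp16=True,  # requires a CUDA GPU
)
# trainer = Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```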
Test the BERT model with sample statements:
```bash
python src/test_bert.py
```
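To try your own statements against the saved model, a minimal inference sketch looks like the following. It assumes the tokenizer was saved alongside the model in `models/bert_fake_news_final` and that the label order matches the mapping shown earlier.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_DIR = "models/bert_fake_news_final"  # path from the project structure above
LABELS = ["pants-fire", "false", "barely-true", "half-true", "mostly-true", "true"]  # assumed order

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_DIR).eval()

statement = "The unemployment rate has doubled in the last year."  # hypothetical example
inputs = tokenizer(statement, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
print(LABELS[int(probs.argmax())], float(probs.max()))
```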
Our best BERT-based model achieves a clear improvement over the metadata-only baseline:
| Model | Accuracy | Macro F1 |
|---|---|---|
| Metadata NN | 51.19% | 0.48 |
| BERT | ~65% | ~0.63 |
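
Both metrics can be reproduced from vectors of true and predicted label ids, for example with scikit-learn. The values below are toy data, not the reported results.

```python
from sklearn.metrics import accuracy_score, f1_score

# y_true / y_pred are integer label ids (0-5) over the test split; toy values shown here.
y_true = [0, 2, 5, 3, 1, 4]
y_pred = [0, 1, 5, 3, 3, 4]

print(f"Accuracy: {accuracy_score(y_true, y_pred):.2%}")
print(f"Macro F1: {f1_score(y_true, y_pred, average='macro'):.2f}")
```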
- Implement ensemble methods combining metadata and text-based models
- Experiment with other transformer architectures (RoBERTa, DeBERTa)
- Add explainability components to highlight statement elements that indicate falsehood
- Incorporate external knowledge sources for fact verification
If you use the LIAR2 dataset, please cite the original paper:
```bibtex
@article{cheng2024enhanced,
  title={An Enhanced Fake News Detection System With Fuzzy Deep Learning},
  author={Cheng, Xu and Liu, Weiwei and Wang, Yue and Tang, Bo and He, Yingchun},
  journal={IEEE Access},
  year={2024},
  publisher={IEEE}
}
```
This project is licensed under the MIT License - see the LICENSE file for details.