Binary classification of memes as Political or NonPolitical in the Bangladesh/India context, using Vision-Language Models with OCR correction and LoRA fine-tuning.
Challenge: PoliMemeDecode - CUET CSE Fest Datathon
Task: Classify multilingual (Bengali/English) memes with a 30:70 (Political:NonPolitical) class imbalance
Metric: Macro F1 Score
Approach: 4-stage pipeline - OCR → VLM Correction → LoRA Fine-tuning → Ensemble
- Train: 2,860 images (853 Political, 2,007 NonPolitical)
- Test: 330 images
- Languages: Bengali, English, mixed
- Format: Image memes with embedded text
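For context, a tiny EDA sketch of the class-balance check; the file and column names here are hypothetical, not taken from 0_dataset_analysis.ipynb:

```python
import pandas as pd

# Hypothetical label file and column names (assumptions, for illustration only)
labels = pd.read_csv("train_labels.csv")        # assumed columns: image_id, label
counts = labels["label"].value_counts()
print(counts)                                   # expected: NonPolitical 2007, Political 853
print((counts / counts.sum()).round(2))         # roughly the 70:30 split noted above
```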
Tool: EasyOCR (Bengali + English)
- Extracts text from meme images
- Threshold: 0.25
- Supports multilingual detection
Notebooks:
- 1_EasyOCR_text_extraction.ipynb
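A minimal sketch of the Stage-1 extraction, assuming the 0.25 value is applied as a minimum detection confidence (the notebook's exact filtering may differ):

```python
import easyocr

# Bengali + English reader, as used for the multilingual memes
reader = easyocr.Reader(["bn", "en"], gpu=True)

def extract_text(image_path, min_conf=0.25):
    """Concatenate detected text fragments, keeping detections above min_conf."""
    results = reader.readtext(image_path)          # list of (bbox, text, confidence)
    kept = [text for _, text, conf in results if conf >= min_conf]
    return " ".join(kept)

print(extract_text("train_images/0001.jpg"))       # hypothetical image path
```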
Model: Qwen3-VL-4B-Instruct (4-bit quantized)
- Corrects OCR errors and recovers missing text
- Impact: +5.4 characters per sample on average, 0 samples left with missing text
- Batch processing for train/test splits
Notebooks:
- 2.1_qwen3vl-text-correction_train_0-999.ipynb
- 2.2_qwen3vl-text-correction_train_1000-1999.ipynb
- 2.3_qwen3vl-text-correction_train_2000-2859.ipynb
- 2.4_qwen3vl-text-correction_test_0-329.ipynb
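A rough sketch of how such a correction pass could be wired up with transformers and bitsandbytes; the hub repo id, chat-template handling, and prompt are assumptions rather than the exact code in the 2.x notebooks:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText, BitsAndBytesConfig

MODEL_ID = "Qwen/Qwen3-VL-4B-Instruct"            # assumed Hugging Face repo id

# 4-bit quantized loading to fit the 4B VLM on a single consumer GPU
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_ID, quantization_config=bnb, device_map="auto"
)

def correct_ocr(image_path, ocr_text):
    """Ask the VLM to fix OCR errors and recover text missed by EasyOCR."""
    image = Image.open(image_path).convert("RGB")
    messages = [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text",
             "text": f"OCR output: {ocr_text}\nCorrect the errors and return the full meme text."},
        ],
    }]
    prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=256)
    # Decode only the newly generated tokens, not the prompt
    return processor.batch_decode(
        out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )[0]
```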
Base Model: Qwen3-VL-4B-Instruct (4-bit BNB)
- Method: LoRA (r=16, α=16, dropout=0) - 0.5% trainable params
- Framework: Unsloth (optimized training)
- Strategy: 5-fold stratified cross-validation
- Training: effective batch size 32 (4 per device × 8 gradient accumulation), LR=5e-4, 3 epochs
- Time: ~3-4 hours per fold
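A condensed sketch of the adapter and trainer configuration, assuming Unsloth's FastVisionModel API and TRL's SFTConfig; the checkpoint name and argument details are assumptions, not the notebooks' exact code:

```python
from unsloth import FastVisionModel
from trl import SFTConfig

# Load the 4-bit base model (checkpoint name assumed) and attach LoRA adapters
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Qwen3-VL-4B-Instruct",            # assumed Unsloth 4-bit checkpoint
    load_in_4bit=True,
)
model = FastVisionModel.get_peft_model(
    model,
    r=16, lora_alpha=16, lora_dropout=0,       # ~0.5% of parameters trainable
    bias="none",
    random_state=42,
)

# Hyperparameters from this stage: effective batch 32 = 4 x 8, LR 5e-4, 3 epochs
training_args = SFTConfig(
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=5e-4,
    num_train_epochs=3,
    output_dir="outputs/fold_1",
)
```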
Notebooks:
- 3.1_finetune-qwen3vl-selective-folds FOLD 1.ipynb
- 3.2_finetune-qwen3vl-selective-folds FOLD 2.ipynb
- 3.3_finetune-qwen3vl-selective-folds FOLD 3.ipynb
- 3.4_finetune-qwen3vl-selective-folds FOLD 4.ipynb
- 3.5_finetune-qwen3vl-selective-folds FOLD 5.ipynb
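The fold assignment shared by these five notebooks can be reproduced with scikit-learn's StratifiedKFold; a sketch assuming train_full.csv has a label column (seed 42 per the cross-validation strategy noted under methodology):

```python
import pandas as pd
from sklearn.model_selection import StratifiedKFold

train_df = pd.read_csv("outputs/train_full.csv")   # assumed to contain a "label" column
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(skf.split(train_df, train_df["label"]), start=1):
    print(f"Fold {fold}: {len(train_idx)} train / {len(val_idx)} val samples")
```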
- Majority voting across 5 folds
- Comprehensive metrics (F1, accuracy, precision, recall)
- Final submission generation
Notebooks:
- 4_analysis_and_ensemble.ipynb
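A minimal sketch of the majority-vote and submission step, assuming each per-fold CSV exposes a "prediction" column and the submission expects a "label" column (column names are assumptions):

```python
import pandas as pd
from sklearn.metrics import f1_score

# Majority vote across the 5 per-fold prediction files
folds = [pd.read_csv(f"outputs/fold_{i}_test_predictions.csv")["prediction"] for i in range(1, 6)]
votes = pd.concat(folds, axis=1)
ensemble = votes.mode(axis=1)[0]                   # most frequent label per sample
ensemble.to_frame("label").to_csv("outputs/submissions.csv", index=False)

# On held-out data with known labels, the challenge metric is the macro F1:
# f1_score(y_true, y_pred, average="macro")
```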
poli-meme-decode/
├── 0_dataset_analysis.ipynb # EDA and dataset statistics
├── 1_EasyOCR_text_extraction.ipynb # Stage 1: OCR extraction
├── 2.x_qwen3vl-text-correction_*.ipynb # Stage 2: VLM correction (4 parts)
├── 3.x_finetune-qwen3vl-*.ipynb # Stage 3: LoRA training (5 folds)
├── 4_analysis_and_ensemble.ipynb # Stage 4: Ensemble & metrics
├── 5_generate_report_visualizations.ipynb # Report visualizations
├── Report.md # Detailed technical report
└── outputs/
├── extracted_text_train_multilang.csv # OCR outputs
├── extracted_text_test_multilang.csv
├── train_full.csv # Corrected text
├── test_full.csv
├── fold_*_test_predictions.csv # Per-fold predictions
└── submissions.csv # Final ensemble predictions
pip install torch transformers accelerate bitsandbytes
pip install unsloth trl peft
pip install easyocr pillow pandas numpy scikit-learn
pip install tqdm matplotlib seaborn
- Dataset Analysis
  jupyter notebook 0_dataset_analysis.ipynb
- OCR Extraction
  jupyter notebook 1_EasyOCR_text_extraction.ipynb
- Text Correction (parallel processing recommended)
  # Run all 2.x notebooks in parallel or in sequence
- LoRA Fine-tuning (parallel across machines)
  # Configure SELECTED_FOLDS in each 3.x notebook
  # Run on separate machines/GPUs for efficiency
- Ensemble & Submission
  jupyter notebook 4_analysis_and_ensemble.ipynb
- Multi-stage Pipeline: OCR → VLM → Fine-tuning → Ensemble
- Multilingual Support: Bengali and English text handling
- Memory Efficient: 4-bit quantization, LoRA parameter efficiency
- Cross-validation: 5-fold stratified CV for robust predictions
- Parallel Processing: Notebooks designed for distributed execution
- Cross-validation Strategy: 5-fold stratified (seed=42)
- Ensemble Method: Majority voting
- Metric: Macro F1 Score
- Full results available in Report_Formula_One.pdf
- Md. Nayeem (Team Lead) - KUET
- Shakhoyat Rahman Shujon - KUET
- MD. Jahid Hasan Jim - KUET
This project was developed for the PoliMemeDecode Challenge at CUET CSE Fest 2025.
- Base Model: Qwen3-VL-4B-Instruct
- Framework: Unsloth
- OCR: EasyOCR