codernayeem/poli-meme-decode

PoliMemeDecode: Political Meme Classification

Binary classification of memes as Political or NonPolitical in the Bangladesh/India context, using Vision-Language Models with OCR correction and LoRA fine-tuning.

🎯 Project Overview

Challenge: PoliMemeDecode - CUET CSE Fest Datathon
Task: Classify multilingual (Bengali/English) memes with a roughly 30:70 Political:NonPolitical class imbalance
Metric: Macro F1 Score
Approach: 4-stage pipeline (OCR → VLM Correction → LoRA Fine-tuning → Ensemble)
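
Macro F1 is the unweighted mean of the per-class F1 scores, so the minority Political class counts as much as the majority class. The following is a minimal pure-Python sketch of the metric, not code taken from the notebooks; the function name `macro_f1` and the toy labels are illustrative assumptions.

```python
def macro_f1(y_true, y_pred, classes=("Political", "NonPolitical")):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    scores = []
    for cls in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy 30:70 example: always predicting the majority class
y_true = ["Political"] * 3 + ["NonPolitical"] * 7
y_pred = ["NonPolitical"] * 10
baseline = macro_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.2f}, macro F1={baseline:.3f}")
```

In this toy case the majority-class baseline reaches 70% accuracy but a much lower Macro F1, which is why the metric suits an imbalanced task.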

📊 Dataset

  • Train: 2,860 images (853 Political, 2,007 NonPolitical)
  • Test: 330 images
  • Languages: Bengali, English, mixed
  • Format: Image memes with embedded text

🔧 Pipeline

Stage 1: Text Extraction

Tool: EasyOCR (Bengali + English)

  • Extracts text from meme images
  • Threshold: 0.25
  • Supports multilingual detection
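
A rough sketch of this stage, assuming the 0.25 threshold is applied as a per-fragment confidence cutoff on EasyOCR's output (the notebook may instead pass it to one of EasyOCR's own threshold parameters). The helper names `filter_ocr_results`, `make_reader`, and `extract_text` are hypothetical.

```python
def filter_ocr_results(results, min_confidence=0.25):
    """Join OCR fragments whose confidence meets the cutoff.

    `results` follows the shape returned by EasyOCR's readtext():
    a list of (bounding_box, text, confidence) tuples.
    """
    kept = [text.strip() for _bbox, text, conf in results if conf >= min_confidence]
    return " ".join(fragment for fragment in kept if fragment)


def make_reader():
    """Build a Bengali + English reader (requires `pip install easyocr`)."""
    import easyocr  # imported lazily so the filter above works without it
    return easyocr.Reader(["bn", "en"])


def extract_text(reader, image_path, min_confidence=0.25):
    """Run OCR on one meme image and return the filtered text."""
    return filter_ocr_results(reader.readtext(image_path), min_confidence)


# Synthetic OCR output, used here instead of a real image
sample = [(None, "ভোট", 0.90), (None, "xx", 0.10), (None, " election ", 0.25)]
print(filter_ocr_results(sample))
```

Creating the reader once and reusing it for every image avoids reloading the detection and recognition models for each call.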

Notebooks: 1_EasyOCR_text_extraction.ipynb

Stage 2: Text Correction with VLM

Model: Qwen3-VL-4B-Instruct (4-bit quantized)

  • Corrects OCR errors and recovers missing text
  • Impact: extracted text grows by 5.4 characters per sample on average, and no sample is left without text
  • Batch processing for train/test splits
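
One way the impact figures above could be measured is to compare the raw OCR text with the corrected text sample by sample. This is only a sketch under that assumption: how the reported numbers were actually computed is not stated here, and `correction_impact` and the toy strings are illustrative.

```python
def correction_impact(ocr_texts, corrected_texts):
    """Return (average character gain, number of samples still empty)."""
    def clean(text):
        return (text or "").strip()

    ocr = [clean(t) for t in ocr_texts]
    fixed = [clean(t) for t in corrected_texts]
    avg_gain = sum(len(f) - len(o) for o, f in zip(ocr, fixed)) / len(ocr)
    missing = sum(1 for f in fixed if not f)
    return avg_gain, missing


# Toy data: one garbled caption, one caption OCR missed entirely, one unchanged
ocr_texts = ["vote fr chang", "", "abc"]
corrected_texts = ["vote for change", "hello meme", "abc"]
avg_gain, missing = correction_impact(ocr_texts, corrected_texts)
print(f"average gain: {avg_gain:+.1f} chars, missing samples: {missing}")
```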

Notebooks: 2.x_qwen3vl-text-correction_*.ipynb

Stage 3: LoRA Fine-tuning

Base Model: Qwen3-VL-4B-Instruct (4-bit BNB)

  • Method: LoRA (r=16, α=16, dropout=0) - 0.5% trainable params
  • Framework: Unsloth (optimized training)
  • Strategy: 5-fold stratified cross-validation
  • Training: Batch=32 (4×8 grad accum), LR=5e-4, 3 epochs
  • Time: ~3-4 hours per fold
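
The fold sizes and optimizer steps implied by these settings can be checked with a small pure-Python sketch. It is not the notebooks' code (scikit-learn's `StratifiedKFold` would be the usual choice); `stratified_folds` is a hypothetical helper, and reading "4×8" as a per-device batch of 4 with 8 gradient-accumulation steps is an assumption.

```python
import math
import random
from collections import Counter, defaultdict


def stratified_folds(labels, n_splits=5, seed=42):
    """Assign each sample to a validation fold, keeping class ratios roughly equal."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    fold_of = [None] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal shuffled indices round-robin so each fold gets an even share
        for position, idx in enumerate(indices):
            fold_of[idx] = position % n_splits
    return fold_of


# Class counts taken from the Dataset section
labels = ["Political"] * 853 + ["NonPolitical"] * 2007
fold_of = stratified_folds(labels)
val_counts = Counter((fold, labels[i]) for i, fold in enumerate(fold_of))

EFFECTIVE_BATCH = 4 * 8  # assumed: batch 4 x 8 accumulation steps = 32
val_size = sum(1 for fold in fold_of if fold == 0)
train_size = len(labels) - val_size
steps_per_epoch = math.ceil(train_size / EFFECTIVE_BATCH)
print(val_size, train_size, steps_per_epoch, steps_per_epoch * 3)
```

Under these assumptions each fold validates on about 572 images and trains for roughly 72 optimizer steps per epoch.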

Notebooks: 3.x_finetune-qwen3vl-*.ipynb

Stage 4: Ensemble & Analysis

  • Majority voting across 5 folds
  • Comprehensive metrics (F1, accuracy, precision, recall)
  • Final submission generation
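
Majority voting can be sketched as follows: each fold's model predicts a label for every test image, and the label chosen by most folds wins. With five folds and two classes a tie cannot occur. The function name `majority_vote` and the toy predictions are illustrative, not taken from the notebook.

```python
from collections import Counter


def majority_vote(fold_predictions):
    """Combine per-fold label lists (all the same length) into one label list."""
    ensemble = []
    for votes in zip(*fold_predictions):
        # most_common(1) returns [(label, count)] for the top-voted label
        ensemble.append(Counter(votes).most_common(1)[0][0])
    return ensemble


# Toy predictions from 5 folds for 3 test images
P, N = "Political", "NonPolitical"
fold_predictions = [
    [P, N, N],
    [P, N, P],
    [N, N, P],
    [P, N, N],
    [N, P, P],
]
final = majority_vote(fold_predictions)
print(final)
```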

Notebooks: 4_analysis_and_ensemble.ipynb

📁 Repository Structure

poli-meme-decode/
├── 0_dataset_analysis.ipynb           # EDA and dataset statistics
├── 1_EasyOCR_text_extraction.ipynb    # Stage 1: OCR extraction
├── 2.x_qwen3vl-text-correction_*.ipynb # Stage 2: VLM correction (4 parts)
├── 3.x_finetune-qwen3vl-*.ipynb       # Stage 3: LoRA training (5 folds)
├── 4_analysis_and_ensemble.ipynb      # Stage 4: Ensemble & metrics
├── 5_generate_report_visualizations.ipynb # Report visualizations
├── Report.md                          # Detailed technical report
└── outputs/
    ├── extracted_text_train_multilang.csv    # OCR outputs
    ├── extracted_text_test_multilang.csv
    ├── train_full.csv                        # Corrected text
    ├── test_full.csv
    ├── fold_*_test_predictions.csv           # Per-fold predictions
    └── submissions.csv                       # Final ensemble predictions

🚀 Quick Start

Requirements

pip install torch transformers accelerate bitsandbytes
pip install unsloth trl peft
pip install easyocr pillow pandas numpy scikit-learn
pip install tqdm matplotlib seaborn

Workflow

  1. Dataset Analysis

    jupyter notebook 0_dataset_analysis.ipynb
  2. OCR Extraction

    jupyter notebook 1_EasyOCR_text_extraction.ipynb
  3. Text Correction (parallel processing recommended)

    # Run all 2.x notebooks in parallel or sequence
  4. LoRA Fine-tuning (parallel across machines)

    # Configure SELECTED_FOLDS in each 3.x notebook
    # Run on separate machines/GPUs for efficiency
  5. Ensemble & Submission

    jupyter notebook 4_analysis_and_ensemble.ipynb

💡 Key Features

  • Multi-stage Pipeline: OCR → VLM → Fine-tuning → Ensemble
  • Multilingual Support: Bengali and English text handling
  • Memory Efficient: 4-bit quantization, LoRA parameter efficiency
  • Cross-validation: 5-fold stratified CV for robust predictions
  • Parallel Processing: Notebooks designed for distributed execution

📈 Results

  • Cross-validation Strategy: 5-fold stratified (seed=42)
  • Ensemble Method: Majority voting
  • Metric: Macro F1 Score
  • Full results available in Report_Formula_One.pdf

👥 Team: Formula One

  1. Md. Nayeem (Team Lead) - KUET
  2. Shakhoyat Rahman Shujon - KUET
  3. MD. Jahid Hasan Jim - KUET

📄 License

This project was developed for the PoliMemeDecode Challenge at CUET CSE Fest 2025.
