Binary Semantic Segmentation with U-Net

A PyTorch implementation of U-Net for binary semantic segmentation on the Oxford-IIIT Pet Dataset. This project demonstrates end-to-end training and evaluation of deep learning models for computer vision tasks.

Overview

This project implements a U-Net architecture for binary semantic segmentation, specifically designed to segment pets (cats and dogs) from background in images. The model learns to generate pixel-wise binary masks that distinguish between foreground (pet) and background regions.

Key Features

  • U-Net Architecture: Classic encoder-decoder network with skip connections
  • Binary Segmentation: Optimized for foreground/background classification
  • Oxford-IIIT Pet Dataset: Automatic dataset download and preprocessing
  • Training Pipeline: Complete training loop with validation and logging
  • Evaluation Metrics: IoU, accuracy, and visual result comparison
  • Model Checkpointing: Save and load trained models

Technical Architecture

Model Architecture

  • Network: U-Net with encoder-decoder structure
  • Input: RGB images (3 channels, 256×256 pixels)
  • Output: Binary masks (1 channel, 256×256 pixels)
  • Loss Function: Binary Cross-Entropy with Logits
  • Optimizer: Adam with learning rate scheduling
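
The encoder and decoder are built from repeated double-convolution blocks joined by skip connections. As a rough illustration, here is a minimal PyTorch sketch of that building block together with the loss setup described above; the actual implementation in src/models/unet.py may differ in layer details:

import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    """Two 3x3 conv + BatchNorm + ReLU stages, the standard U-Net building block."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

# BCE-with-logits consumes raw model outputs; no sigmoid inside the network.
logits = torch.randn(8, 1, 256, 256)                    # stand-in for model output
target = torch.randint(0, 2, (8, 1, 256, 256)).float()  # binary ground-truth mask
loss = nn.BCEWithLogitsLoss()(logits, target)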

Dependencies

  • PyTorch >= 1.9.0
  • torchvision >= 0.10.0
  • numpy >= 1.21.0
  • Pillow >= 8.3.0
  • tqdm >= 4.62.0
  • matplotlib >= 3.4.0
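
These pins correspond to a requirements.txt along these lines (package names per PyPI; the repository's actual file may differ):

torch>=1.9.0
torchvision>=0.10.0
numpy>=1.21.0
Pillow>=8.3.0
tqdm>=4.62.0
matplotlib>=3.4.0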

Installation

Prerequisites

  • Python 3.7 or higher
  • CUDA-compatible GPU (recommended) or CPU

Setup Instructions

  1. Clone the repository

    git clone <repository-url>
    cd binary-semantic-segmentation
  2. Install dependencies

    pip install -r requirements.txt
  3. Download dataset

    python -c "from src.dataset import OxfordPetDataset; OxfordPetDataset.download('./data/oxford-iiit-pet')"

Usage

Training

Train a U-Net model on the Oxford-IIIT Pet dataset:

python src/train.py --model_type unet --data_path ./data/oxford-iiit-pet --epochs 50 --batch_size 8 --learning_rate 1e-4

Train a ResNet34-UNet model:

python src/train.py --model_type resnet34_unet --data_path ./data/oxford-iiit-pet --epochs 50 --batch_size 8 --learning_rate 1e-4

Training Arguments

  • --model_type: Type of model to train (unet or resnet34_unet)
  • --data_path: Path to dataset directory (default: ./data/oxford-iiit-pet)
  • --save_path: Directory to save trained models (default: ./saved_models)
  • --epochs: Number of training epochs (default: 50)
  • --batch_size: Batch size for training (default: 8)
  • --learning_rate: Learning rate for optimizer (default: 1e-4)
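
These flags map onto a straightforward argparse setup. A plausible sketch of how src/train.py declares them, with names and defaults taken from the list above (where stated):

import argparse

parser = argparse.ArgumentParser(description="Train a binary segmentation model")
parser.add_argument("--model_type", choices=["unet", "resnet34_unet"])
parser.add_argument("--data_path", default="./data/oxford-iiit-pet")
parser.add_argument("--save_path", default="./saved_models")
parser.add_argument("--epochs", type=int, default=50)
parser.add_argument("--batch_size", type=int, default=8)
parser.add_argument("--learning_rate", type=float, default=1e-4)
args = parser.parse_args()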

Evaluation

Evaluate a trained U-Net model:

python src/evaluate.py --model_path ./saved_models/unet_best_model.pth --model_type unet --data_path ./data/oxford-iiit-pet --save_visualizations

Evaluate a trained ResNet34-UNet model:

python src/evaluate.py --model_path ./saved_models/resnet34_unet_best_model.pth --model_type resnet34_unet --data_path ./data/oxford-iiit-pet --save_visualizations
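
IoU here is the intersection of predicted and ground-truth foreground pixels over their union. A minimal sketch of the computation, assuming predictions are thresholded sigmoid outputs (src/evaluate.py may implement it differently):

import torch

def binary_iou(logits, target, thresh=0.5):
    """IoU between a thresholded prediction and a binary ground-truth mask."""
    pred = (torch.sigmoid(logits) > thresh).float()
    intersection = (pred * target).sum()
    union = pred.sum() + target.sum() - intersection
    return (intersection / union.clamp(min=1e-6)).item()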

Inference

Run inference with a trained U-Net model:

python src/inference.py --model ./saved_models/unet_best_model.pth --model_type unet --data_path ./data/oxford-iiit-pet --save_results

Run inference with a trained ResNet34-UNet model:

python src/inference.py --model ./saved_models/resnet34_unet_best_model.pth --model_type resnet34_unet --data_path ./data/oxford-iiit-pet --save_results

Single Image Demo

Run inference on a single image:

python src/inference_demo.py --model_path ./saved_models/unet_best_model.pth --model_type unet --image_path ./demo/sample.jpg --output_path ./results/demo_result.png
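
Under the hood, single-image inference is just resizing, a forward pass, and thresholding the sigmoid output. A hedged sketch, assuming a UNet class with a no-argument constructor in src/models/unet.py; the actual demo script may load and preprocess differently:

import torch
from PIL import Image
from torchvision import transforms

from src.models.unet import UNet   # assumed import path and constructor

preprocess = transforms.Compose([
    transforms.Resize((256, 256)),  # match the 256x256 training resolution
    transforms.ToTensor(),
])

model = UNet()
model.load_state_dict(torch.load("./saved_models/unet_best_model.pth", map_location="cpu"))
model.eval()

image = Image.open("./demo/sample.jpg").convert("RGB")
x = preprocess(image).unsqueeze(0)                      # (1, 3, 256, 256)
with torch.no_grad():
    mask = (torch.sigmoid(model(x)) > 0.5).squeeze(0)   # binary foreground mask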

Quick Demo

Run the complete demo script:

cd demo && chmod +x demo.sh && ./demo.sh

Project Structure

binary-semantic-segmentation/
├── src/
│   ├── models/
│   │   ├── __init__.py
│   │   ├── unet.py           # U-Net model implementation
│   │   └── resnet34_unet.py  # ResNet34-UNet model implementation
│   ├── oxford_pet.py         # Oxford-IIIT Pet dataset loading and preprocessing
│   ├── train.py              # Training script
│   ├── evaluate.py           # Evaluation script
│   ├── inference.py          # Inference script
│   ├── inference_demo.py     # Single-image inference demo
│   └── utils.py              # Utility functions
├── demo/                     # Demo scripts and examples
├── data/                     # Dataset directory (created automatically)
├── requirements.txt          # Python dependencies
├── TECHNICAL_REPORT.md       # Detailed technical implementation report
└── README.md                # This file

Key Files

  • src/models/unet.py: U-Net architecture implementation with encoder-decoder structure
  • src/models/resnet34_unet.py: ResNet34-UNet hybrid architecture implementation
  • src/oxford_pet.py: Oxford-IIIT Pet dataset class with automatic download and preprocessing
  • src/train.py: Complete training pipeline with validation and checkpointing
  • src/evaluate.py: Model evaluation with metrics calculation and visualization
  • src/utils.py: Helper functions for device selection, logging, and visualization
  • TECHNICAL_REPORT.md: Comprehensive technical report with implementation details and experimental results

Training Process

The training process includes the following steps (steps 3 and 6 are sketched in code below):

  1. Data Loading: Automatic dataset download and train/validation split (90%/10%)
  2. Preprocessing: Image resizing to 256×256 and trimap conversion to binary masks
  3. Training Loop: Forward pass, loss calculation, backpropagation with gradient clipping
  4. Validation: Periodic evaluation on validation set with IoU and accuracy metrics
  5. Checkpointing: Automatic saving of best models and periodic checkpoints
  6. Learning Rate Scheduling: Adaptive learning rate reduction based on validation performance
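
Steps 3 and 6 in particular reduce to a few lines of PyTorch. A self-contained sketch of one training step with gradient clipping and plateau-based LR scheduling; stand-ins replace the real model and data, and the scheduler type is an assumption on our part:

import torch
import torch.nn as nn
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = nn.Conv2d(3, 1, kernel_size=1)                 # placeholder for the U-Net
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = ReduceLROnPlateau(optimizer, mode="max", factor=0.5, patience=3)

images = torch.randn(8, 3, 256, 256)                   # placeholder batch
masks = torch.randint(0, 2, (8, 1, 256, 256)).float()

optimizer.zero_grad()
loss = criterion(model(images), masks)                 # forward pass + loss
loss.backward()                                        # backpropagation
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # gradient clipping
optimizer.step()

val_iou = 0.8                                          # placeholder validation metric
scheduler.step(val_iou)                                # reduce LR when IoU plateaus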

Expected Results

Performance Metrics

  • Training Accuracy: above 95% on the training set
  • Validation Accuracy: above 90% on the validation set
  • IoU Score: above 0.8 for well-segmented images
  • Convergence: typically within 30-50 epochs

Model Performance

  • Inference Speed: ~50-100ms per image on GPU
  • Model Size: ~31M parameters (~120MB file size)
  • Memory Usage: ~2-4GB GPU memory during training
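
The parameter count is easy to check for either model; a quick snippet, assuming the UNet class and constructor noted earlier:

from src.models.unet import UNet   # assumed import path

model = UNet()
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")   # expected to be around 31M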

For detailed experimental results, training insights, and comprehensive technical analysis, see TECHNICAL_REPORT.md.

Dataset Information

The Oxford-IIIT Pet Dataset contains:

  • 37 pet categories (cats and dogs)
  • ~7,400 images total
  • Trimap annotations with pixel-level labels:
    • Class 1: Foreground (pet)
    • Class 2: Background
    • Class 3: Boundary/uncertain regions

The dataset is automatically downloaded and processed into binary masks suitable for semantic segmentation.
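
A sketch of that conversion, assuming the common convention of folding the boundary class into the foreground (src/oxford_pet.py may choose a different mapping):

import numpy as np

def trimap_to_binary(trimap):
    """Oxford-IIIT Pet trimaps: 1 = pet, 2 = background, 3 = boundary/uncertain.
    The boundary is folded into the foreground here; other conventions exist."""
    return ((trimap == 1) | (trimap == 3)).astype(np.float32)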

References

  • O. Ronneberger, P. Fischer, and T. Brox. "U-Net: Convolutional Networks for Biomedical Image Segmentation." MICCAI 2015.
  • K. He, X. Zhang, S. Ren, and J. Sun. "Deep Residual Learning for Image Recognition." CVPR 2016.
  • O. M. Parkhi, A. Vedaldi, A. Zisserman, and C. V. Jawahar. "Cats and Dogs." CVPR 2012.

License

This project is open source and available under the MIT License.

About

Lab 2 of TAICA 2025 Deep Learning (11320AIA500600), revised for open source.
