A PyTorch implementation of U-Net for binary semantic segmentation on the Oxford-IIIT Pet Dataset. This project demonstrates end-to-end training and evaluation of deep learning models for computer vision tasks.
This project implements a U-Net architecture for binary semantic segmentation, designed to segment pets (cats and dogs) from the background in images. The model learns to generate pixel-wise binary masks that distinguish foreground (pet) regions from background regions.
- U-Net Architecture: Classic encoder-decoder network with skip connections
- Binary Segmentation: Optimized for foreground/background classification
- Oxford-IIIT Pet Dataset: Automatic dataset download and preprocessing
- Training Pipeline: Complete training loop with validation and logging
- Evaluation Metrics: IoU, accuracy, and visual result comparison
- Model Checkpointing: Save and load trained models
- Network: U-Net with encoder-decoder structure
- Input: RGB images (3 channels, 256×256 pixels)
- Output: Binary masks (1 channel, 256×256 pixels)
- Loss Function: Binary Cross-Entropy with Logits
- Optimizer: Adam with learning rate scheduling
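For reference, the configuration above corresponds to a setup along the following lines (a minimal sketch; the `UNet` constructor signature is an assumption, and the actual code in `src/train.py` may differ):

```python
import torch
from torch import nn, optim

from src.models.unet import UNet  # assumed import; see the project structure below

# Hypothetical constructor arguments: RGB input, single-channel logit output
model = UNet(in_channels=3, out_channels=1)

criterion = nn.BCEWithLogitsLoss()                   # binary cross-entropy on raw logits
optimizer = optim.Adam(model.parameters(), lr=1e-4)  # matches the CLI default below
# One common scheduling choice: reduce the LR when validation loss plateaus
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=5)

x = torch.randn(8, 3, 256, 256)  # a batch of 8 RGB images at 256×256
logits = model(x)                # expected shape: (8, 1, 256, 256)
```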
- PyTorch >= 1.9.0
- torchvision >= 0.10.0
- numpy >= 1.21.0
- Pillow >= 8.3.0
- tqdm >= 4.62.0
- matplotlib >= 3.4.0
- Python 3.7 or higher
- CUDA-compatible GPU (recommended) or CPU
1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd binary-semantic-segmentation
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Download the dataset:

   ```bash
   python -c "from src.oxford_pet import OxfordPetDataset; OxfordPetDataset.download('./data/oxford-iiit-pet')"
   ```
Train a U-Net model on the Oxford-IIIT Pet dataset:

```bash
python src/train.py --model_type unet --data_path ./data/oxford-iiit-pet --epochs 50 --batch_size 8 --learning_rate 1e-4
```

Train a ResNet34-UNet model:

```bash
python src/train.py --model_type resnet34_unet --data_path ./data/oxford-iiit-pet --epochs 50 --batch_size 8 --learning_rate 1e-4
```
- `--model_type`: Type of model to train (`unet` or `resnet34_unet`)
- `--data_path`: Path to dataset directory (default: `./data/oxford-iiit-pet`)
- `--save_path`: Directory to save trained models (default: `./saved_models`)
- `--epochs`: Number of training epochs (default: 50)
- `--batch_size`: Batch size for training (default: 8)
- `--learning_rate`: Learning rate for optimizer (default: 1e-4)
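These options map onto an `argparse` parser along the following lines (a sketch built from the documented defaults; `src/train.py` may organize this differently):

```python
import argparse

parser = argparse.ArgumentParser(description="Train a binary segmentation model")
parser.add_argument("--model_type", choices=["unet", "resnet34_unet"], required=True)
parser.add_argument("--data_path", default="./data/oxford-iiit-pet")
parser.add_argument("--save_path", default="./saved_models")
parser.add_argument("--epochs", type=int, default=50)
parser.add_argument("--batch_size", type=int, default=8)
parser.add_argument("--learning_rate", type=float, default=1e-4)
args = parser.parse_args()
```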
Evaluate a trained U-Net model:

```bash
python src/evaluate.py --model_path ./saved_models/unet_best_model.pth --model_type unet --data_path ./data/oxford-iiit-pet --save_visualizations
```

Evaluate a trained ResNet34-UNet model:

```bash
python src/evaluate.py --model_path ./saved_models/resnet34_unet_best_model.pth --model_type resnet34_unet --data_path ./data/oxford-iiit-pet --save_visualizations
```
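The reported IoU and pixel accuracy can be computed from model logits as follows (a self-contained sketch, not necessarily identical to `src/evaluate.py`):

```python
import torch

def binary_iou_and_accuracy(logits, targets, threshold=0.5):
    """IoU and pixel accuracy for binary masks.

    logits:  (N, 1, H, W) raw model outputs
    targets: (N, 1, H, W) ground-truth masks in {0, 1}
    """
    preds = (torch.sigmoid(logits) > threshold).float()
    intersection = (preds * targets).sum()
    union = preds.sum() + targets.sum() - intersection
    iou = intersection / union.clamp(min=1e-8)    # guard against empty masks
    accuracy = (preds == targets).float().mean()  # fraction of correct pixels
    return iou.item(), accuracy.item()
```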
Run inference with a trained U-Net model:

```bash
python src/inference.py --model ./saved_models/unet_best_model.pth --model_type unet --data_path ./data/oxford-iiit-pet --save_results
```

Run inference with a trained ResNet34-UNet model:

```bash
python src/inference.py --model ./saved_models/resnet34_unet_best_model.pth --model_type resnet34_unet --data_path ./data/oxford-iiit-pet --save_results
```
Run inference on a single image:

```bash
python src/inference_demo.py --model_path ./saved_models/unet_best_model.pth --model_type unet --image_path ./demo/sample.jpg --output_path ./results/demo_result.png
```
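The single-image flow boils down to load, preprocess, threshold, save. A minimal sketch (assuming the checkpoint is a plain `state_dict` and the `UNet` constructor shown earlier):

```python
import torch
from PIL import Image
from torchvision import transforms

from src.models.unet import UNet  # assumed import path

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = UNet(in_channels=3, out_channels=1).to(device)
model.load_state_dict(torch.load("./saved_models/unet_best_model.pth", map_location=device))
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((256, 256)),  # match the 256×256 training resolution
    transforms.ToTensor(),
])

image = Image.open("./demo/sample.jpg").convert("RGB")
x = preprocess(image).unsqueeze(0).to(device)  # (1, 3, 256, 256)

with torch.no_grad():
    mask = (torch.sigmoid(model(x)) > 0.5).squeeze().cpu()  # (256, 256) binary mask

Image.fromarray((mask.numpy() * 255).astype("uint8")).save("./results/demo_result.png")
```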
Run the complete demo script:

```bash
cd demo && chmod +x demo.sh && ./demo.sh
```
```
binary-semantic-segmentation/
├── src/
│   ├── models/
│   │   ├── __init__.py
│   │   ├── unet.py            # U-Net model implementation
│   │   └── resnet34_unet.py   # ResNet34-UNet model implementation
│   ├── oxford_pet.py          # Oxford-IIIT Pet dataset loading and preprocessing
│   ├── train.py               # Training script
│   ├── evaluate.py            # Evaluation script
│   ├── inference.py           # Inference script
│   └── utils.py               # Utility functions
├── demo/                      # Demo scripts and examples
├── data/                      # Dataset directory (created automatically)
├── requirements.txt           # Python dependencies
├── TECHNICAL_REPORT.md        # Detailed technical implementation report
└── README.md                  # This file
```
- `src/models/unet.py`: U-Net architecture implementation with encoder-decoder structure
- `src/models/resnet34_unet.py`: ResNet34-UNet hybrid architecture implementation
- `src/oxford_pet.py`: Oxford-IIIT Pet dataset class with automatic download and preprocessing
- `src/train.py`: Complete training pipeline with validation and checkpointing
- `src/evaluate.py`: Model evaluation with metrics calculation and visualization
- `src/utils.py`: Helper functions for device selection, logging, and visualization
- `TECHNICAL_REPORT.md`: Comprehensive technical report with implementation details and experimental results
The training process includes:
- Data Loading: Automatic dataset download and train/validation split (90%/10%)
- Preprocessing: Image resizing to 256×256 and trimap conversion to binary masks
- Training Loop: Forward pass, loss calculation, and backpropagation with gradient clipping (see the sketch after this list)
- Validation: Periodic evaluation on validation set with IoU and accuracy metrics
- Checkpointing: Automatic saving of best models and periodic checkpoints
- Learning Rate Scheduling: Adaptive learning rate reduction based on validation performance
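Put together, one epoch of this loop might look like the following sketch (the clipping norm of 1.0 is an assumption, not a value taken from `src/train.py`):

```python
import torch
from torch.nn.utils import clip_grad_norm_

def train_one_epoch(model, loader, criterion, optimizer, device, max_norm=1.0):
    """Forward pass, loss calculation, and backpropagation with gradient clipping."""
    model.train()
    running_loss = 0.0
    for images, masks in loader:  # masks: (N, 1, 256, 256) binary targets
        images, masks = images.to(device), masks.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), masks)         # BCE-with-logits loss
        loss.backward()
        clip_grad_norm_(model.parameters(), max_norm)  # keep gradient norms bounded
        optimizer.step()
        running_loss += loss.item() * images.size(0)
    return running_loss / len(loader.dataset)

# After each validation pass: scheduler.step(val_loss) reduces the LR on plateaus,
# and improved models are checkpointed with torch.save(model.state_dict(), path).
```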
- Training Accuracy: ~95%+ on training set
- Validation Accuracy: ~90%+ on validation set
- IoU Score: ~0.8+ for well-segmented images
- Convergence: Typically converges within 30-50 epochs
- Inference Speed: ~50-100ms per image on GPU
- Model Size: ~31M parameters (~120MB file size)
- Memory Usage: ~2-4GB GPU memory during training
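The parameter count is easy to verify once a model is instantiated (assumes `model` from the setup sketch above):

```python
total_params = sum(p.numel() for p in model.parameters())
print(f"{total_params / 1e6:.1f}M parameters")  # should print roughly 31M for U-Net
```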
For detailed experimental results, training insights, and comprehensive technical analysis, see TECHNICAL_REPORT.md.
The Oxford-IIIT Pet Dataset contains:
- 37 pet categories (cats and dogs)
- ~7,400 images total
- Trimap annotations with pixel-level labels:
  - Class 1: Foreground (pet)
  - Class 2: Background
  - Class 3: Boundary/uncertain regions
The dataset is automatically downloaded and processed into binary masks suitable for semantic segmentation.
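In code, that conversion can be as simple as the following sketch (class 1 maps to foreground; whether boundary pixels join the foreground or the background is a preprocessing choice, assumed here to be background):

```python
import numpy as np
from PIL import Image

def trimap_to_binary_mask(trimap_path):
    """Convert an Oxford-IIIT Pet trimap (1=pet, 2=background, 3=boundary) to a binary mask."""
    trimap = np.asarray(Image.open(trimap_path))
    # Class 1 becomes foreground; classes 2 and 3 become background.
    # (Folding boundary pixels into the foreground instead is an equally common choice.)
    return (trimap == 1).astype(np.float32)
```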
- U-Net Paper: U-Net: Convolutional Networks for Biomedical Image Segmentation (Ronneberger et al., 2015)
- Oxford-IIIT Pet Dataset: Cats and Dogs Dataset
- PyTorch: Deep Learning Framework
This project is open source and available under the MIT License.