Welcome to the Carvana Image Masking Challenge repository. This project focuses on semantic segmentation of cars as part of the Carvana Image Masking Challenge on Kaggle. The goal is to generate precise masks for cars in images.
- Objective
- Evaluation
- Model
- Data
- Data Augmentation
- Installation
- Getting Started (Training)
- Inference
- Results
- Post Analysis
- Code
- References
- Folder Structure
- License
## Objective

The Carvana Image Masking Challenge aims to generate highly precise masks for cars in images.
Semantic segmentation is used to identify the boundaries of cars, contributing to applications such as autonomous driving and object detection.
## Evaluation

The main evaluation metric is the Dice coefficient (equivalent to the F1-score for binary segmentation):

Dice = 2·|X ∩ Y| / (|X| + |Y|)

where X is the set of predicted mask pixels and Y the ground truth:

- Dice = 1 -> perfect overlap between prediction (X) and ground truth (Y)
- Dice = 0 -> no overlap

Our aim is to maximize Dice by improving the overlap between prediction and ground truth.
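As a concrete illustration (not code from this repository), the Dice coefficient for a pair of binary masks can be computed with NumPy:

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2|X ∩ Y| / (|X| + |Y|) for binary masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / (pred.sum() + target.sum() + eps))

mask = np.array([[1, 1], [0, 0]])
print(dice_coefficient(mask, mask))      # identical masks -> Dice close to 1
print(dice_coefficient(mask, 1 - mask))  # disjoint masks  -> Dice = 0
```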
## Model

I developed a custom encoder–decoder architecture inspired by both SegNet and U-Net:
- Encoder: SegNet-style downsampling (Conv2D → BatchNorm → ReLU → MaxPool)
- Decoder: U-Net-style upsampling with skip connections from encoder layers
- Final layer: Sigmoid (binary mask output)
Architecture Summary:
- 7 encoder layers
- 2 center convolutional layers
- 7 decoder layers
- 1 final classification layer
Encoder–Decoder with skip connections (final activation: sigmoid)
Note: Hardware limitations (NVIDIA RTX 3060, 6GB VRAM) influenced design choices.
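As an illustration only, a two-level miniature of this design can be sketched in PyTorch (the framework, channel widths, and upsampling mode here are assumptions, not taken from the notebook):

```python
import torch
import torch.nn as nn

class EncBlock(nn.Module):
    """SegNet-style encoder step: Conv2D -> BatchNorm -> ReLU -> MaxPool."""
    def __init__(self, cin, cout):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(cin, cout, 3, padding=1),
            nn.BatchNorm2d(cout),
            nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        skip = self.conv(x)          # saved for the decoder's skip connection
        return self.pool(skip), skip

class DecBlock(nn.Module):
    """U-Net-style decoder step: upsample, concatenate encoder skip, convolve."""
    def __init__(self, cin, cout):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.conv = nn.Sequential(
            nn.Conv2d(cin, cout, 3, padding=1),
            nn.BatchNorm2d(cout),
            nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = torch.cat([self.up(x), skip], dim=1)   # skip connection
        return self.conv(x)

# Two-level miniature of the 7-level design; widths are illustrative.
enc1, enc2 = EncBlock(3, 16), EncBlock(16, 32)
dec2, dec1 = DecBlock(32 + 32, 16), DecBlock(16 + 16, 8)
head = nn.Conv2d(8, 1, 1)                          # final 1x1 classification layer

x = torch.randn(1, 3, 64, 64)
d1, s1 = enc1(x)                                   # 32x32 features, 64x64 skip
d2, s2 = enc2(d1)                                  # 16x16 features, 32x32 skip
y = torch.sigmoid(head(dec1(dec2(d2, s2), s1)))    # 64x64 binary-probability mask
```

The skip connections reuse high-resolution encoder features so the decoder can recover sharp object boundaries, which is exactly what a precise car mask needs.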
## Data

The dataset is provided by Kaggle:
Carvana Image Masking Challenge Data
Expected folder structure:
```
data/
├── raw/
│   ├── train/         # input images
│   └── train_masks/   # ground truth masks
└── processed/         # preprocessed data
```
## Data Augmentation

To improve generalization, I applied minor augmentations:
- Random shifts
- Scaling
- Rotations
These help the model perform better on unseen data.
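A sketch of how such paired augmentations can be applied with `scipy.ndimage` (the parameter ranges are assumptions; the same transform is applied to image and mask, with nearest-neighbor interpolation keeping the mask binary):

```python
import numpy as np
from scipy import ndimage

def augment_pair(image, mask, rng):
    """Apply the same random rotation and shift to an image and its mask.
    order=0 (nearest neighbor) keeps the mask strictly binary."""
    angle = rng.uniform(-5, 5)            # small rotation, in degrees
    shift = rng.uniform(-4, 4, size=2)    # pixel shift (rows, cols)
    image = ndimage.rotate(image, angle, reshape=False, order=1, mode="nearest")
    mask = ndimage.rotate(mask, angle, reshape=False, order=0, mode="nearest")
    image = ndimage.shift(image, shift, order=1, mode="nearest")
    mask = ndimage.shift(mask, shift, order=0, mode="nearest")
    return image, mask

rng = np.random.default_rng(0)
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0                     # toy image: a bright square
aug_img, aug_mask = augment_pair(img, img.copy(), rng)
```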
## Installation

Clone the repository and set up the environment:
```bash
git clone https://github.com/chaitanyapeshin/segmentation_for_color_change.git
cd segmentation_for_color_change
```
With Conda:

```bash
conda env create -f environment.yml
conda activate carvana
```
## Getting Started (Training)

Run the training notebook:
```bash
jupyter notebook notebooks/model.ipynb
```
This will train the model and log progress to TensorBoard (`assets/tensorboard/`).
## Inference

Use the inference script to predict masks for new images:
```bash
python infer.py --input path/to/image.jpg --output outputs/mask.png
```
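The internals of `infer.py` aren't shown here; as a sketch of the typical final step, the sigmoid probabilities from the model's last layer are thresholded into a binary mask before saving (the 0.5 threshold is an assumption):

```python
import numpy as np

def probs_to_mask(probs: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Turn sigmoid probabilities into a {0, 255} mask suitable for saving as PNG."""
    return (probs > threshold).astype(np.uint8) * 255

probs = np.array([[0.1, 0.9],
                  [0.6, 0.4]])
print(probs_to_mask(probs))
# -> [[  0 255]
#     [255   0]]
```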
## Results

The model was trained with the Adam optimizer and a custom loss: `BCE + (1 - Dice)`.
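A minimal NumPy sketch of this loss, using a soft Dice computed on predicted probabilities (the actual training code in the notebook may differ in detail):

```python
import numpy as np

def bce_dice_loss(probs, target, eps=1e-7):
    """loss = BCE + (1 - Dice), with a soft Dice over probabilities."""
    probs = np.clip(probs, eps, 1 - eps)   # avoid log(0)
    bce = -np.mean(target * np.log(probs) + (1 - target) * np.log(1 - probs))
    intersection = np.sum(probs * target)
    dice = (2 * intersection + eps) / (np.sum(probs) + np.sum(target) + eps)
    return bce + (1 - dice)

t = np.array([1.0, 0.0, 1.0])
good = bce_dice_loss(np.array([0.99, 0.01, 0.99]), t)  # near-perfect prediction
bad = bce_dice_loss(np.array([0.2, 0.8, 0.3]), t)      # poor prediction
# good is much smaller than bad
```

Combining BCE (per-pixel calibration) with a Dice term (overlap) directly optimizes the competition metric while keeping gradients well-behaved.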
Validation performance after ~13 epochs:
| Metric | Value |
|---|---|
| Dice | 0.9956 |
| IoU (Jaccard) | 0.9912 |
| Pixel Accuracy | 0.9971 |
| Precision | 0.9965 |
| Recall | 0.9948 |
| Dice (5/50/95%) | 0.992 / 0.996 / 0.998 |
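The 5/50/95% row reports percentiles of the per-image Dice distribution; with NumPy that computation looks like (the scores below are illustrative, not the actual validation values):

```python
import numpy as np

# Per-image Dice scores collected over a validation set (illustrative values)
per_image_dice = np.array([0.991, 0.995, 0.996, 0.997, 0.998, 0.993, 0.996])
p5, p50, p95 = np.percentile(per_image_dice, [5, 50, 95])
```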
## Post Analysis

- The model segments the main body of cars very well.
- It struggles with fine details:
  - Dark shadows near wheels
  - Cars painted in colors similar to the background
  - Thin structures (antennas, roof racks)

Despite these challenges, performance is at or near human level on most images.
## Code

- Model implementation: `notebooks/model.ipynb`
- Preprocessing: `src/data/`
## References

- *SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation* (2015) – Vijay Badrinarayanan, Alex Kendall, Roberto Cipolla
- *Fully Convolutional Networks for Semantic Segmentation* (2016) – Evan Shelhamer, Jonathan Long, Trevor Darrell
- *Learning Deconvolution Network for Semantic Segmentation* (2015) – Hyeonwoo Noh, Seunghoon Hong, Bohyung Han
## Folder Structure

```
.
├── 29bb3ece3180_11.jpg
├── assets/
│   └── tensorboard/
├── data/
│   ├── processed/
│   └── raw/
├── LICENSE
├── notebooks/
│   └── model.ipynb
├── README.md
├── references/
│   ├── 1411.4038.pdf
│   ├── 1505.04366.pdf
│   └── 1511.00561.pdf
├── environment.yml
├── requirements.txt
├── sample_submission.csv
└── src/
    └── data/
```
## License

This project is licensed under the MIT License – see the LICENSE file for details.