Mask Splitter - Annotation tool and Neural Network

Python Version | License: AFL-3.0

Paper (arXiv) | Hugging Face Demo | BibTeX

Hugging Face Data | Hugging Face Pre-trained Models

A lightweight data labeling and neural network training tool for splitting object segmentation masks into front/back (anterior/posterior) regions. Developed as part of the NSER-IBVS (Numerically Stable Efficient Reduced Image-Based Visual Servoing) system for quadrotor visual control.

Mask Splitter Annotation Tool Demo

Overview

This tool provides:

  1. Interactive Labeling Tool: Split YOLO-generated segmentation masks into front and back regions with a single click
  2. U-Net Based Mask Splitter Network: Train a neural network (~1.94M parameters) to automatically predict front/back segmentation from RGB images and vehicle masks
  3. Inference Pipeline: Deploy trained models for real-time mask splitting

Mask Splitter Inference Demo (YOLO mask on the left, predicted front & back masks on the right)

The mask splitter addresses the problem of unstable keypoint ordering in visual servoing by determining the orientation of detected objects, enabling consistent feature point correspondence across frames.

Try it online on Hugging Face Spaces: brittleru/mask-splitter-tool.

Architecture

Mask Splitter Architecture

The network uses a U-Net style encoder-decoder architecture:

| Component | Description |
|---|---|
| Input | 4-channel tensor (RGB + binary mask) at 360x640 resolution |
| Encoder | Progressive downsampling through 3 stages (32 → 64 → 128 → 256 channels) |
| Attention | Spatial attention mechanism on the mask channel for feature modulation |
| Decoder | Transposed convolutions with skip connections from the encoder |
| Output | 2-channel output (front mask, back mask) |
| Regularization | Dropout in deeper layers, BatchNorm throughout |
| Parameters | ~1.94M trainable parameters |
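
For orientation, the following is a minimal PyTorch sketch of a U-Net with the shape described above (4-channel input, 32/64/128 encoder stages with a 256-channel bottleneck, mask-driven spatial attention, transposed-convolution decoder with skip connections, 2-channel output, BatchNorm and Dropout). It is not the repository's module: the class name TinyMaskSplitter, the attention formulation, and the exact layer choices are illustrative assumptions, so its parameter count will not match the released ~1.94M model.

import torch
import torch.nn as nn

def conv_block(c_in, c_out, p_drop=0.0):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        nn.Dropout2d(p_drop),
    )

class TinyMaskSplitter(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, p_drop=0.0):
        super().__init__()
        self.enc1, self.enc2, self.enc3 = conv_block(4, 32), conv_block(32, 64), conv_block(64, 128, p_drop)
        self.bottleneck = conv_block(128, 256, p_drop)
        self.pool = nn.MaxPool2d(2)
        self.up3, self.dec3 = nn.ConvTranspose2d(256, 128, 2, stride=2), conv_block(256, 128, p_drop)
        self.up2, self.dec2 = nn.ConvTranspose2d(128, 64, 2, stride=2), conv_block(128, 64)
        self.up1, self.dec1 = nn.ConvTranspose2d(64, 32, 2, stride=2), conv_block(64, 32)
        self.head = nn.Conv2d(32, 2, 1)  # front / back logits

    def forward(self, x):
        attn = x[:, 3:4]                           # binary mask channel, (B, 1, H, W)
        e1 = self.enc1(x) * (1 + attn)             # crude spatial attention via the mask
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        b = self.bottleneck(self.pool(e3))
        d3 = self.dec3(torch.cat([self.up3(b), e3], dim=1))
        d2 = self.dec2(torch.cat([self.up2(d3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return self.head(d1)

# 360 and 640 are divisible by 8, so three pooling stages round-trip cleanly.
logits = TinyMaskSplitter()(torch.randn(1, 4, 360, 640))
print(logits.shape)  # torch.Size([1, 2, 360, 640])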

Model Specifications

| Specification | Value |
|---|---|
| Input Resolution | 360 x 640 pixels |
| Input Channels | 4 (RGB + binary mask) |
| Output Channels | 2 (front mask, back mask) |
| Base Channels | 32 |
| Encoder Stages | 3 (32 → 64 → 128 → 256) |
| Total Parameters | ~1.94M |
| Recommended GPU Memory | ≥4 GB |
| Inference Time (GPU) | ~5-10 ms per frame |
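
The sketch below (not the repository's preprocessing code; the resizing and normalization choices are assumptions) shows how an RGB frame and a YOLO binary mask can be packed into the 4-channel, 360 x 640 input tensor listed above.

import cv2
import numpy as np
import torch

image = cv2.imread("frame.png")                                # BGR, H x W x 3
mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)            # H x W

image = cv2.resize(image, (640, 360))                          # cv2 takes (width, height)
mask = cv2.resize(mask, (640, 360), interpolation=cv2.INTER_NEAREST)

rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
binary = (mask > 127).astype(np.float32)

x = np.concatenate([rgb, binary[..., None]], axis=-1)          # H x W x 4
x = torch.from_numpy(x).permute(2, 0, 1).unsqueeze(0)          # (1, 4, 360, 640)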

Loss Function

The training uses a specialized PartitionLoss with three components:

L_total = α * L_individual + β * L_partition + γ * (L_overlap + L_coverage)

| Component | Description |
|---|---|
| L_individual | BCE loss for front and back mask supervision |
| L_partition | MSE ensuring front + back = original mask |
| L_overlap | Penalty for overlapping predictions |
| L_coverage | Penalty for missing or excess coverage |

Loss weights are scheduled during training: starting at (α=0.9, β=0.05, γ=0.05) and ending at (α=0.4, β=0.4, γ=0.2).
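
Below is a minimal sketch of a loss with these components, written directly from the formula above; it is not the repository's PartitionLoss implementation, and the exact penalty definitions and the (assumed linear) weight schedule may differ.

import torch
import torch.nn.functional as F

def partition_loss(logits, targets, mask, alpha=0.9, beta=0.05, gamma=0.05):
    """logits, targets: (B, 2, H, W) front/back; mask: (B, 1, H, W) original vehicle mask."""
    probs = torch.sigmoid(logits)
    union = probs.sum(dim=1, keepdim=True)                               # predicted front + back coverage
    l_individual = F.binary_cross_entropy_with_logits(logits, targets)   # per-channel BCE supervision
    l_partition = F.mse_loss(union, mask)                                # front + back should equal the mask
    l_overlap = (probs[:, 0:1] * probs[:, 1:2]).mean()                   # both channels claiming a pixel
    l_coverage = (mask * (1.0 - union).clamp(min=0)).mean() + ((1.0 - mask) * union).mean()
    return alpha * l_individual + beta * l_partition + gamma * (l_overlap + l_coverage)

def scheduled_weights(epoch, num_epochs):
    # Assumed linear interpolation from (0.9, 0.05, 0.05) to (0.4, 0.4, 0.2).
    t = epoch / max(num_epochs - 1, 1)
    return 0.9 + t * (0.4 - 0.9), 0.05 + t * (0.4 - 0.05), 0.05 + t * (0.2 - 0.05)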

Installation

Requirements: Python 3.10+, CUDA-capable GPU recommended

Option 1: Install as standalone

# Clone the repository
git clone git@github.com:SpaceTime-Vision-Robotics-Laboratory/mask-splitter.git
cd mask-splitter

# Create virtual environment
python3 -m venv ./venv
source ./venv/bin/activate

# Install dependencies
python -m pip install -r requirements.txt

# Install package in development mode
python -m pip install -e .

Option 2: Install as git submodule

git submodule add git@github.com:SpaceTime-Vision-Robotics-Laboratory/mask-splitter.git external/mask_splitter
git submodule update --init --recursive

python -m pip install -e ./external/mask_splitter

Dependencies

Core dependencies: see requirements.txt.

Exact dependency versions: see requirements-dev.txt.

Run Tests

Run the tests, which verify imports and core functionality:

python -m unittest discover ./tests

Quick Start

# 1. Run the annotation tool demo
python runnable/run_mask_splitter_tool.py

# 2. Create a dataset (generate YOLO segmentations + annotate). See the script docstring for details.
python runnable/create_dataset.py --dataset_path=/path/to/your/data

# 3. Train the model. See the script docstring for details.
python runnable/train_splitter_network.py --data_dir=/path/to/your/data

# 4. Run inference
python runnable/run_inference_video.py \
    --data_dir=./data/validation \
    --scene=around-car-45-high-quality \
    --model_path=./models/mask_splitter-partition-v10-dropout_0-augmentations_multi_scenes.pt

Dataset Structure

The dataset must follow this directory structure:

data/
├── train/
│   ├── images/
│   │   ├── scene-name-01/
│   │   │   ├── frame_000001.png
│   │   │   ├── frame_000002.png
│   │   │   └── ...
│   │   └── scene-name-02/
│   │       └── ...
│   ├── segmented/
│   │   ├── scene-name-01/
│   │   │   ├── frame_000001.png  # Binary masks from YOLO
│   │   │   └── ...
│   │   └── scene-name-02/
│   │       └── ...
│   └── labels/
│       ├── scene-name-01/
│       │   ├── front/
│       │   │   ├── frame_000001.png  # Front mask annotations
│       │   │   └── ...
│       │   └── back/
│       │       ├── frame_000001.png  # Back mask annotations
│       │       └── ...
│       └── scene-name-02/
│           └── ...
└── validation/
    └── ... (same structure as train)

Important: Frame filenames must match across images/, segmented/, and labels/ directories.
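
A small standalone sketch (not part of the repository; the helper name check_scene is illustrative) that verifies this filename-matching requirement for one scene of a split:

from pathlib import Path

def check_scene(split_dir: Path, scene: str) -> None:
    images = {p.name for p in (split_dir / "images" / scene).glob("*.png")}
    masks = {p.name for p in (split_dir / "segmented" / scene).glob("*.png")}
    front = {p.name for p in (split_dir / "labels" / scene / "front").glob("*.png")}
    back = {p.name for p in (split_dir / "labels" / scene / "back").glob("*.png")}
    common = images & masks & front & back
    extra = (images | masks | front | back) - common
    if extra:
        raise ValueError(f"{scene}: frames missing from some directories: {sorted(extra)[:5]} ...")

check_scene(Path("data/train"), "scene-name-01")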

Usage

1. Data Annotation

Generate segmentation masks with YOLO and label the data with the interactive tool:

python runnable/create_dataset.py --dataset_path="PATH-TO-DATASET"

This command:

  1. Runs YOLO segmentation on all images to generate binary masks
  2. Opens the interactive annotation tool for each frame

Annotation Tool Instructions

  1. The tool displays each car segmentation mask
  2. Click on the front portion of the vehicle
  3. The algorithm calculates the centroid and creates a geometric split (see the sketch after this list):
    • Pixels in the clicked direction → front mask
    • Pixels in the opposite direction → back mask
  4. Press K or Enter to confirm
  5. Press R to redo the current frame
  6. Press Q or ESC to skip
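
A sketch of the geometric split from step 3: mask pixels are assigned to the front or back half by the sign of their projection onto the centroid-to-click direction. The repository's CarMaskSplitter may implement the split differently; the function below is illustrative only.

import numpy as np

def geometric_split(mask: np.ndarray, front_point: tuple[int, int]):
    """mask: H x W binary array; front_point: (x, y) clicked on the vehicle's front."""
    ys, xs = np.nonzero(mask)
    centroid = np.array([xs.mean(), ys.mean()])                    # (cx, cy)
    direction = np.asarray(front_point, dtype=float) - centroid
    direction /= np.linalg.norm(direction) + 1e-9
    # Signed projection of each mask pixel onto the centroid-to-click direction.
    proj = (np.stack([xs, ys], axis=1) - centroid) @ direction
    front, back = np.zeros_like(mask), np.zeros_like(mask)
    front[ys[proj >= 0], xs[proj >= 0]] = 1
    back[ys[proj < 0], xs[proj < 0]] = 1
    return front, back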

2. Training

Basic Training for the Mask Splitter Network

python runnable/train_splitter_network.py --data_dir=/path/to/data

Note: A valid dataset is required; annotate your own data first or download ours.

Full Training Options

python runnable/train_splitter_network.py \
    --data_dir=/path/to/data \
    --epochs=10 \
    --batch_size=8 \
    --lr=1e-4 \
    --dropout=0.0 \
    --save_dir=./checkpoints/ \
    --hq_multi=5 \
    --lq_multi=2 \
    --allowed scene-1 scene-2 scene-3 \
    --allowed_val val-scene-1 val-scene-2 \
    --scene_multi scene-1=10 scene-2=5

Training Arguments

| Argument | Default | Description |
|---|---|---|
| --data_dir | Required | Path to dataset root (must contain train/ and validation/ subdirs) |
| --epochs | 10 | Number of training epochs |
| --batch_size | 8 | Batch size for training |
| --lr | 1e-4 | Learning rate |
| --dropout | 0.0 | Dropout rate for regularization |
| --save_dir | ./checkpoints/ | Directory to save model checkpoints |
| --hq_multi | 5 | Multiplier for high-quality scenes (data augmentation) |
| --lq_multi | 2 | Multiplier for low-quality scenes |
| --allowed | See code | List of allowed training scene names |
| --allowed_val | See code | List of allowed validation scene names |
| --scene_multi | - | Per-scene multipliers as scene=N pairs |

Training Metrics

The training loop reports the following metrics (a small computation sketch follows this list):

  • Loss: Total loss, individual BCE, partition constraint, overlap penalty
  • Accuracy: Per-class (front/back) and average pixel accuracy
  • IoU: Intersection over Union for front and back masks
  • F1 Score: Harmonic mean of precision and recall
  • Partition Quality: Percentage of perfect front+back=original predictions
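
As a rough illustration (not the repository's metric code; binarization at a fixed threshold is an assumption), the sketch below computes per-class IoU and the partition-quality fraction from thresholded predictions.

import torch

def iou(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> float:
    """pred, target: binary (B, H, W) tensors for a single class (front or back)."""
    inter = (pred * target).sum()
    union = ((pred + target) > 0).float().sum()
    return float((inter + eps) / (union + eps))

def partition_quality(front: torch.Tensor, back: torch.Tensor, mask: torch.Tensor) -> float:
    """Fraction of samples whose predicted front + back exactly reproduces the original mask."""
    combined = ((front + back) > 0).float()
    exact = (combined == mask).flatten(1).all(dim=1).float()
    return float(exact.mean())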

3. Inference

Run inference on a dataset scene and generate a visualization video:

python runnable/run_inference_video.py \
    --data_dir=./data/validation \
    --scene=around-car-45-high-quality \
    --model_path=./models/mask_splitter-partition-v10-dropout_0-augmentations_multi_scenes.pt \
    --output=./inference_video.mp4  \
    --fps=15

Available arguments:

| Argument | Required | Default | Description |
|---|---|---|---|
| --data_dir | Yes | - | Path to dataset directory |
| --scene | Yes | - | Scene name to process |
| --model_path | Yes | - | Path to trained model (.pt file) |
| --output | No | None | Output video path (displays if not provided) |
| --fps | No | 10 | Frames per second |
| --no_text | No | False | Disable text overlay |
| --reencode | No | False | Re-encode with ffmpeg for compatibility |

Examples:

# Display video only (no file output)
python runnable/run_inference_video.py --data_dir=./data/validation \
    --scene=around-car-45-high-quality \
    --model_path=./checkpoints/mask_splitter.pt --fps=15

# Save and re-encode for better compatibility
python runnable/run_inference_video.py --data_dir=./data/validation \
    --scene=around-car-45-high-quality \
    --model_path=./checkpoints/mask_splitter.pt \
    --output=./inference_video.mp4 --reencode --fps=15

Programmatic API

Inference in Your Code

import cv2
from mask_splitter.nn.infer import MaskSplitterInference

# Initialize the model
splitter = MaskSplitterInference(
    model_path="./models/mask_splitter-partition-v10-dropout_0-augmentations_multi_scenes.pt",
    device="cuda",  # or "cpu"
    image_size=(360, 640),
    confidence_threshold=0.5
)

# Load image and mask
image = cv2.imread("frame.png")
mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)

# Run inference
front_mask, back_mask = splitter.infer(image, mask)

# Visualize results
splitter.visualize(image, front_mask, back_mask)

Using YOLO Segmentation

from mask_splitter.yolo_model import YoloSegmentation

# Initialize YOLO model
yolo = YoloSegmentation(
    model_path="./models/yolo-car-full-segmentation.pt",
    confidence_threshold=0.7
)

# Segment an image
annotated_frame, binary_mask = yolo.segment_image(frame)

# Get detection info
results = yolo.detect(frame)
target = yolo.find_best_target_box(results)
print(f"Confidence: {target.confidence}, Center: {target.center}")

Manual Mask Splitting (Geometric)

from mask_splitter.car_mask_splitter import CarMaskSplitter
import cv2

splitter = CarMaskSplitter()

image = cv2.imread("frame.png")
mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)

# Interactive annotation
front_mask, back_mask = splitter.annotate(image, mask, frame_name="frame_001")

# Or programmatic splitting (given a front point)
front_mask, back_mask = splitter.geometric_split_mask(mask, front_point=(320, 180))

Custom Dataset Loading

from mask_splitter.nn.dataset_car_segmentation import CarSegmentationDataset, AdvancedTransform
from torch.utils.data import DataLoader

# Create dataset with augmentation
dataset = CarSegmentationDataset(
    root_dir="./data/train",
    image_size=(360, 640),
    transform=AdvancedTransform(
        flip_prob=0.5,
        rotate_deg=15,
        brightness=0.15,
        saturation=0.25
    ),
    allowed_scenes=["scene-1", "scene-2"],
    scene_multipliers={"scene-1": 5, "scene-2": 2}
)

loader = DataLoader(dataset, batch_size=8, shuffle=True)

for inputs, targets in loader:
    # inputs: (B, 4, H, W) - RGB + mask
    # targets: (B, 2, H, W) - front + back masks
    pass

Dataset Downloads

The dataset is available at the links below, or on request if the links are no longer available:

| Dataset | Link | Description |
|---|---|---|
| Hugging Face | Hugging Face Dataset | Full dataset (simulated and real-world) |
| Simulator Data | Google Drive | Train and validation data from the Parrot Sphinx simulator |
| Real-world Data | Google Drive | Train and validation data from the laboratory |

Models

Pre-trained models are available in the models/ directory.

Final trained models are also available on Hugging Face: brittleru/nser-ibvs-drone.

Citation

If you use this tool in your research, please cite:

@InProceedings{Mocanu_2025_ICCV,
    author    = {Mocanu, Sebastian and Nae, Sebastian-Ion and Barbu, Mihai-Eugen and Leordeanu, Marius},
    title     = {Efficient Self-Supervised Neuro-Analytic Visual Servoing for Real-time Quadrotor Control},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2025},
    pages     = {1744-1753}
}

Acknowledgments

This work was developed at the SpaceTime Vision & Robotics Laboratory as part of the NSER-IBVS project for autonomous quadrotor visual servoing presented at ICCV 2025.
