A lightweight data labeling and neural network training tool for splitting object segmentation masks into front/back (anterior/posterior) regions. Developed as part of the NSER-IBVS (Numerically Stable Efficient Reduced Image-Based Visual Servoing) system for quadrotor visual control.
This tool provides:
- Interactive Labeling Tool: Split YOLO-generated segmentation masks into front and back regions with a single click
- U-Net Based Mask Splitter Network: Train a neural network (~1.94M parameters) to automatically predict front/back segmentation from RGB images and vehicle masks
- Inference Pipeline: Deploy trained models for real-time mask splitting
The mask splitter addresses the problem of unstable keypoint ordering in visual servoing by determining the orientation of detected objects, enabling consistent feature point correspondence across frames.
Try it online on Hugging Face Spaces: brittleru/mask-splitter-tool.
The network uses a U-Net style encoder-decoder architecture:
| Component | Description |
|---|---|
| Input | 4-channel tensor (RGB + binary mask) at 360x640 resolution |
| Encoder | Progressive downsampling through 3 stages (32 → 64 → 128 → 256 channels) |
| Attention | Spatial attention mechanism on the mask channel for feature modulation |
| Decoder | Transposed convolutions with skip connections from encoder |
| Output | 2-channel output (front mask, back mask) |
| Regularization | Dropout in deeper layers, BatchNorm throughout |
| Parameters | ~1.94M trainable parameters |
| Specification | Value |
|---|---|
| Input Resolution | 360 x 640 pixels |
| Input Channels | 4 (RGB + binary mask) |
| Output Channels | 2 (front mask, back mask) |
| Base Channels | 32 |
| Encoder Stages | 3 (32 → 64 → 128 → 256) |
| Total Parameters | ~1.94M |
| Recommended GPU Memory | ≥4GB |
| Inference Time (GPU) | ~5-10ms per frame |
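For orientation, the architecture described above roughly corresponds to the following PyTorch skeleton. This is an illustrative sketch of the 4-channel-in / 2-channel-out encoder-decoder only, not the repository's exact module; the attention formulation, dropout placement, and resulting parameter count are assumptions.

```python
# Illustrative sketch of the mask splitter's encoder-decoder shape (not the exact implementation).
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, dropout=0.0):
    layers = [
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    ]
    if dropout > 0:
        layers.append(nn.Dropout2d(dropout))  # dropout only in the deeper blocks
    return nn.Sequential(*layers)

class MaskSplitterSketch(nn.Module):
    def __init__(self, base=32, dropout=0.0):
        super().__init__()
        self.enc1 = conv_block(4, base)                             # RGB + mask -> 32
        self.enc2 = conv_block(base, base * 2)                      # 32 -> 64
        self.enc3 = conv_block(base * 2, base * 4)                  # 64 -> 128
        self.bottleneck = conv_block(base * 4, base * 8, dropout)   # 128 -> 256
        self.pool = nn.MaxPool2d(2)
        self.up3 = nn.ConvTranspose2d(base * 8, base * 4, 2, stride=2)
        self.dec3 = conv_block(base * 8, base * 4, dropout)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, 2, 1)                           # front / back logits

    def forward(self, x):
        # x: (B, 4, 360, 640) -- RGB image concatenated with the binary vehicle mask
        mask = x[:, 3:4]                                            # mask channel reused for attention
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        b = self.bottleneck(self.pool(e3))
        d3 = self.dec3(torch.cat([self.up3(b), e3], dim=1))         # skip connections from encoder
        d2 = self.dec2(torch.cat([self.up2(d3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        d1 = d1 * (0.5 + 0.5 * mask)                                # crude mask-based spatial attention
        return self.head(d1)                                        # (B, 2, 360, 640)
```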
The training uses a specialized PartitionLoss with three components:
L_total = α * L_individual + β * L_partition + γ * (L_overlap + L_coverage)
| Component | Description |
|---|---|
| `L_individual` | BCE loss for front and back mask supervision |
| `L_partition` | MSE ensuring front + back = original mask |
| `L_overlap` | Penalty for overlapping predictions |
| `L_coverage` | Penalty for missing or excess coverage |
Loss weights are scheduled during training: starting at (α=0.9, β=0.05, γ=0.05) and ending at (α=0.4, β=0.4, γ=0.2).
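As a reference for how the four terms and the weight schedule fit together, here is a minimal sketch; the exact reductions and interpolation schedule used by the repository's PartitionLoss are assumptions here.

```python
# Illustrative sketch of the composite loss described above (not the exact PartitionLoss).
import torch
import torch.nn.functional as F

def partition_loss(logits, targets, original_mask, alpha, beta, gamma):
    # logits, targets: (B, 2, H, W) with channel 0 = front, channel 1 = back
    # original_mask:   (B, 1, H, W) binary vehicle mask from YOLO
    probs = torch.sigmoid(logits)
    front, back = probs[:, 0:1], probs[:, 1:2]

    l_individual = F.binary_cross_entropy_with_logits(logits, targets)   # per-mask supervision
    l_partition = F.mse_loss(front + back, original_mask)                # front + back should rebuild the mask
    l_overlap = (front * back).mean()                                    # pixels claimed by both masks
    l_coverage = F.l1_loss(torch.clamp(front + back, 0.0, 1.0), original_mask)  # missing or excess coverage

    return alpha * l_individual + beta * l_partition + gamma * (l_overlap + l_coverage)

def scheduled_weights(epoch, total_epochs, start=(0.9, 0.05, 0.05), end=(0.4, 0.4, 0.2)):
    # Linearly interpolate (alpha, beta, gamma) from the starting to the final values.
    t = epoch / max(total_epochs - 1, 1)
    return tuple(s + t * (e - s) for s, e in zip(start, end))
```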
Requirements: Python 3.10+, CUDA-capable GPU recommended
# Clone the repository
git clone git@github.com:SpaceTime-Vision-Robotics-Laboratory/mask-splitter.git
cd mask-splitter
# Create virtual environment
python3 -m venv ./venv
source ./venv/bin/activate
# Install dependencies
python -m pip install -r requirements.txt
# Install package in development mode
python -m pip install -e .

# Or add as a git submodule in another project
git submodule add git@github.com:SpaceTime-Vision-Robotics-Laboratory/mask-splitter.git external/mask_splitter
git submodule update --init --recursive
python -m pip install -e ./external/mask_splitter

Core dependencies: see requirements.txt.
Exact dependency versions: see requirements-dev.txt.
Run the tests, which verify imports and functionality:
python -m unittest discover ./tests
# 1. Run the annotation tool demo
python runnable/run_mask_splitter_tool.py
# 2. Create a dataset (generate YOLO segmentations + annotate). Follow script docstring for more information.
python runnable/create_dataset.py --dataset_path=/path/to/your/data
# 3. Train the model. Follow script docstring for more information.
python runnable/train_splitter_network.py --data_dir=/path/to/your/data
# 4. Run inference
python runnable/run_inference_video.py \
--data_dir=./data/validation \
--scene=around-car-45-high-quality \
--model_path=./models/mask_splitter-partition-v10-dropout_0-augmentations_multi_scenes.pt

The dataset must follow this directory structure:
data/
├── train/
│ ├── images/
│ │ ├── scene-name-01/
│ │ │ ├── frame_000001.png
│ │ │ ├── frame_000002.png
│ │ │ └── ...
│ │ └── scene-name-02/
│ │ └── ...
│ ├── segmented/
│ │ ├── scene-name-01/
│ │ │ ├── frame_000001.png # Binary masks from YOLO
│ │ │ └── ...
│ │ └── scene-name-02/
│ │ └── ...
│ └── labels/
│ ├── scene-name-01/
│ │ ├── front/
│ │ │ ├── frame_000001.png # Front mask annotations
│ │ │ └── ...
│ │ └── back/
│ │ ├── frame_000001.png # Back mask annotations
│ │ └── ...
│ └── scene-name-02/
│ └── ...
└── validation/
└── ... (same structure as train)
Important: Frame filenames must match across the `images/`, `segmented/`, and `labels/` directories.
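Before annotating or training, a quick sanity check along these lines (a sketch assuming the layout above, not a script shipped with the repository) can catch mismatched frame filenames early:

```python
# Sanity-check sketch: verify frame filenames match across images/, segmented/, and labels/.
from pathlib import Path

def check_scene(root, scene):
    root = Path(root)
    images = {p.name for p in (root / "images" / scene).glob("*.png")}
    segmented = {p.name for p in (root / "segmented" / scene).glob("*.png")}
    front = {p.name for p in (root / "labels" / scene / "front").glob("*.png")}
    back = {p.name for p in (root / "labels" / scene / "back").glob("*.png")}
    if images == segmented == front == back:
        print(f"{scene}: {len(images)} frames, all consistent")
    else:
        print(f"{scene}: mismatch, e.g. missing from segmented: {sorted(images - segmented)[:5]}")

check_scene("./data/train", "scene-name-01")
```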
python runnable/create_dataset.py --dataset_path="PATH-TO-DATASET"

This command:
- Runs YOLO segmentation on all images to generate binary masks
- Opens the interactive annotation tool for each frame
- The tool displays each car segmentation mask
- Click on the front portion of the vehicle
- The algorithm calculates the mask centroid and creates a geometric split (see the sketch after this list):
  - Pixels in the clicked direction → front mask
  - Pixels in the opposite direction → back mask
- Press K or Enter to confirm
- Press R to redo the current frame
- Press Q or ESC to skip
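Conceptually, the geometric split divides the mask with a line through its centroid, perpendicular to the centroid-to-click direction. A sketch of that idea follows; the repository's geometric_split_mask may differ in details.

```python
# Illustrative geometric split: divide a binary mask by a line through its centroid,
# perpendicular to the direction from the centroid to the clicked "front" point.
import numpy as np

def geometric_split(mask, front_point):
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()                      # mask centroid
    fx, fy = front_point                               # clicked point, given as (x, y)
    direction = np.array([fx - cx, fy - cy], dtype=np.float64)
    direction /= np.linalg.norm(direction) + 1e-8

    # Signed projection of every pixel onto the centroid -> click direction.
    grid_y, grid_x = np.mgrid[:mask.shape[0], :mask.shape[1]]
    proj = (grid_x - cx) * direction[0] + (grid_y - cy) * direction[1]

    front_mask = ((proj >= 0) & (mask > 0)).astype(np.uint8)
    back_mask = ((proj < 0) & (mask > 0)).astype(np.uint8)
    return front_mask, back_mask
```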
python runnable/train_splitter_network.py --data_dir=/path/to/data

Note: A valid dataset is required; annotate your own first or download ours.
python runnable/train_splitter_network.py \
--data_dir=/path/to/data \
--epochs=10 \
--batch_size=8 \
--lr=1e-4 \
--dropout=0.0 \
--save_dir=./checkpoints/ \
--hq_multi=5 \
--lq_multi=2 \
--allowed scene-1 scene-2 scene-3 \
--allowed_val val-scene-1 val-scene-2 \
--scene_multi scene-1=10 scene-2=5

| Argument | Default | Description |
|---|---|---|
| `--data_dir` | Required | Path to dataset root (must contain `train/` and `validation/` subdirs) |
| `--epochs` | 10 | Number of training epochs |
| `--batch_size` | 8 | Batch size for training |
| `--lr` | 1e-4 | Learning rate |
| `--dropout` | 0.0 | Dropout rate for regularization |
| `--save_dir` | `./checkpoints/` | Directory to save model checkpoints |
| `--hq_multi` | 5 | Multiplier for high-quality scenes (data augmentation) |
| `--lq_multi` | 2 | Multiplier for low-quality scenes |
| `--allowed` | See code | List of allowed training scene names |
| `--allowed_val` | See code | List of allowed validation scene names |
| `--scene_multi` | - | Per-scene multipliers as `scene=N` pairs |
The training loop reports:
- Loss: Total loss, individual BCE, partition constraint, overlap penalty
- Accuracy: Per-class (front/back) and average pixel accuracy
- IoU: Intersection over Union for front and back masks
- F1 Score: Harmonic mean of precision and recall
- Partition Quality: Percentage of perfect front+back=original predictions
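As a rough guide, the mask-level metrics can be computed from thresholded predictions along these lines; the exact definitions in the training code may differ.

```python
# Sketch of the reported mask metrics (the training code's exact definitions may differ).
import torch

def mask_metrics(pred_logits, targets, original_mask, threshold=0.5):
    # pred_logits, targets: (B, 2, H, W); original_mask: (B, 1, H, W)
    pred = (torch.sigmoid(pred_logits) > threshold).float()
    eps = 1e-6
    metrics = {}
    for i, name in enumerate(("front", "back")):
        p, t = pred[:, i], targets[:, i]
        inter = (p * t).sum()
        union = ((p + t) > 0).float().sum()
        tp, fp, fn = inter, (p * (1 - t)).sum(), ((1 - p) * t).sum()
        metrics[f"iou_{name}"] = (inter / (union + eps)).item()
        metrics[f"f1_{name}"] = (2 * tp / (2 * tp + fp + fn + eps)).item()
    # Partition quality: fraction of samples where front + back exactly rebuilds the original mask.
    rebuilt = (pred.sum(dim=1, keepdim=True) > 0).float() == original_mask
    metrics["partition_quality"] = rebuilt.flatten(1).all(dim=1).float().mean().item()
    return metrics
```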
Run inference on a dataset scene and generate a visualization video:
python runnable/run_inference_video.py \
--data_dir=./data/validation \
--scene=around-car-45-high-quality \
--model_path=./models/mask_splitter-partition-v10-dropout_0-augmentations_multi_scenes.pt \
--output=./inference_video.mp4 \
--fps=15

Available arguments:

| Argument | Required | Default | Description |
|---|---|---|---|
| `--data_dir` | Yes | - | Path to dataset directory |
| `--scene` | Yes | - | Scene name to process |
| `--model_path` | Yes | - | Path to trained model (`.pt` file) |
| `--output` | No | None | Output video path (displays if not provided) |
| `--fps` | No | 10 | Frames per second |
| `--no_text` | No | False | Disable text overlay |
| `--reencode` | No | False | Re-encode with ffmpeg for compatibility |
Examples:
# Display video only (no file output)
python runnable/run_inference_video.py --data_dir=./data/validation \
--scene=around-car-45-high-quality \
--model_path=./checkpoints/mask_splitter.pt --fps=15
# Save and re-encode for better compatibility
python runnable/run_inference_video.py --data_dir=./data/validation \
--scene=around-car-45-high-quality \
--model_path=./checkpoints/mask_splitter.pt \
--output=./inference_video.mp4 --reencode --fps=15

The trained model can also be used directly from Python:

import cv2
from mask_splitter.nn.infer import MaskSplitterInference
# Initialize the model
splitter = MaskSplitterInference(
model_path="./models/mask_splitter-partition-v10-dropout_0-augmentations_multi_scenes.pt",
device="cuda", # or "cpu"
image_size=(360, 640),
confidence_threshold=0.5
)
# Load image and mask
image = cv2.imread("frame.png")
mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)
# Run inference
front_mask, back_mask = splitter.infer(image, mask)
# Visualize results
splitter.visualize(image, front_mask, back_mask)

To generate binary vehicle masks with the bundled YOLO segmentation model:

from mask_splitter.yolo_model import YoloSegmentation
# Initialize YOLO model
yolo = YoloSegmentation(
model_path="./models/yolo-car-full-segmentation.pt",
confidence_threshold=0.7
)
# Segment an image
annotated_frame, binary_mask = yolo.segment_image(frame)
# Get detection info
results = yolo.detect(frame)
target = yolo.find_best_target_box(results)
print(f"Confidence: {target.confidence}, Center: {target.center}")from mask_splitter.car_mask_splitter import CarMaskSplitter
import cv2
splitter = CarMaskSplitter()
image = cv2.imread("frame.png")
mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)
# Interactive annotation
front_mask, back_mask = splitter.annotate(image, mask, frame_name="frame_001")
# Or programmatic splitting (given a front point)
front_mask, back_mask = splitter.geometric_split_mask(mask, front_point=(320, 180))

To load the annotated dataset with augmentations for custom training loops:

from mask_splitter.nn.dataset_car_segmentation import CarSegmentationDataset, AdvancedTransform
from torch.utils.data import DataLoader
# Create dataset with augmentation
dataset = CarSegmentationDataset(
root_dir="./data/train",
image_size=(360, 640),
transform=AdvancedTransform(
flip_prob=0.5,
rotate_deg=15,
brightness=0.15,
saturation=0.25
),
allowed_scenes=["scene-1", "scene-2"],
scene_multipliers={"scene-1": 5, "scene-2": 2}
)
loader = DataLoader(dataset, batch_size=8, shuffle=True)
for inputs, targets in loader:
# inputs: (B, 4, H, W) - RGB + mask
# targets: (B, 2, H, W) - front + back masks
pass

The dataset is available at the following links, or on request if the links become unavailable:
| Dataset | Link | Description |
|---|---|---|
| Hugging Face | Hugging Face Dataset | Full dataset (simulator and real-world) |
| Simulator Data | Google Drive | Train and validation splits for Parrot Sphinx |
| Real-world Data | Google Drive | Train and validation splits for the laboratory environment |
Pre-trained models are available in the models/ directory:
- yolo-car-full-segmentation.pt - YOLOv11 Nano for our vehicle segmentation
- mask_splitter-partition-v10-dropout_0-augmentations_multi_scenes.pt - Trained mask splitter for Parrot Sphinx Simulator
- mask_splitter-epoch_10-dropout_0-_x2_real_early_stop.pt - Trained mask splitter for real world laboratory environment
Final trained models are also available on Hugging Face at brittleru/nser-ibvs-drone.
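Assuming both pre-trained models are available locally, the two APIs shown above can be chained into a minimal end-to-end sketch (YOLO segmentation followed by mask splitting) on a single frame:

```python
# Sketch: chain the YOLO segmentation model and the mask splitter on one frame.
# Paths and thresholds are examples; see the API sections above for the individual calls.
import cv2
from mask_splitter.yolo_model import YoloSegmentation
from mask_splitter.nn.infer import MaskSplitterInference

yolo = YoloSegmentation(
    model_path="./models/yolo-car-full-segmentation.pt",
    confidence_threshold=0.7,
)
splitter = MaskSplitterInference(
    model_path="./models/mask_splitter-partition-v10-dropout_0-augmentations_multi_scenes.pt",
    device="cuda",  # or "cpu"
    image_size=(360, 640),
    confidence_threshold=0.5,
)

frame = cv2.imread("frame.png")
_, binary_mask = yolo.segment_image(frame)           # vehicle mask from YOLO
front_mask, back_mask = splitter.infer(frame, binary_mask)
splitter.visualize(frame, front_mask, back_mask)
```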
If you use this tool in your research, please cite:
@InProceedings{Mocanu_2025_ICCV,
author = {Mocanu, Sebastian and Nae, Sebastian-Ion and Barbu, Mihai-Eugen and Leordeanu, Marius},
title = {Efficient Self-Supervised Neuro-Analytic Visual Servoing for Real-time Quadrotor Control},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
month = {October},
year = {2025},
pages = {1744-1753}
}

This work was developed at the SpaceTime Vision & Robotics Laboratory as part of the NSER-IBVS project for autonomous quadrotor visual servoing, presented at ICCV 2025.


