A lightweight data labeling and neural network training tool for splitting object segmentation masks into front/back (anterior/posterior) regions. Developed as part of the NSER-IBVS (Numerically Stable Efficient Reduced Image-Based Visual Servoing) system for quadrotor visual control.
This tool provides:
- Interactive Labeling Tool: Split YOLO-generated segmentation masks into front and back regions with a single click
- U-Net Based Mask Splitter Network: Train a neural network (~1.94M parameters) to automatically predict front/back segmentation from RGB images and vehicle masks
- Inference Pipeline: Deploy trained models for real-time mask splitting
The mask splitter addresses the problem of unstable keypoint ordering in visual servoing by determining the orientation of detected objects, enabling consistent feature point correspondence across frames.
Try it online on Hugging Face Spaces: brittleru/mask-splitter-tool.
The network uses a U-Net style encoder-decoder architecture:
| Component | Description |
|---|---|
| Input | 4-channel tensor (RGB + binary mask) at 360x640 resolution |
| Encoder | Progressive downsampling through 3 stages (32 → 64 → 128 → 256 channels) |
| Attention | Spatial attention mechanism on the mask channel for feature modulation |
| Decoder | Transposed convolutions with skip connections from encoder |
| Output | 2-channel output (front mask, back mask) |
| Regularization | Dropout in deeper layers, BatchNorm throughout |
| Parameters | ~1.94M trainable parameters |
| Specification | Value |
|---|---|
| Input Resolution | 360 x 640 pixels |
| Input Channels | 4 (RGB + binary mask) |
| Output Channels | 2 (front mask, back mask) |
| Base Channels | 32 |
| Encoder Stages | 3 (32 → 64 → 128 → 256) |
| Total Parameters | ~1.94M |
| Recommended GPU Memory | ≥4GB |
| Inference Time (GPU) | ~5-10ms per frame |
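For orientation, the architecture described above roughly corresponds to the following PyTorch skeleton. This is an illustrative sketch of the 4-channel-in / 2-channel-out encoder-decoder only, not the repository's exact module; the attention formulation, dropout placement, and resulting parameter count are assumptions.

```python
# Illustrative sketch of the mask splitter's encoder-decoder shape (not the exact implementation).
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, dropout=0.0):
    layers = [
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    ]
    if dropout > 0:
        layers.append(nn.Dropout2d(dropout))  # dropout only in the deeper blocks
    return nn.Sequential(*layers)

class MaskSplitterSketch(nn.Module):
    def __init__(self, base=32, dropout=0.0):
        super().__init__()
        self.enc1 = conv_block(4, base)                             # RGB + mask -> 32
        self.enc2 = conv_block(base, base * 2)                      # 32 -> 64
        self.enc3 = conv_block(base * 2, base * 4)                  # 64 -> 128
        self.bottleneck = conv_block(base * 4, base * 8, dropout)   # 128 -> 256
        self.pool = nn.MaxPool2d(2)
        self.up3 = nn.ConvTranspose2d(base * 8, base * 4, 2, stride=2)
        self.dec3 = conv_block(base * 8, base * 4, dropout)
        self.up2 = nn.ConvTranspose2d(base * 4, base * 2, 2, stride=2)
        self.dec2 = conv_block(base * 4, base * 2)
        self.up1 = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, 2, 1)                           # front / back logits

    def forward(self, x):
        # x: (B, 4, 360, 640) -- RGB image concatenated with the binary vehicle mask
        mask = x[:, 3:4]                                            # mask channel reused for attention
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        b = self.bottleneck(self.pool(e3))
        d3 = self.dec3(torch.cat([self.up3(b), e3], dim=1))         # skip connections from encoder
        d2 = self.dec2(torch.cat([self.up2(d3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        d1 = d1 * (0.5 + 0.5 * mask)                                # crude mask-based spatial attention
        return self.head(d1)                                        # (B, 2, 360, 640)
```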
The training uses a specialized PartitionLoss with three components:
L_total = α * L_individual + β * L_partition + γ * (L_overlap + L_coverage)
| Component | Description |
|---|---|
| `L_individual` | BCE loss for front and back mask supervision |
| `L_partition` | MSE ensuring front + back = original mask |
| `L_overlap` | Penalty for overlapping predictions |
| `L_coverage` | Penalty for missing or excess coverage |
Loss weights are scheduled during training: starting at (α=0.9, β=0.05, γ=0.05) and ending at (α=0.4, β=0.4, γ=0.2).
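As a reference for how the four terms and the weight schedule fit together, here is a minimal sketch; the exact reductions and interpolation schedule used by the repository's PartitionLoss are assumptions here.

```python
# Illustrative sketch of the composite loss described above (not the exact PartitionLoss).
import torch
import torch.nn.functional as F

def partition_loss(logits, targets, original_mask, alpha, beta, gamma):
    # logits, targets: (B, 2, H, W) with channel 0 = front, channel 1 = back
    # original_mask:   (B, 1, H, W) binary vehicle mask from YOLO
    probs = torch.sigmoid(logits)
    front, back = probs[:, 0:1], probs[:, 1:2]

    l_individual = F.binary_cross_entropy_with_logits(logits, targets)   # per-mask supervision
    l_partition = F.mse_loss(front + back, original_mask)                # front + back should rebuild the mask
    l_overlap = (front * back).mean()                                    # pixels claimed by both masks
    l_coverage = F.l1_loss(torch.clamp(front + back, 0.0, 1.0), original_mask)  # missing or excess coverage

    return alpha * l_individual + beta * l_partition + gamma * (l_overlap + l_coverage)

def scheduled_weights(epoch, total_epochs, start=(0.9, 0.05, 0.05), end=(0.4, 0.4, 0.2)):
    # Linearly interpolate (alpha, beta, gamma) from the starting to the final values.
    t = epoch / max(total_epochs - 1, 1)
    return tuple(s + t * (e - s) for s, e in zip(start, end))
```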
Requirements: Python 3.10+, CUDA-capable GPU recommended
# Clone the repository
git clone git@github.com:SpaceTime-Vision-Robotics-Laboratory/mask-splitter.git
cd mask-splitter
# Create virtual environment
python3 -m venv ./venv
source ./venv/bin/activate
# Install dependencies
python -m pip install -r requirements.txt
# Install package in development mode
python -m pip install -e .

# Or add as a git submodule in another project
git submodule add git@github.com:SpaceTime-Vision-Robotics-Laboratory/mask-splitter.git external/mask_splitter
git submodule update --init --recursive
python -m pip install -e ./external/mask_splitter

Core dependencies: see requirements.txt.
Exact dependency versions: see requirements-dev.txt.
Run the tests, which verify imports and functionality:
python -m unittest discover ./tests
# 1. Run the annotation tool demo
python runnable/run_mask_splitter_tool.py
# 2. Create a dataset (generate YOLO segmentations + annotate). Follow script docstring for more information.
python runnable/create_dataset.py --dataset_path=/path/to/your/data
# 3. Train the model. Follow script docstring for more information.
python runnable/train_splitter_network.py --data_dir=/path/to/your/data
# 4. Run inference
python runnable/run_inference_video.py \
--data_dir=./data/validation \
--scene=around-car-45-high-quality \
--model_path=./models/mask_splitter-partition-v10-dropout_0-augmentations_multi_scenes.pt

The dataset must follow this directory structure:
data/
├── train/
│ ├── images/
│ │ ├── scene-name-01/
│ │ │ ├── frame_000001.png
│ │ │ ├── frame_000002.png
│ │ │ └── ...
│ │ └── scene-name-02/
│ │ └── ...
│ ├── segmented/
│ │ ├── scene-name-01/
│ │ │ ├── frame_000001.png # Binary masks from YOLO
│ │ │ └── ...
│ │ └── scene-name-02/
│ │ └── ...
│ └── labels/
│ ├── scene-name-01/
│ │ ├── front/
│ │ │ ├── frame_000001.png # Front mask annotations
│ │ │ └── ...
│ │ └── back/
│ │ ├── frame_000001.png # Back mask annotations
│ │ └── ...
│ └── scene-name-02/
│ └── ...
└── validation/
└── ... (same structure as train)
Important: Frame filenames must match across the `images/`, `segmented/`, and `labels/` directories.
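Before annotating or training, a quick sanity check along these lines (a sketch assuming the layout above, not a script shipped with the repository) can catch mismatched frame filenames early:

```python
# Sanity-check sketch: verify frame filenames match across images/, segmented/, and labels/.
from pathlib import Path

def check_scene(root, scene):
    root = Path(root)
    images = {p.name for p in (root / "images" / scene).glob("*.png")}
    segmented = {p.name for p in (root / "segmented" / scene).glob("*.png")}
    front = {p.name for p in (root / "labels" / scene / "front").glob("*.png")}
    back = {p.name for p in (root / "labels" / scene / "back").glob("*.png")}
    if images == segmented == front == back:
        print(f"{scene}: {len(images)} frames, all consistent")
    else:
        print(f"{scene}: mismatch, e.g. missing from segmented: {sorted(images - segmented)[:5]}")

check_scene("./data/train", "scene-name-01")
```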
python runnable/create_dataset.py --dataset_path="PATH-TO-DATASET"

This command:
- Runs YOLO segmentation on all images to generate binary masks
- Opens the interactive annotation tool for each frame
- The tool displays each car segmentation mask
- Click on the front portion of the vehicle
- The algorithm calculates the mask centroid and creates a geometric split (see the sketch after this list):
  - Pixels in the clicked direction → front mask
  - Pixels in the opposite direction → back mask
- Press K or Enter to confirm
- Press R to redo the current frame
- Press Q or ESC to skip
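Conceptually, the geometric split divides the mask with a line through its centroid, perpendicular to the centroid-to-click direction. A sketch of that idea follows; the repository's geometric_split_mask may differ in details.

```python
# Illustrative geometric split: divide a binary mask by a line through its centroid,
# perpendicular to the direction from the centroid to the clicked "front" point.
import numpy as np

def geometric_split(mask, front_point):
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()                      # mask centroid
    fx, fy = front_point                               # clicked point, given as (x, y)
    direction = np.array([fx - cx, fy - cy], dtype=np.float64)
    direction /= np.linalg.norm(direction) + 1e-8

    # Signed projection of every pixel onto the centroid -> click direction.
    grid_y, grid_x = np.mgrid[:mask.shape[0], :mask.shape[1]]
    proj = (grid_x - cx) * direction[0] + (grid_y - cy) * direction[1]

    front_mask = ((proj >= 0) & (mask > 0)).astype(np.uint8)
    back_mask = ((proj < 0) & (mask > 0)).astype(np.uint8)
    return front_mask, back_mask
```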
python runnable/train_splitter_network.py --data_dir=/path/to/data

Note: A valid dataset is required; annotate your own first or download ours.
python runnable/train_splitter_network.py \
--data_dir=/path/to/data \
--epochs=10 \
--batch_size=8 \
--lr=1e-4 \
--dropout=0.0 \
--save_dir=./checkpoints/ \
--hq_multi=5 \
--lq_multi=2 \
--allowed scene-1 scene-2 scene-3 \
--allowed_val val-scene-1 val-scene-2 \
--scene_multi scene-1=10 scene-2=5

| Argument | Default | Description |
|---|---|---|
| `--data_dir` | Required | Path to dataset root (must contain `train/` and `validation/` subdirs) |
| `--epochs` | 10 | Number of training epochs |
| `--batch_size` | 8 | Batch size for training |
| `--lr` | 1e-4 | Learning rate |
| `--dropout` | 0.0 | Dropout rate for regularization |
| `--save_dir` | `./checkpoints/` | Directory to save model checkpoints |
| `--hq_multi` | 5 | Multiplier for high-quality scenes (data augmentation) |
| `--lq_multi` | 2 | Multiplier for low-quality scenes |
| `--allowed` | See code | List of allowed training scene names |
| `--allowed_val` | See code | List of allowed validation scene names |
| `--scene_multi` | - | Per-scene multipliers as `scene=N` pairs |
The training loop reports:
- Loss: Total loss, individual BCE, partition constraint, overlap penalty
- Accuracy: Per-class (front/back) and average pixel accuracy
- IoU: Intersection over Union for front and back masks
- F1 Score: Harmonic mean of precision and recall
- Partition Quality: Percentage of perfect front+back=original predictions
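As a rough guide, the mask-level metrics can be computed from thresholded predictions along these lines; the exact definitions in the training code may differ.

```python
# Sketch of the reported mask metrics (the training code's exact definitions may differ).
import torch

def mask_metrics(pred_logits, targets, original_mask, threshold=0.5):
    # pred_logits, targets: (B, 2, H, W); original_mask: (B, 1, H, W)
    pred = (torch.sigmoid(pred_logits) > threshold).float()
    eps = 1e-6
    metrics = {}
    for i, name in enumerate(("front", "back")):
        p, t = pred[:, i], targets[:, i]
        inter = (p * t).sum()
        union = ((p + t) > 0).float().sum()
        tp, fp, fn = inter, (p * (1 - t)).sum(), ((1 - p) * t).sum()
        metrics[f"iou_{name}"] = (inter / (union + eps)).item()
        metrics[f"f1_{name}"] = (2 * tp / (2 * tp + fp + fn + eps)).item()
    # Partition quality: fraction of samples where front + back exactly rebuilds the original mask.
    rebuilt = (pred.sum(dim=1, keepdim=True) > 0).float() == original_mask
    metrics["partition_quality"] = rebuilt.flatten(1).all(dim=1).float().mean().item()
    return metrics
```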
Run inference on a dataset scene and generate a visualization video:
python runnable/run_inference_video.py \
--data_dir=./data/validation \
--scene=around-car-45-high-quality \
--model_path=./models/mask_splitter-partition-v10-dropout_0-augmentations_multi_scenes.pt \
--output=./inference_video.mp4 \
--fps=15

Available arguments:

| Argument | Required | Default | Description |
|---|---|---|---|
| `--data_dir` | Yes | - | Path to dataset directory |
| `--scene` | Yes | - | Scene name to process |
| `--model_path` | Yes | - | Path to trained model (`.pt` file) |
| `--output` | No | None | Output video path (displays if not provided) |
| `--fps` | No | 10 | Frames per second |
| `--no_text` | No | False | Disable text overlay |
| `--reencode` | No | False | Re-encode with ffmpeg for compatibility |
Examples:
# Display video only (no file output)
python runnable/run_inference_video.py --data_dir=./data/validation \
--scene=around-car-45-high-quality \
--model_path=./checkpoints/mask_splitter.pt --fps=15
# Save and re-encode for better compatibility
python runnable/run_inference_video.py --data_dir=./data/validation \
--scene=around-car-45-high-quality \
--model_path=./checkpoints/mask_splitter.pt \
--output=./inference_video.mp4 --reencode --fps=15

The trained model can also be used directly from Python:

import cv2
from mask_splitter.nn.infer import MaskSplitterInference
# Initialize the model
splitter = MaskSplitterInference(
model_path="./models/mask_splitter-partition-v10-dropout_0-augmentations_multi_scenes.pt",
device="cuda", # or "cpu"
image_size=(360, 640),
confidence_threshold=0.5
)
# Load image and mask
image = cv2.imread("frame.png")
mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)
# Run inference
front_mask, back_mask = splitter.infer(image, mask)
# Visualize results
splitter.visualize(image, front_mask, back_mask)

To generate binary vehicle masks with the bundled YOLO segmentation model:

from mask_splitter.yolo_model import YoloSegmentation
# Initialize YOLO model
yolo = YoloSegmentation(
model_path="./models/yolo-car-full-segmentation.pt",
confidence_threshold=0.7
)
# Segment an image
annotated_frame, binary_mask = yolo.segment_image(frame)
# Get detection info
results = yolo.detect(frame)
target = yolo.find_best_target_box(results)
print(f"Confidence: {target.confidence}, Center: {target.center}")from mask_splitter.car_mask_splitter import CarMaskSplitter
import cv2
splitter = CarMaskSplitter()
image = cv2.imread("frame.png")
mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)
# Interactive annotation
front_mask, back_mask = splitter.annotate(image, mask, frame_name="frame_001")
# Or programmatic splitting (given a front point)
front_mask, back_mask = splitter.geometric_split_mask(mask, front_point=(320, 180))

To load the annotated dataset with augmentations for custom training loops:

from mask_splitter.nn.dataset_car_segmentation import CarSegmentationDataset, AdvancedTransform
from torch.utils.data import DataLoader
# Create dataset with augmentation
dataset = CarSegmentationDataset(
root_dir="./data/train",
image_size=(360, 640),
transform=AdvancedTransform(
flip_prob=0.5,
rotate_deg=15,
brightness=0.15,
saturation=0.25
),
allowed_scenes=["scene-1", "scene-2"],
scene_multipliers={"scene-1": 5, "scene-2": 2}
)
loader = DataLoader(dataset, batch_size=8, shuffle=True)
for inputs, targets in loader:
# inputs: (B, 4, H, W) - RGB + mask
# targets: (B, 2, H, W) - front + back masks
pass

The dataset is available at the following links, or on request if the links become unavailable:
| Dataset | Link | Description |
|---|---|---|
| Hugging Face | Hugging Face Dataset | Full dataset (simulator and real-world) |
| Simulator Data | Google Drive | Train and validation splits for Parrot Sphinx |
| Real-world Data | Google Drive | Train and validation splits for the laboratory environment |
Pre-trained models are available in the models/ directory:
- yolo-car-full-segmentation.pt - YOLOv11 Nano for our vehicle segmentation
- mask_splitter-partition-v10-dropout_0-augmentations_multi_scenes.pt - Trained mask splitter for Parrot Sphinx Simulator
- mask_splitter-epoch_10-dropout_0-_x2_real_early_stop.pt - Trained mask splitter for real world laboratory environment
Final trained models are also available on Hugging Face at brittleru/nser-ibvs-drone.
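Assuming both pre-trained models are available locally, the two APIs shown above can be chained into a minimal end-to-end sketch (YOLO segmentation followed by mask splitting) on a single frame:

```python
# Sketch: chain the YOLO segmentation model and the mask splitter on one frame.
# Paths and thresholds are examples; see the API sections above for the individual calls.
import cv2
from mask_splitter.yolo_model import YoloSegmentation
from mask_splitter.nn.infer import MaskSplitterInference

yolo = YoloSegmentation(
    model_path="./models/yolo-car-full-segmentation.pt",
    confidence_threshold=0.7,
)
splitter = MaskSplitterInference(
    model_path="./models/mask_splitter-partition-v10-dropout_0-augmentations_multi_scenes.pt",
    device="cuda",  # or "cpu"
    image_size=(360, 640),
    confidence_threshold=0.5,
)

frame = cv2.imread("frame.png")
_, binary_mask = yolo.segment_image(frame)           # vehicle mask from YOLO
front_mask, back_mask = splitter.infer(frame, binary_mask)
splitter.visualize(frame, front_mask, back_mask)
```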
If you use this tool in your research, please cite:
@InProceedings{Mocanu_2025_ICCV,
author = {Mocanu, Sebastian and Nae, Sebastian-Ion and Barbu, Mihai-Eugen and Leordeanu, Marius},
title = {Efficient Self-Supervised Neuro-Analytic Visual Servoing for Real-time Quadrotor Control},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
month = {October},
year = {2025},
pages = {1744-1753}
}

This work was developed at the SpaceTime Vision & Robotics Laboratory as part of the NSER-IBVS project for autonomous quadrotor visual servoing, presented at ICCV 2025.


