Semantic CNN Navigation implementation code for our paper "Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone". Video demos are shown in the multimedia demonstrations below, and the Semantic2D dataset can be downloaded at https://doi.org/10.5281/zenodo.18350696.
- Dataset Download: https://doi.org/10.5281/zenodo.18350696
- SALSA (Dataset and Labeling Framework): https://github.com/TempleRAIL/semantic2d
- S³-Net (Stochastic Semantic Segmentation): https://github.com/TempleRAIL/s3_net
- Semantic CNN Navigation: https://github.com/TempleRAIL/semantic_cnn_nav
This repository contains two main components:
- Training: CNN-based control policy training using the Semantic2D dataset
- ROS Deployment: Real-time semantic-aware navigation for mobile robots
The Semantic CNN Navigation system combines:
- S³-Net: Real-time semantic segmentation of 2D LiDAR scans
- SemanticCNN: ResNet-based control policy that uses semantic information for navigation
Multimedia demonstrations:
- Engineering Lobby Semantic Navigation
- Engineering 4th Floor Semantic Navigation
- CYC 4th Floor Semantic Navigation
┌─────────────────────────────────────────────────────────────────────┐
│ Semantic CNN Navigation │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────────────────┐ │
│ │ LiDAR Scan │───▶│ S³-Net │───▶│ Semantic Labels (10) │ │
│ │ + Intensity│ │ Segmentation│ │ per LiDAR point │ │
│ └─────────────┘ └─────────────┘ └───────────┬─────────────┘ │
│ │ │
│ ┌─────────────┐ ▼ │
│ │ Sub-Goal │───────────────────────▶┌─────────────────────────┐ │
│ │ (x, y) │ │ SemanticCNN │ │
│ └─────────────┘ │ (ResNet + Bottleneck) │ │
│ │ │ │
│ ┌─────────────┐ │ Input: 80x80 scan map │ │
│ │ Scan Map │───────────────────────▶│ + semantic map │ │
│ │ (history) │ │ + sub-goal │ │
│ └─────────────┘ └───────────┬─────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────┐ │
│ │ Velocity Command │ │
│ │ (linear_x, angular_z) │ │
│ └─────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
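Conceptually, the pipeline reduces to two model calls per control step. Below is a minimal Python sketch of that flow; the function signature and the `build_maps` rasterization helper are illustrative assumptions, not the repository's API.

```python
import torch

@torch.no_grad()
def navigate_step(s3_net, semantic_cnn, build_maps, scan, intensity, scan_history, sub_goal):
    """One control step: segment the scan, build 80x80 maps, predict a velocity.

    `build_maps` is an assumed helper that rasterizes the scan history and the
    per-point labels into the two 80x80 input maps.
    """
    # Stage 1: S3-Net assigns one of 10 semantic labels to each LiDAR point
    # from its range and intensity.
    x = torch.stack([scan, intensity]).unsqueeze(0)            # (1, 2, num_points)
    labels = s3_net(x).argmax(dim=1)                           # (1, num_points)

    # Stage 2: SemanticCNN maps (scan map, semantic map, sub-goal) -> velocity.
    scan_map, semantic_map = build_maps(scan_history, labels)  # each (1, 1, 80, 80)
    maps = torch.cat([scan_map, semantic_map], dim=1)          # (1, 2, 80, 80)
    goal = torch.as_tensor(sub_goal, dtype=torch.float32).view(1, 2)
    v = semantic_cnn(maps, goal).squeeze(0)                    # (2,)
    return v[0].item(), v[1].item()                            # linear_x, angular_z
```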
Training requirements:
- Python 3.7+
- PyTorch 1.7.1+
- TensorBoard
- NumPy
- tqdm

ROS deployment requirements:
- Ubuntu 20.04
- ROS Noetic
- Python 3.8.5
- PyTorch 1.7.1
Install training dependencies:
pip install torch torchvision tensorboardX numpy tqdm

The training expects the Semantic2D dataset organized as follows:
~/semantic2d_data/
├── dataset.txt # List of dataset folders
├── 2024-04-11-15-24-29/ # Dataset folder 1
│ ├── train.txt # Training sample list
│ ├── dev.txt # Validation sample list
│ ├── scans_lidar/ # Range scans (.npy)
│ ├── semantic_label/ # Semantic labels (.npy)
│ ├── sub_goals_local/ # Local sub-goals (.npy)
│ └── velocities/ # Ground truth velocities (.npy)
└── ...
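For illustration, a single training sample could be assembled from this layout roughly as follows. This is a sketch only: the per-sample file naming and array shapes are assumptions, and the repository's own loader is the NavDataset class in scripts/model.py.

```python
import os
import numpy as np

def load_sample(dataset_dir, sample_id):
    """Load one (scan, semantic label, sub-goal, velocity) tuple.

    Assumes each subfolder stores one .npy file per sample id, which may
    differ from the actual NavDataset implementation.
    """
    folders = ["scans_lidar", "semantic_label", "sub_goals_local", "velocities"]
    return tuple(
        np.load(os.path.join(dataset_dir, name, "{}.npy".format(sample_id)))
        for name in folders
    )

# Example: read the sample ids listed in train.txt for one dataset folder.
data_dir = os.path.expanduser("~/semantic2d_data/2024-04-11-15-24-29")
with open(os.path.join(data_dir, "train.txt")) as f:
    sample_ids = [line.strip() for line in f if line.strip()]
scan, label, goal, vel = load_sample(data_dir, sample_ids[0])
```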
SemanticCNN uses a ResNet-style architecture with Bottleneck blocks:
| Component | Details |
|---|---|
| Input | 2 channels: scan map (80x80) + semantic map (80x80) |
| Backbone | ResNet with Bottleneck blocks [2, 1, 1] |
| Goal Input | 2D sub-goal (x, y) concatenated after pooling |
| Output | 2D velocity (linear_x, angular_z) |
| Loss | MSE Loss |
Key Parameters:
- Sequence length: 10 frames
- Image size: 80x80
- LiDAR points: 1081 → downsampled to 720 (removing ±180 points)
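A minimal PyTorch sketch consistent with the table and parameters above is shown below. Channel widths, the head size, and the exact way the sub-goal is fused after pooling are assumptions; the actual architecture is defined in training/scripts/model.py.

```python
import torch
import torch.nn as nn
from torchvision.models.resnet import Bottleneck

class SemanticCNNSketch(nn.Module):
    """Illustrative re-creation of the architecture table; widths are assumptions."""

    def __init__(self):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(2, 64, kernel_size=7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
        )
        # Bottleneck stages with block counts [2, 1, 1]
        self.layer1 = self._make_layer(64, 64, 2)
        self.layer2 = self._make_layer(256, 128, 1, stride=2)
        self.layer3 = self._make_layer(512, 256, 1, stride=2)
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Image features + 2D sub-goal (x, y) -> (linear_x, angular_z)
        self.head = nn.Sequential(nn.Linear(1024 + 2, 256), nn.ReLU(inplace=True),
                                  nn.Linear(256, 2))

    def _make_layer(self, in_ch, width, blocks, stride=1):
        down = nn.Sequential(
            nn.Conv2d(in_ch, width * Bottleneck.expansion, 1, stride, bias=False),
            nn.BatchNorm2d(width * Bottleneck.expansion),
        )
        layers = [Bottleneck(in_ch, width, stride, downsample=down)]
        layers += [Bottleneck(width * Bottleneck.expansion, width) for _ in range(blocks - 1)]
        return nn.Sequential(*layers)

    def forward(self, maps, goal):
        # maps: (B, 2, 80, 80) scan map + semantic map; goal: (B, 2) local sub-goal
        feats = self.pool(self.layer3(self.layer2(self.layer1(self.stem(maps))))).flatten(1)
        return self.head(torch.cat([feats, goal], dim=1))  # (B, 2) velocity

# Training regresses the recorded velocities with an MSE loss:
model, loss_fn = SemanticCNNSketch(), nn.MSELoss()
pred = model(torch.zeros(4, 2, 80, 80), torch.zeros(4, 2))
loss = loss_fn(pred, torch.zeros(4, 2))
```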
Train the Semantic CNN model:
cd training
sh run_train.sh ~/semantic2d_data/ ~/semantic2d_data/

Arguments:
- $1 - Training data directory
- $2 - Validation data directory
Training Configuration (in scripts/train.py):
| Parameter | Default | Description |
|---|---|---|
| `NUM_EPOCHS` | 4000 | Total training epochs |
| `BATCH_SIZE` | 64 | Samples per batch |
| `LEARNING_RATE` | 0.001 | Initial learning rate |
Learning Rate Schedule:
- Epochs 0-40: 1e-3
- Epochs 40-2000: 2e-4
- Epochs 2000-21000: 2e-5
- Epochs 21000+: 1e-5
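This piecewise-constant schedule can be expressed, for example, with a LambdaLR; whether the repository uses a scheduler object or adjusts the rate manually is an assumption.

```python
import torch

def lr_for_epoch(epoch):
    """Piecewise-constant learning rate matching the schedule above."""
    if epoch < 40:
        return 1e-3
    if epoch < 2000:
        return 2e-4
    if epoch < 21000:
        return 2e-5
    return 1e-5

model = torch.nn.Linear(2, 2)  # placeholder model
# Base lr of 1.0 so the lambda returns the absolute learning rate.
optimizer = torch.optim.Adam(model.parameters(), lr=1.0)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_for_epoch)

for epoch in range(4000):
    # ... run one training epoch ...
    scheduler.step()
```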
Model checkpoints saved every 50 epochs to ./model/.
Evaluate the trained model:
cd training
sh run_eval.sh ~/semantic2d_data/

Output: Results are saved to ./output/
training/
├── model/
│ └── semantic_cnn_model.pth # Pretrained model weights
├── scripts/
│ ├── model.py # SemanticCNN architecture + NavDataset
│ ├── train.py # Training script
│ └── decode_demo.py # Evaluation/demo script
├── run_train.sh # Training driver script
└── run_eval.sh # Evaluation driver script
Training logs are saved to ./runs/. View training progress:
cd training
tensorboard --logdir=runs

Monitored metrics:
- Training loss
- Validation loss
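A minimal tensorboardX sketch of this logging; the scalar tag names are assumptions.

```python
from tensorboardX import SummaryWriter

writer = SummaryWriter(log_dir="./runs")        # matches the --logdir used above
for epoch in range(4000):
    train_loss, val_loss = 0.0, 0.0             # placeholders for the real epoch losses
    writer.add_scalar("loss/train", train_loss, epoch)  # tag names are assumptions
    writer.add_scalar("loss/val", val_loss, epoch)
writer.close()
```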
Install the following ROS packages:
# Create catkin workspace
mkdir -p ~/catkin_ws/src
cd ~/catkin_ws/src
# Clone required packages
git clone https://github.com/TempleRAIL/robot_gazebo.git
git clone https://github.com/TempleRAIL/pedsim_ros_with_gazebo.git
# Build
cd ~/catkin_ws
catkin_make
source devel/setup.bash

- Copy the ROS workspace to your catkin workspace:
cp -r ros_deployment_ws/src/semantic_cnn_nav ~/catkin_ws/src/

- Build the workspace:
cd ~/catkin_ws
catkin_make
source devel/setup.bash

Launch the Gazebo simulation:
roslaunch semantic_cnn_nav semantic_cnn_nav_gazebo.launch

This launch file starts:
- Gazebo simulator with pedestrians (pedsim)
- AMCL localization
- CNN data publisher
- Semantic CNN inference node
- RViz visualization
Key parameters in semantic_cnn_nav_gazebo.launch:
| Parameter | Default | Description |
|---|---|---|
| `s3_net_model_file` | `model/s3_net_model.pth` | S³-Net model path |
| `semantic_cnn_model_file` | `model/semantic_cnn_model.pth` | SemanticCNN model path |
| `scene_file` | `eng_hall_5.xml` | Pedsim scenario file |
| `world_name` | `eng_hall.world` | Gazebo world file |
| `map_file` | `gazebo_eng_lobby.yaml` | Navigation map |
| `initial_pose_x/y/a` | 1.0, 0.0, 0.13 | Robot initial pose |
Use RViz "2D Nav Goal" tool to send navigation goals to the robot.
cnn_data_pub.py: Publishes processed LiDAR data for the CNN.
Subscriptions:
- /scan (sensor_msgs/LaserScan)

Publications:
- /cnn_data (cnn_msgs/CNN_data)
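A minimal rospy sketch of a node with this interface; the preprocessing itself is omitted and the CNN_data fields are not filled in, since their names are not listed here.

```python
#!/usr/bin/env python
import rospy
from sensor_msgs.msg import LaserScan
from cnn_msgs.msg import CNN_data  # custom message from the cnn_msgs package

class CnnDataPub(object):
    def __init__(self):
        self.pub = rospy.Publisher("/cnn_data", CNN_data, queue_size=1)
        rospy.Subscriber("/scan", LaserScan, self.scan_cb, queue_size=1)

    def scan_cb(self, scan):
        msg = CNN_data()
        # ... fill msg from scan.ranges / scan.intensities (preprocessing omitted) ...
        self.pub.publish(msg)

if __name__ == "__main__":
    rospy.init_node("cnn_data_pub")
    CnnDataPub()
    rospy.spin()
```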
semantic_cnn_nav_inference.py: Main inference node combining S³-Net and SemanticCNN.
Subscriptions:
- /cnn_data (cnn_msgs/CNN_data)

Publications:
- /navigation_velocity_smoother/raw_cmd_vel (geometry_msgs/Twist)

Parameters:
- ~s3_net_model_file: Path to the S³-Net model
- ~semantic_cnn_model_file: Path to the SemanticCNN model
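A minimal rospy sketch of the inference node's interface, using only the topics and parameters listed above; model loading and the actual forward pass are omitted.

```python
#!/usr/bin/env python
import rospy
from geometry_msgs.msg import Twist
from cnn_msgs.msg import CNN_data

class SemanticCnnNavInference(object):
    def __init__(self):
        # Private parameters documented above.
        self.s3_net_path = rospy.get_param("~s3_net_model_file")
        self.semantic_cnn_path = rospy.get_param("~semantic_cnn_model_file")
        self.cmd_pub = rospy.Publisher(
            "/navigation_velocity_smoother/raw_cmd_vel", Twist, queue_size=1)
        rospy.Subscriber("/cnn_data", CNN_data, self.data_cb, queue_size=1)

    def data_cb(self, data):
        cmd = Twist()
        # ... run S3-Net + SemanticCNN on `data` to set cmd.linear.x and cmd.angular.z ...
        self.cmd_pub.publish(cmd)

if __name__ == "__main__":
    rospy.init_node("semantic_cnn_nav_inference")
    SemanticCnnNavInference()
    rospy.spin()
```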
ros_deployment_ws/
└── src/
└── semantic_cnn_nav/
├── cnn_msgs/
│ └── msg/
│ └── CNN_data.msg # Custom message definition
└── semantic_cnn/
├── launch/
│ ├── cnn_data_pub.launch
│ ├── semantic_cnn_inference.launch
│ └── semantic_cnn_nav_gazebo.launch
└── src/
├── model/
│ ├── s3_net_model.pth # S³-Net pretrained weights
│ └── semantic_cnn_model.pth # SemanticCNN weights
├── cnn_data_pub.py # Data preprocessing node
├── cnn_model.py # Model definitions
├── pure_pursuit.py # Pure pursuit controller
├── goal_visualize.py # Goal visualization
└── semantic_cnn_nav_inference.py # Main inference node
Pre-trained models are included:
| Model | Location | Description |
|---|---|---|
| `s3_net_model.pth` | `ros_deployment_ws/.../model/` | S³-Net semantic segmentation |
| `semantic_cnn_model.pth` | `training/model/` | SemanticCNN navigation policy |
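The checkpoints can be loaded with torch.load, as sketched below; the full path under ros_deployment_ws is taken from the file-structure listing above, and whether each .pth stores a state dict or a pickled module is an assumption.

```python
import torch

# Whether these files hold state dicts or pickled modules is an assumption.
s3_net_state = torch.load(
    "ros_deployment_ws/src/semantic_cnn_nav/semantic_cnn/src/model/s3_net_model.pth",
    map_location="cpu")
policy_state = torch.load("training/model/semantic_cnn_model.pth", map_location="cpu")
```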
@article{xie2026semantic2d,
title={Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone},
author={Xie, Zhanteng and Pan, Yipeng and Zhang, Yinqiang and Pan, Jia and Dames, Philip},
journal={arXiv preprint arXiv:2409.09899},
year={2026}
}
@inproceedings{xie2021towards,
title={Towards Safe Navigation Through Crowded Dynamic Environments},
author={Xie, Zhanteng and Xin, Pujie and Dames, Philip},
booktitle={2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
year={2021},
doi={10.1109/IROS51168.2021.9636102}
}