ALOcc is a state-of-the-art, vision-only framework for dense 3D scene understanding. It transforms multi-camera 2D images into rich, spatiotemporal 3D representations, jointly predicting semantic occupancy grids and per-voxel motion flow. Our purely convolutional design achieves top-tier performance while offering a spectrum of models that balance accuracy and real-time efficiency, making it ideal for autonomous systems.
We recommend managing the environment with Conda.
```bash
# Clone this repository
git clone https://github.com/cdb342/ALOcc.git
cd ALOcc

# Create and activate the conda environment
conda create -n alocc python=3.8 -y
conda activate alocc

# Install PyTorch (example for CUDA 11.8, adjust if needed)
pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 --extra-index-url https://download.pytorch.org/whl/cu118

# Install MMCV (requires building C++ ops)
# Note: Using the stable 1.x branch for compatibility
git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
git checkout 1.x
MMCV_WITH_OPS=1 pip install -e . -v
cd ..

# Install MMDetection and MMSegmentation
pip install mmdet==2.28.2 mmsegmentation==0.30.0

# Install the ALOcc framework in editable mode
pip install -v -e .

# Install remaining dependencies
pip install torchmetrics timm dcnv4 ninja spconv transformers IPython einops
pip install numpy==1.23.4  # Pin numpy version to avoid potential issues
```

- Download the full nuScenes dataset from the official website.
- Download the primary Occ3D-nuScenes annotations from the project page.
- (Optional) For extended experiments, download other community annotations:
  - OpenOcc_v2.1 Annotations & Ray Mask
  - SurroundOcc Annotations (unzip and rename the folder to `gts_surroundocc`; see the sketch after this list)
  - OpenOccupancy-v0.1 Annotations
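For the SurroundOcc annotations, the unzip-and-rename step might look like the following minimal bash sketch. The archive name and extracted folder name below are placeholders; substitute whatever the download actually contains:

```bash
# Placeholder names; adjust to the archive you downloaded and the folder it extracts to
ARCHIVE=surroundocc_annotations.zip
EXTRACTED=surroundocc_gts   # check the folder name after unzipping

unzip "$ARCHIVE" -d data/nuscenes/
# The directory layout below expects the annotations under data/nuscenes/gts_surroundocc
mv "data/nuscenes/$EXTRACTED" data/nuscenes/gts_surroundocc
```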
Please organize your data following this directory structure:
```
ALOcc/
├── data/
│   ├── nuscenes/
│   │   ├── maps/
│   │   ├── samples/
│   │   ├── sweeps/
│   │   ├── v1.0-test/
│   │   ├── v1.0-trainval/
│   │   ├── gts/                       # Main Occ3D annotations
│   │   ├── gts_surroundocc/           # (Optional) SurroundOcc annotations
│   │   ├── openocc_v2/                # (Optional) OpenOcc annotations
│   │   ├── openocc_v2_ray_mask/       # (Optional) OpenOcc ray mask
│   │   └── nuScenes-Occupancy-v0.1/   # (Optional) OpenOccupancy annotations
...
```
Finally, run the preprocessing scripts to prepare the data for training:
```bash
# 1. Extract semantic segmentation labels from LiDAR
python tools/nusc_process/extract_sem_point.py

# 2. Create formatted info files for the dataloader
PYTHONPATH=$(pwd):$PYTHONPATH python tools/create_data_bevdet.py
```

Alternatively, you can download the pre-processed segmentation labels and the `train.pkl` and `val.pkl` files from our Hugging Face Hub (see the sketch after the listing below), and organize them as:
```
ALOcc/
├── data/
│   ├── lidar_seg/
│   ├── nuscenes/
│   │   ├── train.pkl
│   │   ├── val.pkl
│   │   ...
...
```
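If you use the pre-processed files, one way to pull them is with the huggingface_hub CLI. This is only a sketch; the repository id below is a placeholder for the actual ALOcc repo on the Hugging Face Hub:

```bash
pip install -U huggingface_hub

# Placeholder repo id; replace with the ALOcc repository on the Hugging Face Hub.
# Add --repo-type dataset if the files are hosted as a dataset repo.
huggingface-cli download ALOcc/placeholder-repo --local-dir data/
```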
For training, please download pre-trained image backbones from BEVDet, GeoMIM, or our Hugging Face Hub. Place the checkpoint files in the ckpts/pretrain/ directory.
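A minimal sketch of the placement step (the checkpoint path below is a placeholder; use whichever backbone weights you downloaded):

```bash
mkdir -p ckpts/pretrain
# Placeholder filename; use the backbone checkpoint you actually downloaded
mv /path/to/downloaded_backbone.pth ckpts/pretrain/
```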
Use the provided script for distributed training on multiple GPUs.
```bash
# Syntax: bash tools/dist_train.sh [CONFIG_FILE] [WORK_DIR] [NUM_GPUS]
# Example: Train the ALOcc-3D model with 8 GPUs
bash tools/dist_train.sh configs/alocc/alocc_3d_256x704_bevdet_preatrain.py work_dirs/alocc_3d 8
```

Download our official pre-trained models from the ALOcc Hugging Face Hub and place them in the `ckpts/` directory.
```bash
# Evaluate semantic occupancy (mIoU) or occupancy flow
# Syntax: bash tools/dist_test.sh [CONFIG_FILE] [CHECKPOINT_PATH] [NUM_GPUS]
# Example: Evaluate the pre-trained ALOcc-3D model
bash tools/dist_test.sh configs/alocc/alocc_3d_256x704_bevdet_preatrain.py ckpts/alocc_3d.pth 8

# Evaluate semantic occupancy (RayIoU metric)
# Syntax: bash tools/dist_test_ray.sh [CONFIG_FILE] [CHECKPOINT_PATH] [NUM_GPUS]
# Example: Evaluate ALOcc-3D with the RayIoU script
bash tools/dist_test_ray.sh configs/alocc/alocc_3d_256x704_bevdet_preatrain_wo_mask.py ckpts/alocc_3d_wo_mask.pth 8
```
⚠️ Important Note: When running inference with temporal fusion enabled, please use exactly 1 or 8 GPUs. Using a different number of GPUs may lead to incorrect results due to a sampler bug that causes duplicate sample processing.
We provide convenient tools to benchmark model latency (FPS) and computational cost (FLOPs).
```bash
# Benchmark FPS (Frames Per Second)
# Syntax: python tools/analysis_tools/benchmark.py [CONFIG_FILE]
python tools/analysis_tools/benchmark.py configs/alocc/alocc_3d_256x704_bevdet_preatrain.py

# Calculate FLOPs
# Syntax: python tools/analysis_tools/get_flops.py [CONFIG_FILE] --shape [HEIGHT] [WIDTH]
python tools/analysis_tools/get_flops.py configs/alocc/alocc_3d_256x704_bevdet_preatrain.py --shape 256 704
```

First, ensure you have Mayavi installed. You can install it using pip:

```bash
pip install mayavi
```

Before you can visualize the output, you need to run the model on the test set and save the prediction results.
Use the dist_test.sh script with the --save flag. This will store the model's output in a directory.
```bash
# Example: Evaluate the ALOcc-3D model and save the predictions
bash tools/dist_test.sh configs/alocc/alocc_3d_256x704_bevdet_preatrain.py ckpts/alocc_3d_256x704_bevdet_preatrain.pth 8 --save
```

The prediction results will be saved in the `test/` directory, following a path structure like `test/[CONFIG_NAME]/[TIMESTAMP]/`.
Once the predictions are saved, you can run the visualization script. This script requires the path to the prediction results and the path to the ground truth data.
```bash
# Syntax: python tools/visual.py [PREDICTION_PATH] [GROUND_TRUTH_PATH]
# Example:
python tools/visual.py work_dirs/alocc_3d_256x704_bevdet_preatrain/xxxxxxxx_xxxxxx/ your/path/to/ground_truth
```

- Replace `work_dirs/alocc_3d_256x704_bevdet_preatrain/xxxxxxxx_xxxxxx/` with the actual path to your saved prediction results from Step 2.
- Replace `your/path/to/ground_truth` with the path to the corresponding ground truth dataset.
This will launch an interactive Mayavi window where you can inspect and compare the 3D occupancy predictions.
Performance on Occ3D-nuScenes (trained with camera visible mask)
| Model | Backbone | Input Size | mIoU<sub>D</sub><sup>m</sup> | mIoU<sup>m</sup> | FPS | Config | Weights |
|---|---|---|---|---|---|---|---|
| ALOcc-2D-mini | R-50 | 256 × 704 | 35.4 | 41.4 | 30.5 | config | HF Hub |
| ALOcc-2D | R-50 | 256 × 704 | 38.7 | 44.8 | 8.2 | config | HF Hub |
| ALOcc-3D | R-50 | 256 × 704 | 39.3 | 45.5 | 6.0 | config | HF Hub |
Performance on Occ3D-nuScenes (trained w/o camera visible mask)
| Model | Backbone | Input Size | mIoU | RayIoU | RayIoU<sub>1m, 2m, 4m</sub> | FPS | Config | Weights |
|---|---|---|---|---|---|---|---|---|
| ALOcc-2D-mini | R-50 | 256 × 704 | 33.4 | 39.3 | 32.9, 40.1, 44.8 | 30.5 | config | HF Hub |
| ALOcc-2D | R-50 | 256 × 704 | 37.4 | 43.0 | 37.1, 43.8, 48.2 | 8.2 | config | HF Hub |
| ALOcc-3D | R-50 | 256 × 704 | 38.0 | 43.7 | 37.8, 44.7, 48.8 | 6.0 | config | HF Hub |
Performance on OpenOcc (Semantic Occupancy and Flow)
| Method | Backbone | Input Size | Occ Score | mAVE | mAVE<sub>TP</sub> | RayIoU | RayIoU<sub>1m, 2m, 4m</sub> | FPS | Config | Weights |
|---|---|---|---|---|---|---|---|---|---|---|
| ALOcc-Flow-2D | R-50 | 256 × 704 | 41.9 | 0.530 | 0.431 | 40.3 | 34.3, 41.0, 45.5 | 7.0 | config | HF Hub |
| ALOcc-Flow-3D | R-50 | 256 × 704 | 43.1 | 0.549 | 0.458 | 41.9 | 35.6, 42.9, 47.2 | 5.5 | config | HF Hub |
For more detailed results and ablations, please refer to our paper.
This project is built upon the excellent foundation of several open-source projects. We extend our sincere gratitude to their authors and contributors.
If you find ALOcc useful for your research or applications, please consider citing our paper:
```bibtex
@InProceedings{chen2025alocc,
    author    = {Chen, Dubing and Fang, Jin and Han, Wencheng and Cheng, Xinjing and Yin, Junbo and Xu, Chenzhong and Khan, Fahad Shahbaz and Shen, Jianbing},
    title     = {ALOcc: Adaptive Lifting-Based 3D Semantic Occupancy and Cost Volume-Based Flow Prediction},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
}

@article{chen2024adaocc,
    title   = {AdaOcc: Adaptive Forward View Transformation and Flow Modeling for 3D Occupancy and Flow Prediction},
    author  = {Chen, Dubing and Han, Wencheng and Fang, Jin and Shen, Jianbing},
    journal = {arXiv preprint arXiv:2407.01436},
    year    = {2024}
}
```
