
ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction

arXiv · ICCV 2025 · License · Python · PyTorch

ALOcc is a state-of-the-art, vision-only framework for dense 3D scene understanding. It transforms multi-camera 2D images into rich, spatiotemporal 3D representations, jointly predicting semantic occupancy grids and per-voxel motion flow. Our purely convolutional design achieves top-tier performance while offering a spectrum of models that balance accuracy and real-time efficiency, making it ideal for autonomous systems.



🚀 Get Started

1. Installation

We recommend managing the environment with Conda.

# Clone this repository
git clone https://github.com/cdb342/ALOcc.git
cd ALOcc

# Create and activate the conda environment
conda create -n alocc python=3.8 -y
conda activate alocc

# Install PyTorch (example for CUDA 11.8, adjust if needed)
pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 --extra-index-url https://download.pytorch.org/whl/cu118

# Install MMCV (requires building C++ ops)
# Note: Using the stable 1.x branch for compatibility
git clone https://github.com/open-mmlab/mmcv.git
cd mmcv
git checkout 1.x
MMCV_WITH_OPS=1 pip install -e . -v
cd ..

# Install MMDetection and MMSegmentation
pip install mmdet==2.28.2 mmsegmentation==0.30.0

# Install the ALOcc framework in editable mode
pip install -v -e .

# Install remaining dependencies
pip install torchmetrics timm dcnv4 ninja spconv transformers IPython einops
pip install numpy==1.23.4 # Pin numpy version to avoid potential issues
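
To confirm the environment resolved correctly before moving on, a quick sanity check can save debugging later. This is a minimal sketch, not a repo script:

# Verify the core packages import and the GPU is visible.
import torch
import mmcv
import mmdet
import mmseg

print(f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
print(f"mmcv {mmcv.__version__}, mmdet {mmdet.__version__}, mmseg {mmseg.__version__}")

# Small GPU smoke test: run one matmul on the first device.
if torch.cuda.is_available():
    x = torch.randn(8, 8, device="cuda")
    print((x @ x.T).sum().item())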

2. Data Preparation

nuScenes Dataset

  1. Download the full nuScenes dataset from the official website.
  2. Download the primary Occ3D-nuScenes annotations from the project page.
  3. (Optional) For extended experiments, download the community annotations referenced in the layout below (e.g., SurroundOcc, OpenOcc, OpenOccupancy).

Please organize your data following this directory structure:

ALOcc/
├── data/
│   ├── nuscenes/
│   │   ├── maps/
│   │   ├── samples/
│   │   ├── sweeps/
│   │   ├── v1.0-test/
│   │   ├── v1.0-trainval/
│   │   ├── gts/                 # Main Occ3D annotations
│   │   ├── gts_surroundocc/     # (Optional) SurroundOcc annotations
│   │   ├── openocc_v2/          # (Optional) OpenOcc annotations
│   │   ├── openocc_v2_ray_mask/ # (Optional) OpenOcc ray mask
│   │   └── nuScenes-Occupancy-v0.1/ # (Optional) OpenOccupancy annotations
...
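
Before preprocessing, it can be worth checking that your data matches this layout. A small helper sketch (not part of the repo), derived directly from the tree above:

# Check the expected nuScenes layout (required vs. optional entries).
from pathlib import Path

ROOT = Path("data/nuscenes")
REQUIRED = ["maps", "samples", "sweeps", "v1.0-trainval", "gts"]
OPTIONAL = ["v1.0-test", "gts_surroundocc", "openocc_v2",
            "openocc_v2_ray_mask", "nuScenes-Occupancy-v0.1"]

for name in REQUIRED:
    print(f"[required] {name}: {'ok' if (ROOT / name).exists() else 'MISSING'}")
for name in OPTIONAL:
    if not (ROOT / name).exists():
        print(f"[optional] {name}: not found (only needed for extended experiments)")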

Finally, run the preprocessing scripts to prepare the data for training:

# 1. Extract semantic segmentation labels from LiDAR
python tools/nusc_process/extract_sem_point.py

# 2. Create formatted info files for the dataloader
PYTHONPATH=$(pwd):$PYTHONPATH python tools/create_data_bevdet.py
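
To spot-check the generated info files, you can load one and inspect its contents. This sketch assumes the BEVDet-style format (a dict holding an 'infos' list), which create_data_bevdet.py follows in BEVDet-derived codebases; adjust the keys if your output differs:

# Peek at the generated train info file.
import pickle

with open("data/nuscenes/train.pkl", "rb") as f:
    data = pickle.load(f)

infos = data["infos"] if isinstance(data, dict) else data  # assumed layout
print(f"number of samples: {len(infos)}")
print(f"keys of one sample: {sorted(infos[0].keys())}")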

Alternatively, you can download the pre-processed segmentation labels together with the train.pkl and val.pkl info files from our Hugging Face Hub, and organize them as follows:

ALOcc/
├── data/
│   ├── lidar_seg
│   ├── nuscenes/
│   │   ├── train.pkl
│   │   ├── val.pkl
│   │   ...
...

3. Pre-trained Models

For training, please download pre-trained image backbones from BEVDet, GeoMIM, or our Hugging Face Hub. Place the checkpoint files in the ckpts/pretrain/ directory.
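
If you prefer fetching checkpoints programmatically, the huggingface_hub client works for this; the repo_id and filename below are placeholders, so substitute the actual entries from the Hub page:

# Download a pre-trained backbone into ckpts/pretrain/ (placeholders marked).
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="your-org/your-repo",      # placeholder: the ALOcc Hub repo id
    filename="pretrain_backbone.pth",  # placeholder: the checkpoint filename
    local_dir="ckpts/pretrain",        # matches the directory expected above
)
print(f"downloaded to: {path}")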


🎮 Train & Evaluate

Training

Use the provided script for distributed training on multiple GPUs.

# Syntax: bash tools/dist_train.sh [CONFIG_FILE] [WORK_DIR] [NUM_GPUS]

# Example: Train the ALOcc-3D model with 8 GPUs
bash tools/dist_train.sh configs/alocc/alocc_3d_256x704_bevdet_preatrain.py work_dirs/alocc_3d 8
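
To queue several runs back to back through the same interface, a small launcher sketch (not part of the repo) can wrap the script above:

# Launch training runs sequentially via dist_train.sh.
import subprocess

RUNS = [
    ("configs/alocc/alocc_3d_256x704_bevdet_preatrain.py", "work_dirs/alocc_3d"),
    # append more (config, work_dir) pairs here
]

for config, work_dir in RUNS:
    subprocess.run(["bash", "tools/dist_train.sh", config, work_dir, "8"], check=True)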

Testing

Download our official pre-trained models from the ALOcc Hugging Face Hub and place them in the ckpts/ directory.

# Evaluate semantic occupancy (mIoU) or occupancy flow
# Syntax: bash tools/dist_test.sh [CONFIG_FILE] [CHECKPOINT_PATH] [NUM_GPUS]

# Example: Evaluate the pre-trained ALOcc-3D model
bash tools/dist_test.sh configs/alocc/alocc_3d_256x704_bevdet_preatrain.py ckpts/alocc_3d.pth 8

# Evaluate semantic occupancy (RayIoU metric)
# Syntax: bash tools/dist_test_ray.sh [CONFIG_FILE] [CHECKPOINT_PATH] [NUM_GPUS]

# Example: Evaluate ALOcc-3D with the RayIoU script
bash tools/dist_test_ray.sh configs/alocc/alocc_3d_256x704_bevdet_preatrain_wo_mask.py ckpts/alocc_3d_wo_mask.pth 8

โš ๏ธ Important Note: When running inference with temporal fusion enabled, please use exactly 1 or 8 GPUs. Using a different number of GPUs may lead to incorrect results due to a sampler bug causing duplicate sample processing.

Benchmarking

We provide convenient tools to benchmark model latency (FPS) and computational cost (FLOPs).

# Benchmark FPS (Frames Per Second)
# Syntax: python tools/analysis_tools/benchmark.py [CONFIG_FILE]
python tools/analysis_tools/benchmark.py configs/alocc/alocc_3d_256x704_bevdet_preatrain.py

# Calculate FLOPs
# Syntax: python tools/analysis_tools/get_flops.py [CONFIG_FILE] --shape [HEIGHT] [WIDTH]
python tools/analysis_tools/get_flops.py configs/alocc/alocc_3d_256x704_bevdet_preatrain.py --shape 256 704
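
benchmark.py above is the authoritative tool; for intuition, the usual GPU timing pattern such tools rely on looks like this sketch (warm-up first, then synchronize around the timed loop, because CUDA launches are asynchronous):

# Minimal latency measurement pattern for a CUDA model.
import time
import torch

def measure_fps(model, inputs, warmup=50, iters=200):
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):       # warm up kernels and caches
            model(*inputs)
        torch.cuda.synchronize()      # drain queued GPU work before timing
        start = time.perf_counter()
        for _ in range(iters):
            model(*inputs)
        torch.cuda.synchronize()      # make sure timed work has finished
    return iters / (time.perf_counter() - start)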

Visualization

First, ensure you have Mayavi installed. You can install it using pip:

pip install mayavi

Before you can visualize the output, you need to run the model on the test set and save the prediction results.

Use the dist_test.sh script with the --save flag. This will store the model's output in a directory.

# Example: Evaluate the ALOcc-3D model and save the predictions
bash tools/dist_test.sh configs/alocc/alocc_3d_256x704_bevdet_preatrain.py ckpts/alocc_3d_256x704_bevdet_preatrain.pth 8 --save

The prediction results will be saved in the test/ directory, following a path structure like: test/[CONFIG_NAME]/[TIMESTAMP]/.

Once the predictions are saved, you can run the visualization script. This script requires the path to the prediction results and the path to the ground truth data.

# Syntax: python tools/visual.py [PREDICTION_PATH] [GROUND_TRUTH_PATH]
# Example:
python tools/visual.py work_dirs/alocc_3d_256x704_bevdet_preatrain/xxxxxxxx_xxxxxx/ your/path/to/ground_truth
  • Replace work_dirs/alocc_3d_256x704_bevdet_preatrain/xxxxxxxx_xxxxxx/ with the actual path to your saved prediction results from the previous step.
  • Replace your/path/to/ground_truth with the path to the corresponding ground truth dataset.

This will launch an interactive Mayavi window where you can inspect and compare the 3D occupancy predictions.
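
If you only want a quick, bare-bones rendering rather than the full tools/visual.py pipeline, a minimal Mayavi sketch follows. It assumes a dense (X, Y, Z) array of class ids with one id marking free space (17 in Occ3D-nuScenes); the input path is a placeholder:

# Render occupied voxels as cubes, colored by class id.
import numpy as np
from mayavi import mlab

occ = np.load("your/path/to/a_prediction.npy")  # placeholder path
FREE = 17                                       # Occ3D-nuScenes free-space label
x, y, z = np.nonzero(occ != FREE)               # keep occupied voxels only
mlab.points3d(x, y, z, occ[x, y, z],
              mode="cube", scale_mode="none", scale_factor=0.9)
mlab.show()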

📊 Results & Model Zoo

๐Ÿ† Performance on Occ3D-nuScenes (trained with camera visible mask)
Model Backbone Input Size mIoUDm mIoUm FPS Config Weights
ALOcc-2D-mini R-50 256 ร— 704 35.4 41.4 30.5 config HF Hub
ALOcc-2D R-50 256 ร— 704 38.7 44.8 8.2 config HF Hub
ALOcc-3D R-50 256 ร— 704 39.3 45.5 6.0 config HF Hub
๐Ÿ† Performance on Occ3D-nuScenes (trained w/o camera visible mask)
Model Backbone Input Size mIoU RayIoU RayIoU1m, 2m, 4m FPS Config Weights
ALOcc-2D-mini R-50 256 ร— 704 33.4 39.3 32.9, 40.1, 44.8 30.5 config HF Hub
ALOcc-2D R-50 256 ร— 704 37.4 43.0 37.1, 43.8, 48.2 8.2 config HF Hub
ALOcc-3D R-50 256 ร— 704 38.0 43.7 37.8, 44.7, 48.8 6.0 config HF Hub
๐Ÿ† Performance on OpenOcc (Semantic Occupancy and Flow)
Method Backbone Input Size Occ Score mAVE mAVETP RayIoU RayIoU1m, 2m, 4m FPS Config Weights
ALOcc-Flow-2D R-50 256 ร— 704 41.9 0.530 0.431 40.3 34.3, 41.0, 45.5 7.0 config HF Hub
ALOcc-Flow-3D R-50 256 ร— 704 43.1 0.549 0.458 41.9 35.6, 42.9, 47.2 5.5 config HF Hub

For more detailed results and ablations, please refer to our paper.


๐Ÿ™ Acknowledgement

This project is built upon the excellent foundation of several open-source projects. We extend our sincere gratitude to their authors and contributors.


📜 Citation

If you find ALOcc useful for your research or applications, please consider citing our paper:

@InProceedings{chen2025alocc,
    author    = {Chen, Dubing and Fang, Jin and Han, Wencheng and Cheng, Xinjing and Yin, Junbo and Xu, Chenzhong and Khan, Fahad Shahbaz and Shen, Jianbing},
    title     = {ALOcc: Adaptive Lifting-Based 3D Semantic Occupancy and Cost Volume-Based Flow Prediction},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2025},
}

@article{chen2024adaocc,
  title={AdaOcc: Adaptive Forward View Transformation and Flow Modeling for 3D Occupancy and Flow Prediction},
  author={Chen, Dubing and Han, Wencheng and Fang, Jin and Shen, Jianbing},
  journal={arXiv preprint arXiv:2407.01436},
  year={2024}
}

🔼 Back to Top
