GSRender: Weakly-Supervised 3D Occupancy Perception with 3D Gaussian Splatting
Weakly-supervised 3D occupancy perception is crucial for vision-based autonomous driving in outdoor environments. Previous NeRF-based methods struggle to balance the number of ray samples: too many samples hurt efficiency, while too few compromise accuracy, causing the mean Intersection over Union (mIoU) to vary by 5-10 points. Furthermore, even with surround-view image inputs, only a single image is rendered from each viewpoint at any given moment. This limitation leads to duplicated predictions, which significantly impacts the practicality of the approach.
To address these challenges, we propose GSRender, which uses 3D Gaussian Splatting for weakly-supervised occupancy estimation, simplifying the sampling process. Additionally, we introduce the Ray Compensation module, which reduces duplicated predictions by compensating for features from adjacent frames. Finally, we redesign the dynamic loss to remove the influence of dynamic objects from adjacent frames.
- 3D Gaussian Splatting: Simplifies the sampling process and improves efficiency
- Ray Compensation Module: Reduces duplicated predictions by compensating features from adjacent frames
- Dynamic Loss Redesign: Removes the influence of dynamic objects from adjacent frames
- SOTA Performance: Achieves state-of-the-art results in RayIoU (+6.0)
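At the core of the splatting renderer is front-to-back alpha compositing: each depth-sorted Gaussian contributes a weight w_i = α_i · ∏_{j<i}(1 − α_j) to the pixel it covers. The following NumPy sketch illustrates this compositing principle only; it is not the gsplat implementation used in this repository:

```python
import numpy as np

def composite_gaussians(alphas: np.ndarray, features: np.ndarray) -> np.ndarray:
    """Front-to-back alpha compositing of depth-sorted Gaussians for one pixel.

    alphas:   (N,) per-Gaussian opacity after projection to this pixel, in [0, 1)
    features: (N, C) per-Gaussian feature (e.g. color or semantic logits)
    """
    # Transmittance before Gaussian i: product of (1 - alpha) over nearer Gaussians.
    transmittance = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    weights = transmittance * alphas  # w_i = alpha_i * prod_{j<i}(1 - alpha_j)
    return weights @ features         # (C,) rendered pixel feature

# Three Gaussians hit the pixel, nearest first.
alphas = np.array([0.5, 0.5, 0.5])
feats = np.array([[1.0], [2.0], [4.0]])
print(composite_gaussians(alphas, feats))  # weights are [0.5, 0.25, 0.125] -> [1.5]
```

Because the Gaussians are rasterized rather than sampled along rays, this compositing avoids the per-ray sample-count trade-off of NeRF-style rendering.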
- Python 3.7+
- PyTorch 1.8+
- CUDA 11.0+
- 8GB+ GPU memory
- CUDA toolkit for Gaussian Splatting compilation
- Clone the repository

```shell
git clone https://github.com/your-repo/GSRender.git
cd GSRender
```

- Install dependencies

```shell
pip install -r requirements.txt
```

- Install GSRender

```shell
pip install -e .
```

- Install Gaussian Splatting dependencies (if not automatically installed)
```shell
# Install gsplat for 3D Gaussian Splatting rendering
pip install gsplat

# Install torch_scatter for sparse tensor operations
pip install torch-scatter

# Install torch_efficient_distloss for efficient distance loss
pip install torch-efficient-distloss
```

Note: The `simple-knn` package is included in the local CUDA implementation at `mmdet3d/models/gs/cuda/simple-knn` and will be installed automatically when you run `pip install -e .`. Make sure you have the correct CUDA version installed for compilation.
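Before compiling the CUDA extensions, it can save time to verify the basic build prerequisites. The helper below is a hypothetical convenience snippet (not part of this repository); it only inspects the interpreter version and the PATH, not the installed PyTorch build:

```python
import shutil
import sys
from typing import List

def check_prereqs() -> List[str]:
    """Return a list of human-readable problems with the build environment."""
    problems = []
    if sys.version_info < (3, 7):
        problems.append(f"Python 3.7+ required, found {sys.version.split()[0]}")
    if shutil.which("nvcc") is None:
        # The simple-knn CUDA extension needs nvcc at build time.
        problems.append("nvcc not found on PATH; install the CUDA toolkit first")
    return problems

for problem in check_prereqs():
    print("WARNING:", problem)
```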
For detailed installation instructions, please refer to Installation Guide.
Please refer to Dataset Preparation Guide for preparing training and testing data.
```shell
# Train GSRender with 8 GPUs
./tools/dist_train.sh ./configs/gsrender/gsrender-7frame.py 8

# Evaluate GSRender with 8 GPUs
./tools/dist_test.sh ./configs/gsrender/gsrender-7frame.py ./path/to/checkpoint.pth 8

# Export predictions
bash tools/dist_test.sh configs/gsrender/gsrender-7frame.py gsrender-7frame-12e.pth 1 --dump_dir=work_dirs/output

# Visualize results (select scene-id)
python tools/visualization/visual.py work_dirs/output/scene-xxxx
```

Note: The pkl file needs to be regenerated for visualization.
Results obtained using only 2D supervision are highlighted in bold.
| Method | GT | Backbone | Input Size | Epoch | RayIoU | RayIoU1m | RayIoU2m | RayIoU4m |
|---|---|---|---|---|---|---|---|---|
| FB-Occ (16f) (Li et al. 2023d) | 3D | R50 | 704×256 | 90 | 33.5 | 26.7 | 34.1 | 39.7 |
| SparseOcc (8f) (Tang et al. 2024b) | 3D | R50 | 704×256 | 24 | 34.0 | 28.0 | 34.7 | 39.4 |
| Panoptic-FlashOcc (1f) (Yu et al. 2024) | 3D | R50 | 704×256 | 24 | 35.2 | 29.4 | 36.0 | 40.1 |
| SimpleOcc (Gan et al. 2023) | 3D | R101 | 672×336 | 12 | 22.5 | 17.0 | 22.7 | 27.9 |
| OccNeRF (Zhang et al. 2023) | 2D* | R101 | 640×384 | 24 | 10.5 | 6.9 | 10.3 | 14.3 |
| GaussianOcc (Gan et al. 2024) | 2D* | R101 | 640×384 | 12 | 11.9 | 8.7 | 11.9 | 15.0 |
| RenderOcc (2f) | LiDAR-2D | Swin-B | 1408×512 | 12 | 19.3 | 12.7 | 19.3 | 25.9 |
| RenderOcc (7f) | LiDAR-2D | Swin-B | 1408×512 | 12 | 19.5 | 13.4 | 19.6 | 25.5 |
| GSRender (2f) | LiDAR-2D | Swin-B | 1408×512 | 12 | 25.5 (+6.0) | 18.7 (+5.3) | 25.8 (+6.2) | 31.8 (+6.3) |
Note: GSRender achieves significant improvements over previous 2D-supervised methods, with a +6.0-point gain in RayIoU.
More model weights will be released later.
- 3D Gaussian Splatting Head: Efficient sampling and rendering
- Ray Compensation Module: Reduces duplicated predictions
- Dynamic Loss: Handles dynamic objects from adjacent frames
- Volume Rendering: NeRF-style 3D volume representation
```
GSRender/
├── configs/                     # Configuration files
│   ├── gsrender/                # GSRender configurations
│   └── _base_/                  # Base configurations
├── docs/                        # Documentation
│   ├── install.md               # Installation guide
│   └── prepare_datasets.md      # Dataset preparation
├── mmdet3d/                     # Core code
│   ├── apis/                    # API interfaces
│   ├── core/                    # Core functionality
│   ├── datasets/                # Dataset processing
│   ├── models/                  # Model definitions
│   │   ├── detectors/           # Detector models
│   │   │   └── gsrender.py      # GSRender implementation
│   │   └── gs/                  # Gaussian Splatting modules
│   ├── ops/                     # Custom operations
│   └── utils/                   # Utility functions
├── tools/                       # Training and testing tools
│   ├── dist_train.sh            # Distributed training script
│   ├── dist_test.sh             # Distributed testing script
│   ├── train.py                 # Training script
│   ├── test.py                  # Testing script
│   └── visualization/           # Visualization tools
├── lib/                         # Custom libraries
│   └── dvr/                     # Differentiable volume rendering
├── assets/                      # Resource files
├── requirements.txt             # Dependencies
├── setup.py                     # Installation configuration
└── README.md                    # Project documentation
```
GSRender employs 3D Gaussian Splatting to simplify the sampling process, addressing the efficiency-accuracy trade-off in NeRF-based methods. The implementation uses:
- gsplat: For efficient 3D Gaussian rendering and rasterization
- simple-knn: For efficient nearest neighbor search in 3D space
- torch_scatter: For sparse tensor operations in Gaussian processing
- torch_efficient_distloss: For efficient distance-based loss computation
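`torch_scatter` provides scatter reductions that aggregate per-point values into shared slots (e.g. pooling point or Gaussian features into voxels). A pure-NumPy stand-in shows the operation it performs; the actual code uses the CUDA-backed `torch_scatter`, and this function is for illustration only:

```python
import numpy as np

def scatter_sum(features: np.ndarray, voxel_idx: np.ndarray, num_voxels: int) -> np.ndarray:
    """Sum-pool per-point features into voxels, like torch_scatter's scatter_add.

    features:  (N, C) per-point features
    voxel_idx: (N,) voxel index for each point (values in [0, num_voxels))
    """
    out = np.zeros((num_voxels, features.shape[1]))
    # np.add.at is unbuffered, so repeated indices accumulate correctly.
    np.add.at(out, voxel_idx, features)
    return out

feats = np.array([[1.0], [2.0], [3.0]])
idx = np.array([0, 1, 0])              # points 0 and 2 fall into voxel 0
print(scatter_sum(feats, idx, 2))      # [[4.], [2.]]
```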
Our novel Ray Compensation module compensates for features from adjacent frames, significantly reducing duplicated predictions that plague existing approaches.
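The module itself operates on rendered features; purely as a geometric sketch of the underlying idea (not the paper's implementation), rays from an adjacent frame can be expressed in the current ego frame before rendering, so several viewpoints supervise the same region. The helper name and pose convention below are illustrative assumptions:

```python
import numpy as np

def rays_to_current_frame(origins: np.ndarray, dirs: np.ndarray,
                          T_adj_to_cur: np.ndarray):
    """Express adjacent-frame camera rays in the current ego frame.

    origins:      (N, 3) ray origins in the adjacent frame
    dirs:         (N, 3) ray directions in the adjacent frame
    T_adj_to_cur: (4, 4) homogeneous relative ego pose (adjacent -> current)
    """
    R, t = T_adj_to_cur[:3, :3], T_adj_to_cur[:3, 3]
    # Origins are points (rotate + translate); directions are vectors (rotate only).
    return origins @ R.T + t, dirs @ R.T

# Ego moved 2 m forward along x between the adjacent and the current frame,
# so adjacent-frame coordinates shift backward by 2 m in the current frame.
T = np.eye(4)
T[0, 3] = -2.0
o, d = rays_to_current_frame(np.zeros((1, 3)), np.array([[1.0, 0.0, 0.0]]), T)
print(o, d)  # origin becomes [-2, 0, 0]; direction stays [1, 0, 0]
```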
We redesign the dynamic loss to properly handle dynamic objects from adjacent frames, improving the overall performance and robustness.
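As a rough sketch of the idea (the exact loss is defined in the paper, not here), the rendering error can be averaged over static pixels only, with pixels belonging to dynamic objects in the adjacent frame masked out:

```python
import numpy as np

def masked_render_loss(pred: np.ndarray, target: np.ndarray,
                       dynamic_mask: np.ndarray) -> float:
    """L1 rendering loss computed over static pixels only.

    dynamic_mask: boolean (H, W) map, True where a pixel belongs to a dynamic
    object in the adjacent frame. Illustrative stand-in for the redesigned
    dynamic loss, not the paper's exact formulation.
    """
    static = ~dynamic_mask
    if not static.any():
        return 0.0  # every pixel is dynamic; nothing to supervise
    return float(np.abs(pred - target)[static].mean())

pred = np.array([[1.0, 0.0], [0.0, 1.0]])
target = np.zeros((2, 2))
mask = np.array([[True, False], [False, False]])  # top-left pixel is dynamic
print(masked_render_loss(pred, target, mask))     # mean over 3 static pixels
```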
We thank the following excellent open-source projects:
- BEVDet - BEV detection framework
- DVGO - Direct voxel grid optimization
- Occ3D - 3D occupancy prediction
- SurroundDepth - Surrounding depth estimation
- OpenOccupancy - Open occupancy prediction
- CVPR2023-Occ-Challenge - CVPR2023 occupancy prediction challenge
- RenderOcc - Vision-centric 3D occupancy prediction with 2D rendering supervision
- SurroundOcc - Surrounding occupancy prediction
- TPVFormer - Tri-perspective view transformer
- BEVFormer - BEV transformer
- VoxFormer - Voxel transformer
- FB-Occ - Forward-backward occupancy
- SimpleOccupancy - Simple occupancy prediction
- OVO - Open vocabulary occupancy
If this work is helpful for your research, please consider citing:
```bibtex
@article{Sun2024GSRenderDO,
  title={GSRender: Deduplicated Occupancy Prediction via Weakly Supervised 3D Gaussian Splatting},
  author={Qianpu Sun and Changyong Shu and Sifan Zhou and Zichen Yu and Yan Chen and Dawei Yang and Yuan Chun},
  journal={ArXiv},
  year={2024},
  volume={abs/2412.14579},
  url={https://api.semanticscholar.org/CorpusID:274859862}
}
```

If you have any questions or suggestions, please contact us via:
- Submit an Issue
- Email: qianpusun@outlook.com
This project is licensed under the MIT License.
⭐ If this project is helpful for your research, please give us a star!
