Mutian Xu · Xingyilang Yin · Lingteng Qiu · Yang Liu · Xin Tong · Xiaoguang Han
SSE, CUHKSZ · FNii, CUHKSZ · Microsoft Research Asia
SAMPro3D can segment ANY 😯 3D indoor scene WITHOUT training ❗️ It achieves higher-quality and more diverse segmentation than previous zero-shot or fully supervised approaches, and in many cases even surpasses human-level annotations.
If you find our code or work helpful, please cite:
@article{xu2023sampro3d,
title={SAMPro3D: Locating SAM Prompts in 3D for Zero-Shot Scene Segmentation},
author={Mutian Xu and Xingyilang Yin and Lingteng Qiu and Yang Liu and Xin Tong and Xiaoguang Han},
year={2023},
journal = {arXiv preprint arXiv:2311.17707}
}
- The initial code is released. 🔥🔥🔥 (Dec.31, 2023 UTC)
- The first major revision of code is out. Try the latest code! 💪💪💪 (Jan.2, 2024 UTC)
At least one GPU with around 8000 MB of memory is required. A CPU with ample processing power and a disk with fast I/O are also highly recommended. The disk needs to be large enough: about 50 MB per 2D frame at a resolution of 240*320, or around 160 GB in total for the 2500 frames of a large-scale scene.
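As a rough planning aid, the disk figures above can be turned into a small check before a long run. This is a minimal sketch, not part of the SAMPro3D code: the function names and the `safety_factor` value are assumptions made for illustration. The plain product of 50 MB and 2500 frames is 125 GB, which is below the roughly 160 GB quoted above, so the sketch treats the product as a lower bound and adds a margin.

```python
import shutil

MB_PER_FRAME = 50  # README figure for one 240*320 frame; an approximation


def estimated_sam_output_gb(num_frames, mb_per_frame=MB_PER_FRAME):
    """Rough lower bound (in GB) on the disk space needed for saved SAM outputs."""
    return num_frames * mb_per_frame / 1000.0


def has_enough_space(path, num_frames, safety_factor=1.3):
    """Compare free space at `path` with the estimate times a safety margin.

    `safety_factor` is a hypothetical margin chosen for this sketch, not a
    value taken from the project.
    """
    needed_bytes = estimated_sam_output_gb(num_frames) * safety_factor * 1e9
    return shutil.disk_usage(path).free >= needed_bytes


print(estimated_sam_output_gb(2500), "GB before any margin")
print("enough space here:", has_enough_space(".", 2500))
```

Point `path` at the directory you plan to pass as --sam_output_path, since that is where the outputs are written.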
Follow the installation instructions to install all required packages.
Follow the data pre-processing instructions to download and preprocess the data.
The first stage of SAMPro3D generates 3D prompts, runs SAM segmentation, and saves the SAM outputs for the later stages. To start it, run:
python 3d_prompt_proposal.py --data_path /PATH_TO/ScanNet_data --scene_name sceneXXXX_XX --prompt_path /PATH_TO/initial_prompt --sam_output_path /PATH_TO/SAM_outputs --device cuda:0
This is the only stage that performs SAM inference, and it accounts for most of the computation time and memory usage of the entire pipeline.
Note on time efficiency: This stage saves SAM outputs into .npy files for later use. Depending on your hardware (CPU and disk), the I/O speed for the SAM output files may vary a lot and affect the running time of the pipeline. Please refer to the hardware recommendations above to prepare your machine for the best efficiency.
(Optional: Partial-Area Segmentation): At this stage, you can also perform 3D segmentation on a partial point cloud captured by a limited range of 2D frames, by changing frame_id_init and frame_id_end in the code and then running the script. Sometimes this works better than segmenting the whole point cloud, because the scene is less complicated and the frames are more consistent.
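To make the partial-area option concrete, the sketch below shows what restricting a scene to a frame range amounts to. It is an illustration under stated assumptions, not the project's code: `select_frames` is a hypothetical helper, and treating the range as half-open over frame ids (rather than inclusive, or over list positions) is an assumption, so check the code comments for the actual convention.

```python
def select_frames(frame_ids, frame_id_init, frame_id_end):
    """Keep frame ids with frame_id_init <= id < frame_id_end.

    The half-open interval is an assumption made for this sketch.
    """
    return [f for f in sorted(frame_ids) if frame_id_init <= f < frame_id_end]


# Hypothetical scene with one frame id every 20 steps.
all_frames = list(range(0, 2000, 20))
subset = select_frames(all_frames, 200, 600)
print(len(subset), "frames selected, from", subset[0], "to", subset[-1])
```

A smaller range also means fewer SAM outputs to save, which reduces the disk and I/O cost of this stage.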
Next, we filter and consolidate the initial prompts, using the SAM outputs saved during the 3D Prompt Proposal stage, to obtain the final 3D segmentation. To do this, run:
python main.py --data_path /PATH_TO/ScanNet_data --scene_name sceneXXXX_XX --prompt_path /PATH_TO/initial_prompt --sam_output_path /PATH_TO/SAM_outputs --pred_path /PATH_TO/sampro3d_predictions --output_vis_path /PATH_TO/result_visualization --device cuda:0
After this finishes, the visualization of the final 3D segmentation is automatically 😊 saved as a sceneXXXX_XX.ply file in the path specified by --output_vis_path.
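When several scenes need processing, the two commands above can be assembled programmatically. The following is a minimal sketch that only uses the script names and flags shown in this README; `build_commands` and the placeholder paths are assumptions for illustration, and the `subprocess` call is left commented out so that nothing runs by accident.

```python
import shlex


def build_commands(scene_name, data_path, prompt_path, sam_output_path,
                   pred_path, output_vis_path, device="cuda:0"):
    """Return the argument lists for the two stages, in execution order."""
    shared = [
        "--data_path", data_path,
        "--scene_name", scene_name,
        "--prompt_path", prompt_path,
        "--sam_output_path", sam_output_path,
    ]
    # Stage 1: 3D prompt proposal and SAM inference.
    stage1 = ["python", "3d_prompt_proposal.py", *shared, "--device", device]
    # Stage 2: prompt filtering and consolidation, then visualization.
    stage2 = ["python", "main.py", *shared,
              "--pred_path", pred_path,
              "--output_vis_path", output_vis_path,
              "--device", device]
    return [stage1, stage2]


# Placeholder paths copied from the commands above; replace them with real ones.
for cmd in build_commands("sceneXXXX_XX", "/PATH_TO/ScanNet_data",
                          "/PATH_TO/initial_prompt", "/PATH_TO/SAM_outputs",
                          "/PATH_TO/sampro3d_predictions",
                          "/PATH_TO/result_visualization"):
    print(shlex.join(cmd))
    # import subprocess; subprocess.run(cmd, check=True)  # uncomment to execute
```

Running the stages in this order matters, because the second stage reads the SAM outputs that the first stage saves.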
Note on post-processing of the floor: Our framework usually produces a decent segmentation of the floor on its own. For a large-scale floor, however, we apply post-processing to further clean up the floor segmentation. For small-scale scenes (e.g., scene0050_00 in ScanNet), you can skip this step by adding --args.post_floor False to the previous command.
If everything goes well, the entire pipeline takes only about 15 minutes for a large-scale 3D scene captured by 2000 2D frames. (NO TRAINING IS NEEDED❗️)
[Video: comparison_sam3d.mp4]
[Video: comparison_mask3d.mp4]
With our advanced framework, you can generate high-quality segmentations on your own 3D scene without the need for training! Here are the steps you can follow:
- Data preparation: Follow the tips in the data pre-processing instructions to prepare your data.
- Familiarize yourself with the instructions: Read and understand the guidelines, and pay attention to any specific recommendations.
- Run SAMPro3D: Execute the segmentation framework on your prepared data. You may also need to adjust some parameters in the code, such as eps, according to the code comments.
- Monitor the segmentation process: Keep an eye on the segmentation process to ensure it is running smoothly. Depending on the size and complexity of your scene, the segmentation may take some time to complete.
- Evaluate the segmentation output: Once the segmentation process is finished, assess the quality of the segmentation results. Check if the segments align with the desired objects or regions in your 3D scene. You may also compare the output to ground truth data or visually inspect the results for accuracy.
- Refine if necessary: If the segmentation output requires improvement or refinement, consider adjusting the parameters or settings of SAMPro3D or applying post-processing techniques to enhance the segmentation quality.
- Analyze and utilize the segmentation: Utilize the segmented output for your intended purposes, such as further analysis, visualization, or integration with other applications or systems.
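For the evaluation step above, one simple way to compare a prediction against ground truth is a per-segment IoU over point labels. The sketch below is an assumption-laden illustration and not the project's evaluation code (which this README lists as still to be added): `mean_best_iou` is a hypothetical helper that, for each ground-truth segment, takes the best IoU with any predicted segment and averages the results.

```python
from collections import defaultdict


def mean_best_iou(pred_labels, gt_labels):
    """Average, over ground-truth segments, of the best IoU with any predicted segment.

    Both inputs are equal-length sequences holding one label per point.
    """
    if len(pred_labels) != len(gt_labels) or len(gt_labels) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    pred_sets = defaultdict(set)
    gt_sets = defaultdict(set)
    for idx, (p, g) in enumerate(zip(pred_labels, gt_labels)):
        pred_sets[p].add(idx)
        gt_sets[g].add(idx)
    ious = []
    for g_pts in gt_sets.values():
        # IoU = |intersection| / |union| between two sets of point indices.
        best = max(len(g_pts & p_pts) / len(g_pts | p_pts)
                   for p_pts in pred_sets.values())
        ious.append(best)
    return sum(ious) / len(ious)


# Toy example with six points, two predicted and two ground-truth segments.
pred = [0, 0, 0, 1, 1, 1]
gt = [5, 5, 6, 6, 6, 6]
print(round(mean_best_iou(pred, gt), 4))
```

Label values do not need to match between prediction and ground truth, since segments are matched by overlap rather than by id. For real scenes, a vectorized version would be preferable to Python sets.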
- Add the visualization code for showing the result of SAM3D, Mask3D and ScanNet200's annotations.
- Add the evaluation code for calculating segmentation mIoU.
- Add the code for incorporating HQ-SAM and Mobile-SAM in our pipeline.
- Support a Jupyter notebook for step-by-step running.
- Support in-website qualitative visualization.
- Support more datasets.
You are welcome to submit issues, send pull requests, or share some ideas with us. If you have any other questions, please contact Mutian Xu (mutianxu@link.cuhk.edu.cn).
Our code base is partially borrowed or adapted from SAM, OpenScene and Pointcept.