POPE: 6-DoF Promptable Pose Estimation of Any Object, in Any Scene, with One Reference

Zhiwen Fan¹*, Panwang Pan²*, Peihao Wang¹, Yifan Jiang¹,
Dejia Xu¹, Hanwen Jiang¹, Zhangyang Wang¹

¹The University of Texas at Austin  ²ByteDance  *Equal contribution

Welcome to the project repository for POPE (Promptable Pose Estimation), a state-of-the-art technique for 6-DoF pose estimation of any object in any scene using a single reference.

Preparation

Installation

Docker setup

Please check docker/README.MD

Alternatively, follow the steps below:

The code has been tested with Python 3.9, CUDA 11.3, and PyTorch 1.10.1. Additional dependencies include:

h5py
kornia
torch
torchvision
omegaconf
torchmetrics==0.10.3
fvcore
iopath
submitit
pathlib
transforms3d
numpy
plyfile
easydict
scikit-image
matplotlib
pyyaml
tabulate
tqdm
loguru
opencv-python
--extra-index-url https://pypi.nvidia.com

pip3 install -r ./requirements.txt
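Before running anything, it can help to confirm that your environment matches the tested versions. The sketch below compares the running Python interpreter and the installed PyTorch distribution against the versions stated above. It uses only the standard library, so it works even before PyTorch is installed. The function names are illustrative and not part of this repository; the CUDA version is not checked here, but once PyTorch is installed you can read it from torch.version.cuda.

```python
import sys
from importlib import metadata

# Versions this README reports as tested; other versions may also work.
TESTED = {"python": "3.9", "torch": "1.10.1"}


def installed_version(package):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment():
    """Map each component to (found_version, matches_tested_version)."""
    python_v = f"{sys.version_info.major}.{sys.version_info.minor}"
    torch_v = installed_version("torch")
    # CUDA builds of PyTorch may carry a local suffix such as "+cu113";
    # compare only the public version before the "+".
    torch_public = torch_v.split("+")[0] if torch_v else None
    return {
        "python": (python_v, python_v == TESTED["python"]),
        "torch": (torch_v, torch_public == TESTED["torch"]),
    }


if __name__ == "__main__":
    for name, (found, ok) in check_environment().items():
        status = "ok" if ok else "differs from tested version"
        print(f"{name}: {found or 'not found'} ({status})")
```

A mismatch is only a warning: other versions may work, they simply have not been tested by the authors.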

Download model checkpoints

Download the Segment Anything Model (SAM) ViT-H checkpoint into weights/:

wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth -O weights/sam_vit_h_4b8939.pth

Download the DINOv2 ViT-S/14 checkpoint into weights/:

wget https://dl.fbaipublicfiles.com/dinov2/dinov2_vits14/dinov2_vits14_pretrain.pth -O weights/dinov2_vits14.pth
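If you prefer to fetch the checkpoints from Python (for example, inside a setup script), the sketch below downloads whichever of the two files is missing, using the same URLs and target filenames as the wget commands. It relies only on the standard library; the helper names missing_checkpoints and download_missing are hypothetical and not part of this repository. Each file is written to a temporary .part file and renamed on completion, so an interrupted download is not mistaken for a finished one, but no checksum is verified.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Target filename in weights/ -> source URL (same as the wget commands).
CHECKPOINTS = {
    "sam_vit_h_4b8939.pth": "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth",
    "dinov2_vits14.pth": "https://dl.fbaipublicfiles.com/dinov2/dinov2_vits14/dinov2_vits14_pretrain.pth",
}


def missing_checkpoints(weights_dir="weights"):
    """Return the expected checkpoint filenames that are not yet in weights_dir."""
    weights = Path(weights_dir)
    return [name for name in CHECKPOINTS if not (weights / name).is_file()]


def download_missing(weights_dir="weights"):
    """Download only the missing checkpoints (no checksum verification)."""
    weights = Path(weights_dir)
    weights.mkdir(parents=True, exist_ok=True)
    for name in missing_checkpoints(weights):
        target = weights / name
        partial = target.with_suffix(".part")
        print(f"downloading {name} ...")
        urlretrieve(CHECKPOINTS[name], partial)
        # Rename only after the download has finished.
        partial.replace(target)


if __name__ == "__main__":
    download_missing()
```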

Prepare datasets (Updated dataset download links)

Download the datasets from Hugging Face: the OnePose and OnePose_LowTexture datasets from here, and the YCB-Video and LINEMOD datasets from here. Extract them into ./data.

~~If you want to evaluate on LINEMOD dataset, download the real training data, test data and 3D object models from CDPN, and detection results by YOLOv5 from here. Then extract them into ./data~~

The directory should be organized in the following structure:

    |--📂data
    |       |--- 📂ycbv
    |       |--- 📂OnePose_LowTexture
    |       |--- 📂demos
    |       |--- 📂onepose
    |       |--- 📂LM_dataset
    |       |      |--- 📂bbox_2d
    |       |      |--- 📂corlor
    |       |      |--- 📂color_full
    |       |      |--- 📂intrin
    |       |      |--- 📂intrin_ba
    |       |      |--- 📂poses_ba
    |       |      |--- 📜box3d_corners.txt
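After extracting, you can check the layout with a short script. The sketch below lists any entries from the tree above that are missing under ./data; directory names are copied exactly as they appear in the tree. The function name missing_entries is hypothetical, and if you only evaluate on some datasets, entries for the others will naturally be reported as missing.

```python
from pathlib import Path

# Expected layout under data/, copied from the directory tree above.
EXPECTED_DIRS = [
    "ycbv",
    "OnePose_LowTexture",
    "demos",
    "onepose",
    "LM_dataset",
    "LM_dataset/bbox_2d",
    "LM_dataset/corlor",
    "LM_dataset/color_full",
    "LM_dataset/intrin",
    "LM_dataset/intrin_ba",
    "LM_dataset/poses_ba",
]
EXPECTED_FILES = ["LM_dataset/box3d_corners.txt"]


def missing_entries(data_root="data"):
    """Return expected directories and files that do not exist under data_root."""
    root = Path(data_root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    for entry in missing_entries():
        print(f"missing: data/{entry}")
```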

Demos

We apologize for the many hard-coded values in the code. We have since reorganized the code structure and this README to make them more user-friendly.

The code was recently cleaned up for release and may contain minor bugs. Please feel free to open an issue.

bash demo.sh
# Demo 1: visualize DINOv2 features
python3 visual_dinov2.py

# Demo 2: visualize Segment Anything Model (SAM) segmentation
python3 visual_sam.py

# Demo 3: visualize 3D bounding boxes
python3 visual_3dbbox.py
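This README does not document how visual_dinov2.py renders features. A common way to visualize DINOv2 patch features is to project them onto their first three principal components and display the result as an RGB image; the sketch below shows that technique, which may differ from the repository's script. It assumes you already have a (grid_h * grid_w, D) array of patch tokens; for example, ViT-S/14 on a 224×224 input yields a 16×16 grid of 384-dimensional tokens. The function name pca_rgb is hypothetical.

```python
import numpy as np


def pca_rgb(patch_features, grid_h, grid_w):
    """Map (grid_h * grid_w, D) patch features to a (grid_h, grid_w, 3) image in [0, 1]."""
    feats = np.asarray(patch_features, dtype=np.float64)
    if feats.shape[0] != grid_h * grid_w:
        raise ValueError("number of patches must equal grid_h * grid_w")
    # Center the features so the SVD yields principal directions.
    centered = feats - feats.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt[:3].T  # (N, 3): coordinates on the top-3 components
    # Min-max normalize each component independently into [0, 1].
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    rgb = (proj - lo) / np.maximum(hi - lo, 1e-12)
    return rgb.reshape(grid_h, grid_w, 3)
```

Patches that belong to the same object part tend to share similar principal-component coordinates, so they appear in similar colors; upsampling the small grid to the input resolution (for example with nearest-neighbor interpolation) makes it easier to overlay on the image.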

Evaluation

python3 eval_linemod_json.py
python3 eval_onepose_json.py
python3 eval_ycb_json.py
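This README does not state which metrics the eval_*_json.py scripts report. For reference, a common way to score a 6-DoF estimate is the geodesic angle between predicted and ground-truth rotations, the Euclidean distance between translations, and the fraction of samples under an error threshold. The sketch below implements those standard formulas; the function names are hypothetical, and whether the evaluation scripts use exactly these definitions or thresholds is an assumption.

```python
import numpy as np


def rotation_error_deg(R_pred, R_gt):
    """Geodesic angle in degrees between two 3x3 rotation matrices."""
    cos = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    # Clip guards against values slightly outside [-1, 1] from floating-point error.
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def translation_error(t_pred, t_gt):
    """Euclidean distance between translation vectors, in the inputs' units."""
    diff = np.asarray(t_pred, dtype=float) - np.asarray(t_gt, dtype=float)
    return float(np.linalg.norm(diff))


def accuracy_below(errors, threshold):
    """Fraction of errors strictly below the threshold."""
    return float((np.asarray(errors, dtype=float) < threshold).mean())
```

For example, a rotation of 30° about the z-axis compared against the identity gives rotation_error_deg ≈ 30.0, and accuracy_below([10, 20, 40], 30) returns about 0.667.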

Zero-shot Promptable Pose Estimation

Visual examples of promptable object pose estimation on outdoor scenes, indoor scenes, and scenes with severe occlusions.

We also conduct a more challenging evaluation that uses an edge map as the reference, which further demonstrates the robustness of POPE (DINOv2 and Matcher).

Application on Novel View Synthesis

We also show an application to novel view synthesis: by leveraging the estimated object poses, our method produces photo-realistic renderings. We combine the multi-view poses estimated by POPE with a pre-trained, generalizable Neural Radiance Field (GNT) to render novel views.

Comparison based on Video and Image

We show visualizations on the LINEMOD, YCB-Video, OnePose, and OnePose++ datasets, compared against LoFTR and Gen6D.

Citation

If you find this repository helpful, please consider citing:

@article{fan2023pope,
  title={POPE: 6-DoF Promptable Pose Estimation of Any Object, in Any Scene, with One Reference},
  author={Fan, Zhiwen and Pan, Panwang and Wang, Peihao and Jiang, Yifan and Xu, Dejia and Jiang, Hanwen and Wang, Zhangyang},
  journal={arXiv preprint arXiv:2305.15727},
  year={2023}
}
