release evaluation code

Junyi42 · Oct 20, 2024 · 6117b33
1 parent 649195a

Showing 14 changed files with 754 additions and 349 deletions.
diff --git a/README.md b/README.md
@@ -18,12 +18,12 @@ Arxiv, 2024. [**[Project Page]**](https://monst3r-project.github.io/) [**[Paper]
 [![Watch the video](assets/fig1_teaser.png)](https://monst3r-project.github.io/files/teaser_vid_v2_lowres.mp4)
 
 ## TODO
-- [x] Release model weights on [Google Drive](https://drive.google.com/file/d/1Z1jO_JmfZj0z3bgMvCwqfUhyZ1bIbc9E/view?usp=sharing) and [Hugging Face](https://huggingface.co/Junyi42/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt)
+- [x] Release model weights on [Google Drive](https://drive.google.com/file/d/1Z1jO_JmfZj0z3bgMvCwqfUhyZ1bIbc9E/view?usp=sharing) and [Hugging Face](https://huggingface.co/Junyi42/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt) (10/07)
 - [x] Release inference code for global optimization (10/18)
 - [x] Release 4D visualization code (10/18)
 - [x] Release training code & dataset preparation (10/19)
-- [ ] Release evaluation code (est. time: 10/21)
-- [ ] Gradio Demo (est. time: 10/28)
+- [x] Release evaluation code (10/20)
+- [ ] Gradio Demo
 
 ## Getting Started
 
@@ -102,9 +102,34 @@ python viser/visualizer_monst3r.py --data demo_tmp/lady-running
 # to remove the floaters of foreground: --init_conf --fg_conf_thre 1.0 (thre can be adjusted)
 ```
 
-### Training
+## Evaluation
 
-First, please refer to the [prepare_training.md](data/prepare_training.md) for preparing the pretrained models and training/evaluation datasets.
+Here we provide an example of joint dense reconstruction and camera pose estimation on the **DAVIS** dataset.
+
+First, download the dataset:
+```bash
+cd data; python download_davis.py; cd ..
+```
+
+Then, run the evaluation script:
+```bash
+CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=29604 launch.py --mode=eval_pose  \
+    --pretrained="checkpoints/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt.pth"   \
+    --eval_dataset=davis --output_dir="results/davis_joint" 
+    # To use the ground truth dynamic mask, add: --use_gt_mask
+```
+
+You can then use `viser` to visualize the results:
+```bash
+python viser/visualizer_monst3r.py --data results/davis_joint/bear
+```
+
+#### For the complete scripts to evaluate camera pose, video depth, and single-frame depth estimation on the **Sintel**, **Bonn**, **KITTI**, **NYU-v2**, **TUM-dynamics**, **ScanNet**, and **DAVIS** datasets, please refer to [evaluation_script.md](data/evaluation_script.md).
+
+
+## Training
+
+Please refer to [prepare_training.md](data/prepare_training.md) to prepare the pretrained models and the training/testing datasets.
 
 Then, you can train the model using the following command:
 ```bash
@@ -133,4 +158,4 @@ If you find our work useful, please cite:
 ```
 
 ## Acknowledgements
-Our code is based on [DUSt3R](https://github.com/naver/dust3r) and [CasualSAM](https://github.com/ztzhang/casualSAM), our camera pose estimation evaluation script is based on [LEAP-VO](https://github.com/chiaki530/leapvo), and our visualization code is based on [Viser](https://github.com/nerfstudio-project/viser). We thank the authors for their excellent work!
+Our code is based on [DUSt3R](https://github.com/naver/dust3r) and [CasualSAM](https://github.com/ztzhang/casualSAM), our camera pose estimation evaluation script is based on [LEAP-VO](https://github.com/chiaki530/leapvo), and our visualization code is based on [Viser](https://github.com/nerfstudio-project/viser). We thank the authors for their excellent work!
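The README evaluation example above visualizes a single result folder (`results/davis_joint/bear`). A minimal Python sketch for listing every evaluated sequence and printing the matching `viser` command might look like the following. It assumes that each sequence is written to its own subfolder of `--output_dir`, which is only inferred from the `bear` path; the helper names `list_sequences` and `viser_commands` are hypothetical and not part of the repository.

```python
from pathlib import Path


def list_sequences(output_dir):
    """Return the sorted names of the subfolders found directly under output_dir.

    Assumption: one subfolder per evaluated sequence (e.g. 'bear').
    Plain files in output_dir are ignored.
    """
    root = Path(output_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def viser_commands(output_dir):
    """Build one visualizer command line per sequence folder."""
    return [
        "python viser/visualizer_monst3r.py --data "
        + Path(output_dir, name).as_posix()
        for name in list_sequences(output_dir)
    ]


if __name__ == "__main__":
    # Prints nothing if the evaluation has not been run yet.
    for command in viser_commands("results/davis_joint"):
        print(command)
```

The script only prints the commands, so they can be reviewed before running any of them.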
diff --git a/data/download_sintel.sh b/data/download_sintel.sh
@@ -17,4 +17,3 @@ cd ..
 # conda activate monst3r
 # cd ..
 # python datasets_preprocess/sintel_get_dynamics.py --threshold 0.1 --save_dir dynamic_label_perfect
-# python datasets_preprocess/sintel_get_dynamics.py --continuous --save_dir dynamic_label_continuous
diff --git a/data/evaluation_script.md b/data/evaluation_script.md
@@ -0,0 +1,171 @@
+# Dataset Preparation for Evaluation
+
+We provide scripts to download and prepare the datasets for evaluation. The datasets include: **Sintel**, **Bonn**, **KITTI**, **NYU-v2**, **TUM-dynamics**, **ScanNetv2**, and **DAVIS**.
+
+> [!NOTE]
+> The scripts provided here are for reference only. Please ensure you have obtained the necessary licenses from the original dataset providers before proceeding.
+
+
+## Download Datasets
+
+### Sintel
+To download and prepare the **Sintel** dataset, execute:
+```bash
+cd data
+bash download_sintel.sh
+cd ..
+
+# (optional) generate the GT dynamic mask
+# run from the repository root (the `cd ..` above already returns there)
+python datasets_preprocess/sintel_get_dynamics.py --threshold 0.1 --save_dir dynamic_label_perfect 
+```
+
+### Bonn
+To download and prepare the **Bonn** dataset, execute:
+```bash
+cd data
+bash download_bonn.sh
+cd ..
+
+# create the subset for video depth evaluation, following DepthCrafter
+cd datasets_preprocess
+python prepare_bonn.py
+cd ..
+```
+
+### KITTI
+To download and prepare the **KITTI** dataset, execute:
+```bash
+cd data
+bash download_kitti.sh
+cd ..
+
+# create the subset for video depth evaluation, following DepthCrafter
+cd datasets_preprocess
+python prepare_kitti.py
+cd ..
+```
+
+### NYU-v2
+To download and prepare the **NYU-v2** dataset, execute:
+```bash
+cd data
+bash download_nyuv2.sh
+cd ..
+
+# prepare the dataset for depth evaluation
+cd datasets_preprocess
+python prepare_nyuv2.py
+cd ..
+```
+
+### TUM-dynamics
+To download and prepare the **TUM-dynamics** dataset, execute:
+```bash
+cd data
+bash download_tum.sh
+cd ..
+
+# prepare the dataset for pose evaluation
+cd datasets_preprocess
+python prepare_tum.py
+cd ..
+```
+
+### ScanNet
+To download and prepare the **ScanNet** dataset, execute:
+```bash
+cd data
+bash download_scannetv2.sh
+cd ..
+
+# prepare the dataset for pose evaluation
+cd datasets_preprocess
+python prepare_scannet.py
+cd ..
+```
+
+### DAVIS
+To download and prepare the **DAVIS** dataset, execute:
+```bash
+cd data
+python download_davis.py
+cd ..
+```
+
+## Evaluation Script (Video Depth)
+
+### Sintel
+
+```bash
+CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=29604 launch.py --mode=eval_pose  \
+    --pretrained="checkpoints/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt.pth"   \
+    --eval_dataset=sintel --output_dir="results/sintel_video_depth" --full_seq
+```
+
+The results will be saved in the `results/sintel_video_depth` folder. You can then run the corresponding code block in [depth_metric.ipynb](../depth_metric.ipynb) to evaluate the results.
+
+### Bonn
+
+```bash
+CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=29604 launch.py --mode=eval_pose  \
+    --pretrained="checkpoints/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt.pth"   \
+    --eval_dataset=bonn --output_dir="results/bonn_video_depth"
+```
+
+The results will be saved in the `results/bonn_video_depth` folder. You can then run the corresponding code block in [depth_metric.ipynb](../depth_metric.ipynb) to evaluate the results.
+
+### KITTI
+
+```bash
+CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=29604 launch.py --mode=eval_pose  \
+    --pretrained="checkpoints/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt.pth"   \
+    --eval_dataset=kitti --output_dir="results/kitti_video_depth"
+```
+
+The results will be saved in the `results/kitti_video_depth` folder. You can then run the corresponding code block in [depth_metric.ipynb](../depth_metric.ipynb) to evaluate the results.
+
+## Evaluation Script (Camera Pose)
+
+### Sintel
+
+```bash
+CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=29604 launch.py --mode=eval_pose  \
+    --pretrained="checkpoints/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt.pth"   \
+    --eval_dataset=sintel --output_dir="results/sintel_pose"
+    # To use the ground truth dynamic mask, add: --use_gt_mask
+```
+
+The evaluation results will be saved in `results/sintel_pose/_error_log.txt`. 
+
+### TUM-dynamics
+
+```bash
+CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=29604 launch.py --mode=eval_pose  \
+    --pretrained="checkpoints/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt.pth"   \
+    --eval_dataset=tum --output_dir="results/tum_pose"
+```
+
+The evaluation results will be saved in `results/tum_pose/_error_log.txt`.
+
+### ScanNet
+
+```bash
+CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=29604 launch.py --mode=eval_pose  \
+    --pretrained="checkpoints/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt.pth"   \
+    --eval_dataset=scannet --output_dir="results/scannet_pose"
+```
+
+The evaluation results will be saved in `results/scannet_pose/_error_log.txt`.
+
+## Evaluation Script (Single-Frame Depth)
+
+### NYU-v2
+
+```bash
+CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=29604 launch.py --mode=eval_depth  \
+    --pretrained="checkpoints/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt.pth"   \
+    --eval_dataset=nyu --output_dir="results/nyuv2_depth"
+```
+
+The results will be saved in the `results/nyuv2_depth` folder. You can then run the corresponding code block in [depth_metric.ipynb](../depth_metric.ipynb) to evaluate the results.
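The evaluation commands in `data/evaluation_script.md` differ only in `--mode`, `--eval_dataset`, `--output_dir`, and the optional `--full_seq` flag. As a rough sketch, they can be generated from a single table in Python. The flag values below are copied from the commands in that file; the names `EVAL_RUNS` and `build_command` are hypothetical, and the script only prints the command lines rather than launching them.

```python
import shlex

CHECKPOINT = "checkpoints/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt.pth"

# (mode, eval_dataset, output_dir, extra flags), as listed in evaluation_script.md
EVAL_RUNS = [
    # video depth
    ("eval_pose", "sintel", "results/sintel_video_depth", ["--full_seq"]),
    ("eval_pose", "bonn", "results/bonn_video_depth", []),
    ("eval_pose", "kitti", "results/kitti_video_depth", []),
    # camera pose
    ("eval_pose", "sintel", "results/sintel_pose", []),
    ("eval_pose", "tum", "results/tum_pose", []),
    ("eval_pose", "scannet", "results/scannet_pose", []),
    # single-frame depth
    ("eval_depth", "nyu", "results/nyuv2_depth", []),
]


def build_command(mode, dataset, output_dir, extra=(), port=29604):
    """Return the torchrun argument list for one evaluation run."""
    return [
        "torchrun", "--nproc_per_node=1", f"--master_port={port}", "launch.py",
        f"--mode={mode}",
        f"--pretrained={CHECKPOINT}",
        f"--eval_dataset={dataset}",
        f"--output_dir={output_dir}",
        *extra,
    ]


if __name__ == "__main__":
    for run in EVAL_RUNS:
        # shlex.join quotes arguments so each line can be pasted into a shell.
        print("CUDA_VISIBLE_DEVICES=0 " + shlex.join(build_command(*run)))
```

Keeping the runs in one table makes it easier to see which mode and output folder each benchmark uses, and optional flags such as `--use_gt_mask` can be appended through the `extra` entry.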
diff --git a/data/prepare_training.md b/data/prepare_training.md
@@ -3,7 +3,8 @@
 
 We provide scripts to prepare datasets for training, including **PointOdyssey**, **TartanAir**, **Spring**, and **Waymo**. For evaluation, we also provide a script for preparing the **Sintel** dataset.  
 
-*Please ensure you have obtained the necessary licenses from the original dataset providers before proceeding.*
+> [!NOTE]
+> The scripts provided here are for reference only. Please ensure you have obtained the necessary licenses from the original dataset providers before proceeding.
 
 ## Download Pre-Trained Models
 To download the pre-trained models, run the following commands:
@@ -69,4 +70,4 @@ To download and prepare the **Sintel** dataset for evaluation, execute:
 cd data
 bash download_sintel.sh
 cd ..
-```
+```
diff --git a/datasets_preprocess/bonn.ipynb b/datasets_preprocess/bonn.ipynb