diff --git a/README.md b/README.md
index e6e4706..504fe2c 100644
--- a/README.md
+++ b/README.md
@@ -18,12 +18,12 @@ Arxiv, 2024. [**[Project Page]**](https://monst3r-project.github.io/) [**[Paper]
 [![Watch the video](assets/fig1_teaser.png)](https://monst3r-project.github.io/files/teaser_vid_v2_lowres.mp4)
 
 ## TODO
-- [x] Release model weights on [Google Drive](https://drive.google.com/file/d/1Z1jO_JmfZj0z3bgMvCwqfUhyZ1bIbc9E/view?usp=sharing) and [Hugging Face](https://huggingface.co/Junyi42/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt)
+- [x] Release model weights on [Google Drive](https://drive.google.com/file/d/1Z1jO_JmfZj0z3bgMvCwqfUhyZ1bIbc9E/view?usp=sharing) and [Hugging Face](https://huggingface.co/Junyi42/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt) (10/07)
 - [x] Release inference code for global optimization (10/18)
 - [x] Release 4D visualization code (10/18)
 - [x] Release training code & dataset preparation (10/19)
-- [ ] Release evaluation code (est. time: 10/21)
-- [ ] Gradio Demo (est. time: 10/28)
+- [x] Release evaluation code (10/20)
+- [ ] Gradio Demo
 
 ## Getting Started
@@ -102,9 +102,34 @@ python viser/visualizer_monst3r.py --data demo_tmp/lady-running
 # to remove the floaters of foreground: --init_conf --fg_conf_thre 1.0 (thre can be adjusted)
 ```
 
-### Training
+## Evaluation
 
-First, please refer to the [prepare_training.md](data/prepare_training.md) for preparing the pretrained models and training/evaluation datasets.
+Here we provide an example of joint dense reconstruction and camera pose estimation on the **DAVIS** dataset.
+
+First, download the dataset:
+```bash
+cd data; python download_davis.py; cd ..
+```
+
+Then, run the evaluation script:
+```bash
+CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=29604 launch.py --mode=eval_pose \
+    --pretrained="checkpoints/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt.pth" \
+    --eval_dataset=davis --output_dir="results/davis_joint"
+    # To use the ground truth dynamic mask, add: --use_gt_mask
+```
+
+You could then use `viser` to visualize the results:
+```bash
+python viser/visualizer_monst3r.py --data results/davis_joint/bear
+```
+
+For the complete scripts to evaluate camera pose, video depth, and single-frame depth estimation on the **Sintel**, **Bonn**, **KITTI**, **NYU-v2**, **TUM-dynamics**, **ScanNet**, and **DAVIS** datasets, please refer to [evaluation_script.md](data/evaluation_script.md).
+
+
+## Training
+
+Please refer to [prepare_training.md](data/prepare_training.md) for preparing the pretrained models and the training/testing datasets.
 
 Then, you can train the model using the following command:
 ```bash
@@ -133,4 +158,4 @@ If you find our work useful, please cite:
 ```
 
 ## Acknowledgements
-Our code is based on [DUSt3R](https://github.com/naver/dust3r) and [CasualSAM](https://github.com/ztzhang/casualSAM), our camera pose estimation evaluation script is based on [LEAP-VO](https://github.com/chiaki530/leapvo), and our visualization code is based on [Viser](https://github.com/nerfstudio-project/viser). We thank the authors for their excellent work!
\ No newline at end of file
+Our code is based on [DUSt3R](https://github.com/naver/dust3r) and [CasualSAM](https://github.com/ztzhang/casualSAM), our camera pose estimation evaluation script is based on [LEAP-VO](https://github.com/chiaki530/leapvo), and our visualization code is based on [Viser](https://github.com/nerfstudio-project/viser). We thank the authors for their excellent work!
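As a quick sanity check on the evaluation outputs above, you can inspect the per-frame depth predictions directly before opening the metric notebook. This is a minimal sketch under stated assumptions: the `.npy`-per-frame layout under `results/<run>/<sequence>/` is borrowed from the globs in `depth_metric.ipynb`, and `results/davis_joint/bear` matches the viser example; adjust the path to your own run.

```python
# Minimal sketch (assumed output layout, not a documented API): list the
# per-frame depth .npy files written by the evaluation run and print their
# shapes and value ranges to verify the run produced sensible depth maps.
import glob
import numpy as np

for path in sorted(glob.glob("results/davis_joint/bear/*.npy"))[:3]:
    depth = np.load(path)  # one (H, W) float depth map per frame
    print(path, depth.shape, float(depth.min()), float(depth.max()))
```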
diff --git a/data/download_sintel.sh b/data/download_sintel.sh
index 82e6bf4..fbeefdb 100644
--- a/data/download_sintel.sh
+++ b/data/download_sintel.sh
@@ -17,4 +17,3 @@ cd ..
 # conda activate monst3r
 # cd ..
 # python datasets_preprocess/sintel_get_dynamics.py --threshold 0.1 --save_dir dynamic_label_perfect
-# python datasets_preprocess/sintel_get_dynamics.py --continuous --save_dir dynamic_label_continuous
diff --git a/data/evaluation_script.md b/data/evaluation_script.md
new file mode 100644
index 0000000..5fc9ca4
--- /dev/null
+++ b/data/evaluation_script.md
@@ -0,0 +1,171 @@
+# Dataset Preparation for Evaluation
+
+We provide scripts to download and prepare the datasets for evaluation. The datasets include: **Sintel**, **Bonn**, **KITTI**, **NYU-v2**, **TUM-dynamics**, **ScanNetv2**, and **DAVIS**.
+
+> [!NOTE]
+> The scripts provided here are for reference only. Please ensure you have obtained the necessary licenses from the original dataset providers before proceeding.
+
+
+## Download Datasets
+
+### Sintel
+To download and prepare the **Sintel** dataset, execute:
+```bash
+cd data
+bash download_sintel.sh
+cd ..
+
+# (optional) generate the GT dynamic mask
+python datasets_preprocess/sintel_get_dynamics.py --threshold 0.1 --save_dir dynamic_label_perfect
+```
+
+### Bonn
+To download and prepare the **Bonn** dataset, execute:
+```bash
+cd data
+bash download_bonn.sh
+cd ..
+
+# create the subset for video depth evaluation, following DepthCrafter
+cd datasets_preprocess
+python prepare_bonn.py
+cd ..
+```
+
+### KITTI
+To download and prepare the **KITTI** dataset, execute:
+```bash
+cd data
+bash download_kitti.sh
+cd ..
+
+# create the subset for video depth evaluation, following DepthCrafter
+cd datasets_preprocess
+python prepare_kitti.py
+cd ..
+```
+
+### NYU-v2
+To download and prepare the **NYU-v2** dataset, execute:
+```bash
+cd data
+bash download_nyuv2.sh
+cd ..
+
+# prepare the dataset for depth evaluation
+cd datasets_preprocess
+python prepare_nyuv2.py
+cd ..
+```
+
+### TUM-dynamics
+To download and prepare the **TUM-dynamics** dataset, execute:
+```bash
+cd data
+bash download_tum.sh
+cd ..
+
+# prepare the dataset for pose evaluation
+cd datasets_preprocess
+python prepare_tum.py
+cd ..
+```
+
+### ScanNet
+To download and prepare the **ScanNet** dataset, execute:
+```bash
+cd data
+bash download_scannetv2.sh
+cd ..
+
+# prepare the dataset for pose evaluation
+cd datasets_preprocess
+python prepare_scannet.py
+cd ..
+```
+
+### DAVIS
+To download and prepare the **DAVIS** dataset, execute:
+```bash
+cd data
+python download_davis.py
+cd ..
+```
+
+## Evaluation Script (Video Depth)
+
+### Sintel
+
+```bash
+CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=29604 launch.py --mode=eval_pose \
+    --pretrained="checkpoints/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt.pth" \
+    --eval_dataset=sintel --output_dir="results/sintel_video_depth" --full_seq
+```
+
+The results will be saved in the `results/sintel_video_depth` folder. You could then run the corresponding code block in [depth_metric.ipynb](../depth_metric.ipynb) to evaluate the results.
+
+### Bonn
+
+```bash
+CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=29604 launch.py --mode=eval_pose \
+    --pretrained="checkpoints/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt.pth" \
+    --eval_dataset=bonn --output_dir="results/bonn_video_depth"
+```
+
+The results will be saved in the `results/bonn_video_depth` folder. 
You could then run the corresponding code block in [depth_metric.ipynb](../depth_metric.ipynb) to evaluate the results. + +### KITTI + +```bash +CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=29604 launch.py --mode=eval_pose \ + --pretrained="checkpoints/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt.pth" \ + --eval_dataset=kitti --output_dir="results/kitti_video_depth" +``` + +The results will be saved in the `results/kitti_video_depth` folder. You could then run the corresponding code block in [depth_metric.ipynb](../depth_metric.ipynb) to evaluate the results. + +## Evaluation Script (Camera Pose) + +### Sintel + +```bash +CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=29604 launch.py --mode=eval_pose \ + --pretrained="checkpoints/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt.pth" \ + --eval_dataset=sintel --output_dir="results/sintel_pose" + # To use the ground truth dynamic mask, add: --use_gt_mask +``` + +The evaluation results will be saved in `results/sintel_pose/_error_log.txt`. + +### TUM-dynamics + +```bash +CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=29604 launch.py --mode=eval_pose \ + --pretrained="checkpoints/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt.pth" \ + --eval_dataset=tum --output_dir="results/tum_pose" +``` + +The evaluation results will be saved in `results/tum_pose/_error_log.txt`. + +### ScanNet + +```bash +CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=29604 launch.py --mode=eval_pose \ + --pretrained="checkpoints/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt.pth" \ + --eval_dataset=scannet --output_dir="results/scannet_pose" +``` + +The evaluation results will be saved in `results/scannet_pose/_error_log.txt`. + +## Evaluation Script (Single-Frame Depth) + +### NYU-v2 + +```bash +CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=29604 launch.py --mode=eval_depth \ + --pretrained="checkpoints/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt.pth" \ + --eval_dataset=nyu --output_dir="results/nyuv2_depth" +``` + +The results will be saved in the `results/nyuv2_depth` folder. You could then run the corresponding code block in [depth_metric.ipynb](../depth_metric.ipynb) to evaluate the results. \ No newline at end of file diff --git a/data/prepare_training.md b/data/prepare_training.md index 490703d..cdc667f 100644 --- a/data/prepare_training.md +++ b/data/prepare_training.md @@ -3,7 +3,8 @@ We provide scripts to prepare datasets for training, including **PointOdyssey**, **TartanAir**, **Spring**, and **Waymo**. For evaluation, we also provide a script for preparing the **Sintel** dataset. -*Please ensure you have obtained the necessary licenses from the original dataset providers before proceeding.* +> [!NOTE] +> The scripts provided here are for reference only. Please ensure you have obtained the necessary licenses from the original dataset providers before proceeding. ## Download Pre-Trained Models To download the pre-trained models, run the following commands: @@ -69,4 +70,4 @@ To download and prepare the **Sintel** dataset for evaluation, execute: cd data bash download_sintel.sh cd .. 
-``` \ No newline at end of file +``` diff --git a/datasets_preprocess/bonn.ipynb b/datasets_preprocess/bonn.ipynb deleted file mode 100644 index d9331d6..0000000 --- a/datasets_preprocess/bonn.ipynb +++ /dev/null @@ -1,69 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "import glob\n", - "import os\n", - "import shutil\n", - "dirs = glob.glob(\"../data/bonn/rgbd_bonn_dataset/*/\")\n", - "dirs = sorted(dirs)\n", - "# extract frames\n", - "for dir in dirs:\n", - " frames = glob.glob(dir + 'rgb/*.png')\n", - " frames = sorted(frames)\n", - " # sample 110 frames at the stride of 2\n", - " frames = frames[30:140]\n", - " # cut frames after 110\n", - " new_dir = dir + 'rgb_110/'\n", - "\n", - " for frame in frames:\n", - " os.makedirs(new_dir, exist_ok=True)\n", - " shutil.copy(frame, new_dir)\n", - " # print(f'cp {frame} {new_dir}')\n", - "\n", - " depth_frames = glob.glob(dir + 'depth/*.png')\n", - " depth_frames = sorted(depth_frames)\n", - " # sample 110 frames at the stride of 2\n", - " depth_frames = depth_frames[30:140]\n", - " # cut frames after 110\n", - " new_dir = dir + 'depth_110/'\n", - "\n", - " for frame in depth_frames:\n", - " os.makedirs(new_dir, exist_ok=True)\n", - " shutil.copy(frame, new_dir)\n", - " # print(f'cp {frame} {new_dir}')\n", - "import numpy as np\n", - "for dir in dirs:\n", - " gt_path = \"groundtruth.txt\"\n", - " gt = np.loadtxt(dir + gt_path)\n", - " gt_110 = gt[30:140]\n", - " np.savetxt(dir + 'groundtruth_110.txt', gt_110)\n" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "dust3r", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.11.9" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/datasets_preprocess/kitti.ipynb b/datasets_preprocess/kitti.ipynb deleted file mode 100644 index 77c08d0..0000000 --- a/datasets_preprocess/kitti.ipynb +++ /dev/null @@ -1,82 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "#!/usr/bin/python\n", - "\n", - "from PIL import Image\n", - "import numpy as np\n", - "\n", - "\n", - "def depth_read(filename):\n", - " # loads depth map D from png file\n", - " # and returns it as a numpy array,\n", - " # for details see readme.txt\n", - "\n", - " depth_png = np.array(Image.open(filename), dtype=int)\n", - " # make sure we have a proper 16bit depth map here.. 
not 8bit!\n", - " assert(np.max(depth_png) > 255)\n", - "\n", - " depth = depth_png.astype(np.float) / 256.\n", - " depth[depth_png == 0] = -1.\n", - " return depth" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import glob\n", - "import os\n", - "import shutil\n", - "depth_dirs = glob.glob(\"../data/kitti/val/*/proj_depth/groundtruth/image_02\")\n", - "for dir in depth_dirs:\n", - " # new depth dir\n", - " new_depth_dir = \"../data/kitti/depth_selection/val_selection_cropped/groundtruth_depth_gathered/\" + dir.split(\"/\")[-4]+\"_02\"\n", - " # print(new_depth_dir)\n", - " new_image_dir = \"../data/kitti/depth_selection/val_selection_cropped/image_gathered/\" + dir.split(\"/\")[-4]+\"_02\"\n", - " os.makedirs(new_depth_dir, exist_ok=True)\n", - " os.makedirs(new_image_dir, exist_ok=True)\n", - " for depth_file in sorted(glob.glob(dir + \"/*.png\"))[:110]: #../data/kitti/val/2011_09_26_drive_0002_sync/proj_depth/groundtruth/image_02/0000000005.png\n", - " new_path = new_depth_dir + \"/\" + depth_file.split(\"/\")[-1]\n", - " shutil.copy(depth_file, new_path)\n", - " # get the path of the corresponding image\n", - " image_file = depth_file.replace('val','raw').replace('proj_depth/groundtruth/image_02', 'image_02/data')\n", - " print(image_file)\n", - " # check if the image file exists\n", - " if os.path.exists(image_file):\n", - " new_path = new_image_dir + \"/\" + image_file.split(\"/\")[-1]\n", - " shutil.copy(image_file, new_path)\n", - " else:\n", - " print(\"Image file does not exist: \", image_file)" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "dust3r", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.11.9" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/datasets_preprocess/nyu-v2.ipynb b/datasets_preprocess/nyu-v2.ipynb deleted file mode 100644 index ff4e4bc..0000000 --- a/datasets_preprocess/nyu-v2.ipynb +++ /dev/null @@ -1,118 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import h5py\n", - "import numpy as np\n", - "import os\n", - "from glob import glob\n", - "from PIL import Image\n", - "\n", - "# Set the path to your dataset directory\n", - "dataset_dir = '../data/nyu-v2/val/official/'\n", - "\n", - "# Get a list of all .h5 files in the dataset directory\n", - "file_paths = glob(os.path.join(dataset_dir, '*.h5'))\n", - "\n", - "# Create output directories for images and depth data\n", - "output_image_dir = '../data/nyu-v2/val/nyu_images/'\n", - "output_depth_dir = '../data/nyu-v2/val/nyu_depths/'\n", - "os.makedirs(output_image_dir, exist_ok=True)\n", - "os.makedirs(output_depth_dir, exist_ok=True)\n", - "\n", - "for file_path in file_paths:\n", - " with h5py.File(file_path, 'r') as h5file:\n", - " # Read depth and rgb data\n", - " depth_data = h5file['depth'][:]\n", - " rgb_data = h5file['rgb'][:]\n", - " \n", - " # Convert rgb data from (3, H, W) to (H, W, 3)\n", - " rgb_data = np.transpose(rgb_data, (1, 2, 0))\n", - " \n", - " # Ensure that rgb_data is of type uint8\n", - " if rgb_data.dtype != np.uint8:\n", - " rgb_data = rgb_data.astype(np.uint8)\n", - " \n", - " # Get the base filename without extension\n", - " base_name = 
os.path.splitext(os.path.basename(file_path))[0]\n",
-    "    \n",
-    "    # Save the RGB image as PNG\n",
-    "    rgb_image = Image.fromarray(rgb_data)\n",
-    "    rgb_image.save(os.path.join(output_image_dir, f'{base_name}.png'))\n",
-    "    \n",
-    "    # Save the depth data as NPY file\n",
-    "    np.save(os.path.join(output_depth_dir, f'{base_name}.npy'), depth_data)\n",
-    "    \n",
-    "    print(f'Processed {base_name}')\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import os\n",
-    "import numpy as np\n",
-    "from PIL import Image\n",
-    "\n",
-    "# Paths\n",
-    "depth_npy_dir = '../data/nyu-v2/val/nyu_depths'\n",
-    "output_img_dir = '../data/nyu-v2/val/nyu_depth_imgs'\n",
-    "\n",
-    "# Ensure the output directory exists\n",
-    "os.makedirs(output_img_dir, exist_ok=True)\n",
-    "\n",
-    "# Iterate over all .npy files in the depth directory\n",
-    "for npy_file in os.listdir(depth_npy_dir):\n",
-    "    if npy_file.endswith('.npy'):\n",
-    "        # Load depth data from .npy file\n",
-    "        depth_path = os.path.join(depth_npy_dir, npy_file)\n",
-    "        depth_data = np.load(depth_path)\n",
-    "        \n",
-    "        # Normalize depth data to range [0, 255] for saving as an image\n",
-    "        depth_min = depth_data.min()\n",
-    "        depth_max = depth_data.max()\n",
-    "        depth_normalized = (depth_data - depth_min) / (depth_max - depth_min)\n",
-    "        depth_uint8 = (depth_normalized * 255).astype(np.uint8)\n",
-    "        \n",
-    "        # Convert to an image\n",
-    "        depth_img = Image.fromarray(depth_uint8)\n",
-    "        \n",
-    "        # Save as PNG file\n",
-    "        img_name = os.path.splitext(npy_file)[0] + '.png'\n",
-    "        img_save_path = os.path.join(output_img_dir, img_name)\n",
-    "        depth_img.save(img_save_path)\n",
-    "        \n",
-    "        print(f'Saved {img_save_path}')\n",
-    "\n",
-    "print(\"Conversion completed!\")\n"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "dust3r",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.11.9"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
diff --git a/datasets_preprocess/prepare_bonn.py b/datasets_preprocess/prepare_bonn.py
new file mode 100644
index 0000000..3c53834
--- /dev/null
+++ b/datasets_preprocess/prepare_bonn.py
@@ -0,0 +1,40 @@
+# %%
+import glob
+import os
+import shutil
+dirs = glob.glob("../data/bonn/rgbd_bonn_dataset/*/")
+dirs = sorted(dirs)
+# extract frames
+for dir in dirs:
+    frames = glob.glob(dir + 'rgb/*.png')
+    frames = sorted(frames)
+    # take 110 consecutive frames (indices 30 to 140)
+    frames = frames[30:140]
+    # copy them into rgb_110/
+    new_dir = dir + 'rgb_110/'
+
+    for frame in frames:
+        os.makedirs(new_dir, exist_ok=True)
+        shutil.copy(frame, new_dir)
+        # print(f'cp {frame} {new_dir}')
+
+    depth_frames = glob.glob(dir + 'depth/*.png')
+    depth_frames = sorted(depth_frames)
+    # take the matching 110 consecutive depth frames (indices 30 to 140)
+    depth_frames = depth_frames[30:140]
+    # copy them into depth_110/
+    new_dir = dir + 'depth_110/'
+
+    for frame in depth_frames:
+        os.makedirs(new_dir, exist_ok=True)
+        shutil.copy(frame, new_dir)
+        # print(f'cp {frame} {new_dir}')
+import numpy as np
+for dir in dirs:
+    gt_path = "groundtruth.txt"
+    gt = np.loadtxt(dir + gt_path)
+    gt_110 = gt[30:140]
+    np.savetxt(dir + 'groundtruth_110.txt', gt_110)
+
+
+
diff --git a/datasets_preprocess/prepare_kitti.py b/datasets_preprocess/prepare_kitti.py
new file mode 100644
index 0000000..bd0eda1
--- /dev/null
+++ b/datasets_preprocess/prepare_kitti.py
@@ -0,0 +1,46 @@
+# %%
+#!/usr/bin/python
+
+from PIL import Image
+import numpy as np
+
+
+def depth_read(filename):
+    # loads depth map D from png file
+    # and returns it as a numpy array,
+    # for details see readme.txt
+
+    depth_png = np.array(Image.open(filename), dtype=int)
+    # make sure we have a proper 16bit depth map here.. not 8bit!
+    assert(np.max(depth_png) > 255)
+
+    depth = depth_png.astype(np.float64) / 256.
+    depth[depth_png == 0] = -1.
+    return depth
+
+# %%
+import glob
+import os
+import shutil
+depth_dirs = glob.glob("../data/kitti/val/*/proj_depth/groundtruth/image_02")
+for dir in depth_dirs:
+    # new depth dir
+    new_depth_dir = "../data/kitti/depth_selection/val_selection_cropped/groundtruth_depth_gathered/" + dir.split("/")[-4]+"_02"
+    # print(new_depth_dir)
+    new_image_dir = "../data/kitti/depth_selection/val_selection_cropped/image_gathered/" + dir.split("/")[-4]+"_02"
+    os.makedirs(new_depth_dir, exist_ok=True)
+    os.makedirs(new_image_dir, exist_ok=True)
+    for depth_file in sorted(glob.glob(dir + "/*.png"))[:110]: #../data/kitti/val/2011_09_26_drive_0002_sync/proj_depth/groundtruth/image_02/0000000005.png
+        new_path = new_depth_dir + "/" + depth_file.split("/")[-1]
+        shutil.copy(depth_file, new_path)
+        # get the path of the corresponding image
+        image_file = depth_file.replace('val','raw').replace('proj_depth/groundtruth/image_02', 'image_02/data')
+        print(image_file)
+        # check if the image file exists
+        if os.path.exists(image_file):
+            new_path = new_image_dir + "/" + image_file.split("/")[-1]
+            shutil.copy(image_file, new_path)
+        else:
+            print("Image file does not exist: ", image_file)
+
+
diff --git a/datasets_preprocess/prepare_nyuv2.py b/datasets_preprocess/prepare_nyuv2.py
new file mode 100644
index 0000000..87fbac8
--- /dev/null
+++ b/datasets_preprocess/prepare_nyuv2.py
@@ -0,0 +1,84 @@
+# %%
+import h5py
+import numpy as np
+import os
+from glob import glob
+from PIL import Image
+
+# Set the path to your dataset directory
+dataset_dir = '../data/nyu-v2/val/official/'
+
+# Get a list of all .h5 files in the dataset directory
+file_paths = glob(os.path.join(dataset_dir, '*.h5'))
+
+# Create output directories for images and depth data
+output_image_dir = '../data/nyu-v2/val/nyu_images/'
+output_depth_dir = '../data/nyu-v2/val/nyu_depths/'
+os.makedirs(output_image_dir, exist_ok=True)
+os.makedirs(output_depth_dir, exist_ok=True)
+
+for file_path in file_paths:
+    with h5py.File(file_path, 'r') as h5file:
+        # Read depth and rgb data
+        depth_data = h5file['depth'][:]
+        rgb_data = h5file['rgb'][:]
+        
+        # Convert rgb data from (3, H, W) to (H, W, 3)
+        rgb_data = np.transpose(rgb_data, (1, 2, 0))
+        
+        # Ensure that rgb_data is of type uint8
+        if rgb_data.dtype != np.uint8:
+            rgb_data = rgb_data.astype(np.uint8)
+        
+        # Get the base filename without extension
+        base_name = os.path.splitext(os.path.basename(file_path))[0]
+        
+        # Save the RGB image as PNG
+        rgb_image = Image.fromarray(rgb_data)
+        rgb_image.save(os.path.join(output_image_dir, f'{base_name}.png'))
+        
+        # Save the depth data as NPY file
+        np.save(os.path.join(output_depth_dir, f'{base_name}.npy'), depth_data)
+        
+        print(f'Processed {base_name}')
+
+
+# %%
+import os
+import numpy as np
+from PIL import Image
+
+# Paths
+depth_npy_dir = '../data/nyu-v2/val/nyu_depths'
+output_img_dir = '../data/nyu-v2/val/nyu_depth_imgs'
+
+# Ensure the output directory exists
+os.makedirs(output_img_dir, exist_ok=True)
+
+# Iterate over all .npy files in 
the depth directory +for npy_file in os.listdir(depth_npy_dir): + if npy_file.endswith('.npy'): + # Load depth data from .npy file + depth_path = os.path.join(depth_npy_dir, npy_file) + depth_data = np.load(depth_path) + + # Normalize depth data to range [0, 255] for saving as an image + depth_min = depth_data.min() + depth_max = depth_data.max() + depth_normalized = (depth_data - depth_min) / (depth_max - depth_min) + depth_uint8 = (depth_normalized * 255).astype(np.uint8) + + # Convert to an image + depth_img = Image.fromarray(depth_uint8) + + # Save as PNG file + img_name = os.path.splitext(npy_file)[0] + '.png' + img_save_path = os.path.join(output_img_dir, img_name) + depth_img.save(img_save_path) + + print(f'Saved {img_save_path}') + +print("Conversion completed!") + + + diff --git a/datasets_preprocess/prepare_scannet.py b/datasets_preprocess/prepare_scannet.py new file mode 100644 index 0000000..35fd8fb --- /dev/null +++ b/datasets_preprocess/prepare_scannet.py @@ -0,0 +1,33 @@ +import glob +import os +import shutil +import numpy as np + +seq_list = sorted(os.listdir("data/scannetv2")) +for seq in seq_list: + img_pathes = sorted(glob.glob(f"data/scannetv2/{seq}/color/*.jpg"), key=lambda x: int(os.path.basename(x).split('.')[0])) + depth_pathes = sorted(glob.glob(f"data/scannetv2/{seq}/depth/*.png"), key=lambda x: int(os.path.basename(x).split('.')[0])) + pose_pathes = sorted(glob.glob(f"data/scannetv2/{seq}/pose/*.txt"), key=lambda x: int(os.path.basename(x).split('.')[0])) + print(f"{seq}: {len(img_pathes)} {len(depth_pathes)}") + + new_color_dir = f"data/scannetv2/{seq}/color_90" + new_depth_dir = f"data/scannetv2/{seq}/depth_90" + + new_img_pathes = img_pathes[:90*3:3] + new_depth_pathes = depth_pathes[:90*3:3] + new_pose_pathes = pose_pathes[:90*3:3] + + os.makedirs(new_color_dir, exist_ok=True) + os.makedirs(new_depth_dir, exist_ok=True) + + for i, (img_path, depth_path) in enumerate(zip(new_img_pathes, new_depth_pathes)): + shutil.copy(img_path, f"{new_color_dir}/frame_{i:04d}.jpg") + shutil.copy(depth_path, f"{new_depth_dir}/frame_{i:04d}.png") + + pose_new_path = f"data/scannetv2/{seq}/pose_90.txt" + with open(pose_new_path, 'w') as f: + for i, pose_path in enumerate(new_pose_pathes): + with open(pose_path, 'r') as pose_file: + pose = np.loadtxt(pose_file) + pose = pose.reshape(-1) + f.write(f"{' '.join(map(str, pose))}\n") diff --git a/datasets_preprocess/prepare_tum.py b/datasets_preprocess/prepare_tum.py new file mode 100644 index 0000000..b55f9f1 --- /dev/null +++ b/datasets_preprocess/prepare_tum.py @@ -0,0 +1,42 @@ +# %% +import glob +import os +import shutil +import numpy as np + +dirs = glob.glob("../data/tum/*/") +dirs = sorted(dirs) +# extract frames +for dir in dirs: + frames = glob.glob(dir + 'rgb/*.png') + frames = sorted(frames) + # sample 90 frames at the stride of 3 + frames = frames[::3][:90] + # cut frames after 90 + new_dir = dir + 'rgb_90/' + + for frame in frames: + os.makedirs(new_dir, exist_ok=True) + shutil.copy(frame, new_dir) + # print(f'cp {frame} {new_dir}') + + depth_frames = glob.glob(dir + 'depth/*.png') + depth_frames = sorted(depth_frames) + # sample 90 frames at the stride of 3 + depth_frames = depth_frames[::3][:90] + # cut frames after 90 + new_dir = dir + 'depth_90/' + + for frame in depth_frames: + os.makedirs(new_dir, exist_ok=True) + shutil.copy(frame, new_dir) + # print(f'cp {frame} {new_dir}') + +for dir in dirs: + gt_path = "groundtruth.txt" + gt = np.loadtxt(dir + gt_path) + gt_90 = gt[::3][:90] + np.savetxt(dir + 
'groundtruth_90.txt', gt_90) + + + diff --git a/datasets_preprocess/tum.ipynb b/datasets_preprocess/tum.ipynb deleted file mode 100644 index 4ca3eeb..0000000 --- a/datasets_preprocess/tum.ipynb +++ /dev/null @@ -1,71 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "import glob\n", - "import os\n", - "import shutil\n", - "import numpy as np\n", - "\n", - "dirs = glob.glob(\"../data/tum/*/\")\n", - "dirs = sorted(dirs)\n", - "# extract frames\n", - "for dir in dirs:\n", - " frames = glob.glob(dir + 'rgb/*.png')\n", - " frames = sorted(frames)\n", - " # sample 90 frames at the stride of 3\n", - " frames = frames[::3][:90]\n", - " # cut frames after 110\n", - " new_dir = dir + 'rgb_90/'\n", - "\n", - " for frame in frames:\n", - " os.makedirs(new_dir, exist_ok=True)\n", - " shutil.copy(frame, new_dir)\n", - " # print(f'cp {frame} {new_dir}')\n", - "\n", - " depth_frames = glob.glob(dir + 'depth/*.png')\n", - " depth_frames = sorted(depth_frames)\n", - " # sample 90 frames at the stride of 2\n", - " depth_frames = depth_frames[::3][:90]\n", - " # cut frames after 90\n", - " new_dir = dir + 'depth_90/'\n", - "\n", - " for frame in depth_frames:\n", - " os.makedirs(new_dir, exist_ok=True)\n", - " shutil.copy(frame, new_dir)\n", - " # print(f'cp {frame} {new_dir}')\n", - "\n", - "for dir in dirs:\n", - " gt_path = \"groundtruth.txt\"\n", - " gt = np.loadtxt(dir + gt_path)\n", - " gt_90 = gt[::2][:90]\n", - " np.savetxt(dir + 'groundtruth_90.txt', gt_90)\n" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "dust3r", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.11.9" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/depth_metric.ipynb b/depth_metric.ipynb new file mode 100644 index 0000000..85c8edf --- /dev/null +++ b/depth_metric.ipynb @@ -0,0 +1,304 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# eval the depth of sintel" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '0'\n", + "from dust3r.depth_eval import depth_evaluation, group_by_directory\n", + "import numpy as np\n", + "import cv2\n", + "from tqdm import tqdm\n", + "import glob\n", + "TAG_FLOAT = 202021.25\n", + "\n", + "def depth_read(filename):\n", + " \"\"\" Read depth data from file, return as numpy array. \"\"\"\n", + " f = open(filename,'rb')\n", + " check = np.fromfile(f,dtype=np.float32,count=1)[0]\n", + " assert check == TAG_FLOAT, ' depth_read:: Wrong tag in flow file (should be: {0}, is: {1}). Big-endian machine? 
'.format(TAG_FLOAT,check)\n", + " width = np.fromfile(f,dtype=np.int32,count=1)[0]\n", + " height = np.fromfile(f,dtype=np.int32,count=1)[0]\n", + " size = width*height\n", + " assert width > 0 and height > 0 and size > 1 and size < 100000000, ' depth_read:: Wrong input size (width = {0}, height = {1}).'.format(width,height)\n", + " depth = np.fromfile(f,dtype=np.float32,count=-1).reshape((height,width))\n", + " return depth\n", + "\n", + "pred_pathes = glob.glob(\"results/sintel_video_depth/*/*.npy\") #TODO: update the path to your prediction\n", + "pred_pathes = sorted(pred_pathes)\n", + "print(len(pred_pathes))\n", + "\n", + "if len(pred_pathes) > 643:\n", + " full = True\n", + "else:\n", + " full = False\n", + "\n", + "if full:\n", + " depth_pathes = glob.glob(f\"data/sintel/training/depth/*/*.dpt\")\n", + " depth_pathes = sorted(depth_pathes)\n", + "else:\n", + " seq_list = [\"alley_2\", \"ambush_4\", \"ambush_5\", \"ambush_6\", \"cave_2\", \"cave_4\", \"market_2\", \n", + " \"market_5\", \"market_6\", \"shaman_3\", \"sleeping_1\", \"sleeping_2\", \"temple_2\", \"temple_3\"]\n", + " depth_pathes_folder = [f\"data/sintel/training/depth/{seq}\" for seq in seq_list]\n", + " depth_pathes = []\n", + " for depth_pathes_folder_i in depth_pathes_folder:\n", + " depth_pathes += glob.glob(depth_pathes_folder_i + '/*.dpt')\n", + " depth_pathes = sorted(depth_pathes)\n", + "\n", + "\n", + "def get_video_results():\n", + " grouped_pred_depth = group_by_directory(pred_pathes)\n", + " grouped_gt_depth = group_by_directory(depth_pathes)\n", + " gathered_depth_metrics = []\n", + "\n", + " for key in tqdm(grouped_pred_depth.keys()):\n", + " pd_pathes = grouped_pred_depth[key]\n", + " gt_pathes = grouped_gt_depth[key.replace('_pred_depth', '')]\n", + "\n", + " gt_depth = np.stack([depth_read(gt_path) for gt_path in gt_pathes], axis=0)\n", + " pr_depth = np.stack([cv2.resize(np.load(pd_path), (gt_depth.shape[2], gt_depth.shape[1]), interpolation=cv2.INTER_CUBIC)\n", + " for pd_path in pd_pathes], axis=0)\n", + " depth_results, error_map, depth_predict, depth_gt = depth_evaluation(pr_depth, gt_depth, max_depth=70, align_with_lad2=False, use_gpu=True, post_clip_max=70)\n", + " gathered_depth_metrics.append(depth_results)\n", + "\n", + " depth_log_path = 'tmp.json'\n", + " average_metrics = {\n", + " key: np.average(\n", + " [metrics[key] for metrics in gathered_depth_metrics], \n", + " weights=[metrics['valid_pixels'] for metrics in gathered_depth_metrics]\n", + " )\n", + " for key in gathered_depth_metrics[0].keys() if key != 'valid_pixels'\n", + " }\n", + " print('Average depth evaluation metrics:', average_metrics)\n", + " \n", + "get_video_results()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# eval the depth of bonn" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '0'\n", + "from dust3r.depth_eval import depth_evaluation, group_by_directory\n", + "import numpy as np\n", + "import cv2\n", + "import json\n", + "from tqdm import tqdm\n", + "import glob\n", + "from PIL import Image\n", + "\n", + "\n", + "def depth_read(filename):\n", + " # loads depth map D from png file\n", + " # and returns it as a numpy array\n", + " depth_png = np.asarray(Image.open(filename))\n", + " # make sure we have a proper 16bit depth map here.. 
not 8bit!\n", + " assert np.max(depth_png) > 255\n", + " depth = depth_png.astype(np.float64) / 5000.0\n", + " depth[depth_png == 0] = -1.0\n", + " return depth\n", + "\n", + "seq_list = [\"balloon2\", \"crowd2\", \"crowd3\", \"person_tracking2\", \"synchronous\"]\n", + "\n", + "img_pathes_folder = [f\"data/bonn/rgbd_bonn_dataset/rgbd_bonn_{seq}/rgb_110/*.png\" for seq in seq_list]\n", + "img_pathes = []\n", + "for img_pathes_folder_i in img_pathes_folder:\n", + " img_pathes += glob.glob(img_pathes_folder_i)\n", + "img_pathes = sorted(img_pathes)\n", + "depth_pathes_folder = [f\"data/bonn/rgbd_bonn_dataset/rgbd_bonn_{seq}/depth_110/*.png\" for seq in seq_list]\n", + "depth_pathes = []\n", + "for depth_pathes_folder_i in depth_pathes_folder:\n", + " depth_pathes += glob.glob(depth_pathes_folder_i)\n", + "depth_pathes = sorted(depth_pathes)\n", + "pred_pathes = glob.glob(\"results/bonn_video_depth/*/*.npy\") #TODO: update the path to your prediction\n", + "pred_pathes = sorted(pred_pathes)\n", + "\n", + "def get_video_results():\n", + " grouped_pred_depth = group_by_directory(pred_pathes)\n", + " grouped_gt_depth = group_by_directory(depth_pathes, idx=-2)\n", + " gathered_depth_metrics = []\n", + " print(grouped_gt_depth.keys())\n", + " print(grouped_pred_depth.keys())\n", + " for key in tqdm(grouped_gt_depth.keys()):\n", + " pd_pathes = grouped_pred_depth[key]\n", + " gt_pathes = grouped_gt_depth[key]\n", + " gt_depth = np.stack([depth_read(gt_path) for gt_path in gt_pathes], axis=0)\n", + " pr_depth = np.stack([cv2.resize(np.load(pd_path), (gt_depth.shape[2], gt_depth.shape[1]), interpolation=cv2.INTER_CUBIC)\n", + " for pd_path in pd_pathes], axis=0)\n", + " depth_results, error_map, depth_predict, depth_gt = depth_evaluation(pr_depth, gt_depth, max_depth=70, align_with_lstsq=False, align_with_scale=True, use_gpu=True, disp_input=True)\n", + "\n", + " gathered_depth_metrics.append(depth_results)\n", + "\n", + " depth_log_path = 'tmp.json'\n", + " average_metrics = {\n", + " key: np.average(\n", + " [metrics[key] for metrics in gathered_depth_metrics], \n", + " weights=[metrics['valid_pixels'] for metrics in gathered_depth_metrics]\n", + " )\n", + " for key in gathered_depth_metrics[0].keys() if key != 'valid_pixels'\n", + " }\n", + " print('Average depth evaluation metrics:', average_metrics)\n", + " with open(depth_log_path, 'w') as f:\n", + " f.write(json.dumps(average_metrics))\n", + "\n", + "get_video_results()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# eval the depth of kitti" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%matplotlib inline\n", + "import os\n", + "os.environ['CUDA_VISIBLE_DEVICES'] = '0'\n", + "from dust3r.depth_eval import depth_evaluation, group_by_directory\n", + "import numpy as np\n", + "import cv2\n", + "import json\n", + "from tqdm import tqdm\n", + "import glob\n", + "from PIL import Image\n", + "import matplotlib.pyplot as plt\n", + "\n", + "\n", + "def depth_read(filename):\n", + " # loads depth map D from png file\n", + " # and returns it as a numpy array,\n", + " # for details see readme.txt\n", + " img_pil = Image.open(filename)\n", + " depth_png = np.array(img_pil, dtype=int)\n", + " # make sure we have a proper 16bit depth map here.. 
not 8bit!\n",
+    "    assert(np.max(depth_png) > 255)\n",
+    "\n",
+    "    depth = depth_png.astype(float) / 256.\n",
+    "    depth[depth_png == 0] = -1.\n",
+    "    return depth\n",
+    "\n",
+    "depth_pathes = glob.glob(\"data/kitti/depth_selection/val_selection_cropped/groundtruth_depth_gathered/*/*.png\")\n",
+    "depth_pathes = sorted(depth_pathes)\n",
+    "pred_pathes = glob.glob(\"results/kitti_video_depth/*/frame_*.npy\") #TODO: update the path to your prediction\n",
+    "pred_pathes = sorted(pred_pathes)\n",
+    "\n",
+    "\n",
+    "def get_video_results():\n",
+    "    grouped_pred_depth = group_by_directory(pred_pathes)\n",
+    "    grouped_gt_depth = group_by_directory(depth_pathes)\n",
+    "    gathered_depth_metrics = []\n",
+    "    for key in tqdm(grouped_pred_depth.keys()):\n",
+    "        pd_pathes = grouped_pred_depth[key]\n",
+    "        gt_pathes = grouped_gt_depth[key]\n",
+    "        gt_depth = np.stack([depth_read(gt_path) for gt_path in gt_pathes], axis=0)\n",
+    "        pr_depth = np.stack([cv2.resize(np.load(pd_path), (gt_depth.shape[2], gt_depth.shape[1]), interpolation=cv2.INTER_CUBIC)\n",
+    "                             for pd_path in pd_pathes], axis=0)\n",
+    "        \n",
+    "        depth_results, error_map, depth_predict, depth_gt = depth_evaluation(pr_depth, gt_depth, max_depth=None, align_with_lad2=True, use_gpu=True)\n",
+    "\n",
+    "        gathered_depth_metrics.append(depth_results)\n",
+    "\n",
+    "    depth_log_path = 'tmp.json'\n",
+    "    average_metrics = {\n",
+    "        key: np.average(\n",
+    "            [metrics[key] for metrics in gathered_depth_metrics], \n",
+    "            weights=[metrics['valid_pixels'] for metrics in gathered_depth_metrics]\n",
+    "        )\n",
+    "        for key in gathered_depth_metrics[0].keys() if key != 'valid_pixels'\n",
+    "    }\n",
+    "    print('Average depth evaluation metrics:', average_metrics)\n",
+    "    with open(depth_log_path, 'w') as f:\n",
+    "        f.write(json.dumps(average_metrics))\n",
+    "\n",
+    "get_video_results()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# eval the depth of nyu-v2"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "%matplotlib inline\n",
+    "from dust3r.depth_eval import depth_evaluation\n",
+    "import numpy as np\n",
+    "import cv2\n",
+    "import json\n",
+    "from tqdm import tqdm\n",
+    "import glob\n",
+    "\n",
+    "depth_pathes = glob.glob(\"data/nyu-v2/val/nyu_depths/*.npy\")\n",
+    "depth_pathes = sorted(depth_pathes)\n",
+    "pred_pathes = glob.glob(\"results/nyuv2_depth/*.npy\") #TODO: update the path to your prediction\n",
+    "pred_pathes = sorted(pred_pathes)\n",
+    "gathered_depth_metrics = []\n",
+    "for idx in tqdm(range(len(depth_pathes))):\n",
+    "    pred_depth = np.load(pred_pathes[idx])\n",
+    "    gt_depth = np.load(depth_pathes[idx])\n",
+    "    pred_depth = cv2.resize(pred_depth, (gt_depth.shape[1], gt_depth.shape[0]), interpolation=cv2.INTER_CUBIC)\n",
+    "\n",
+    "    depth_results, error_map, depth_predict, depth_gt = depth_evaluation(pred_depth, gt_depth, max_depth=None, lr=1e-3)\n",
+    "    gathered_depth_metrics.append(depth_results)\n",
+    "\n",
+    "depth_log_path = 'tmp.json'\n",
+    "average_metrics = {\n",
+    "    key: np.average(\n",
+    "        [metrics[key] for metrics in gathered_depth_metrics], \n",
+    "        weights=[metrics['valid_pixels'] for metrics in gathered_depth_metrics]\n",
+    "    )\n",
+    "    for key in gathered_depth_metrics[0].keys() if key != 'valid_pixels'\n",
+    "}\n",
+    "print('Average depth evaluation metrics:', average_metrics)\n",
+    "with open(depth_log_path, 'w') as f:\n",
+    "    f.write(json.dumps(average_metrics))\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "dust3r",
+   "language": 
"python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}