# Best Paper Finalist: Learning to Recover 3D Scene Shape from a Single Image
This repository contains the source code of the paper: Wei Yin, Jianming Zhang, Oliver Wang, Simon Niklaus, Long Mai, Simon Chen, Chunhua Shen. "Learning to Recover 3D Scene Shape from a Single Image." Published in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2021.

[NEW] Training code has been released!

You may want to check this video, which provides a brief introduction to the work:
## Installation

```bash
conda create -n LeReS python=3.7
conda activate LeReS
conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.2 -c pytorch
pip install -r requirements.txt
```
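Optionally, you can confirm that the pinned versions installed correctly with a quick import check. This snippet is a minimal sketch, not part of the repository:

```python
# Verify the environment created above: package versions and CUDA availability.
import torch
import torchvision

print(torch.__version__)        # expect 1.6.0
print(torchvision.__version__)  # expect 0.7.0
print(torch.cuda.is_available())
```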
If you only want to test monocular depth estimation from a single image, you can go directly to Quick Start and follow Step 3. If you want to reconstruct the 3D shape from a single image, please install the torchsparse package as follows. If you have any issues with torchsparse, please refer to the torchsparse repository.
```bash
# torchsparse currently only supports PyTorch 1.6.0 + CUDA 10.2 + CUDNN 7.6.2.
sudo apt-get install libsparsehash-dev
pip install --upgrade git+https://github.com/mit-han-lab/torchsparse.git@v1.2.0
```
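A failed torchsparse build usually surfaces at import time, so a one-line smoke test is worthwhile; again, this is a sketch rather than part of the repository:

```python
# Smoke test: the import fails loudly if the CUDA extension did not build.
import torchsparse
import torchsparse.nn

print("torchsparse imported successfully")
```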
## Quick Start

1. Download the model weights.
2. Prepare data.
   - Move the downloaded weights to `LeReS/Minist_Test/`.
   - Put the testing RGB images under `LeReS/Minist_Test/test_images/`. Predicted depths and reconstructed point clouds are saved under `LeReS/Minist_Test/test_images/outputs`.
3. Test monocular depth prediction. Note that the predicted depths are affine-invariant, i.e., they are recovered only up to an unknown global scale and shift (a sketch for aligning them to metric ground truth follows this list).
   ```bash
   export PYTHONPATH="<PATH to Minist_Test>"

   # run the ResNet-50
   python ./tools/test_depth.py --load_ckpt res50.pth --backbone resnet50

   # run the ResNeXt-101
   python ./tools/test_depth.py --load_ckpt res101.pth --backbone resnext101
   ```
4. Test 3D reconstruction from a single image (a sketch for inspecting the saved point cloud also follows this list).
   ```bash
   export PYTHONPATH="<PATH to Minist_Test>"

   # run the ResNet-50
   python ./tools/test_shape.py --load_ckpt res50.pth --backbone resnet50

   # run the ResNeXt-101
   python ./tools/test_shape.py --load_ckpt res101.pth --backbone resnext101
   ```
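Because the predicted depths are affine-invariant, they must be aligned to ground truth before any metric evaluation. Below is a minimal least-squares alignment sketch; it is not part of this repository, and the function name and masking convention are illustrative:

```python
import numpy as np

def align_scale_shift(pred, gt, mask):
    """Fit a scale s and shift t so that s * pred + t best matches gt
    (in the least-squares sense) on valid pixels, then apply them."""
    p = pred[mask].ravel()
    g = gt[mask].ravel()
    A = np.stack([p, np.ones_like(p)], axis=1)  # [N, 2] design matrix
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)
    return s * pred + t
```

The aligned prediction can then be scored with standard depth metrics such as AbsRel.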
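To inspect a reconstructed point cloud, a viewer such as Open3D works well. The sketch below assumes the output is saved as a `.ply` file; check `test_images/outputs` for the actual filename that `test_shape.py` writes:

```python
import open3d as o3d

# Hypothetical path: substitute the actual file under test_images/outputs.
pcd = o3d.io.read_point_cloud("LeReS/Minist_Test/test_images/outputs/example.ply")
print(pcd)  # prints the number of points
o3d.visualization.draw_geometries([pcd])
```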
## Training

1. (Optional) Run a demo training to verify the Python environment.

   ```bash
   cd Train/scripts
   sh train_demo.sh
   ```
2. Download the training data. Please run `download_data.sh` to obtain the taskonomy, DiverseDepth, HRWSI, and Holopix50k datasets. All data are organized under `Train/datasets` with the following structure (a quick layout check follows this list):

   ```
   |--Train
   |  |--data
   |  |--lib
   |  |--scripts
   |  |--tools
   |  |--datasets
   |  |  |--DiverseDepth
   |  |  |  |--annotations
   |  |  |  |--depths
   |  |  |  |--rgbs
   |  |  |--taskonomy
   |  |  |  |--annotations
   |  |  |  |--depths
   |  |  |  |--rgbs
   |  |  |  |--ins_planes
   |  |  |--HRWSI
   |  |  |--Holopix50k
   ```
3. Train the network. The default setting uses 4 GPUs. If you want to use more GPUs, set `$CUDA_VISIBLE_DEVICES`, e.g. `export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7`. The `--batchsize` flag is the number of samples on a single GPU, so the effective batch size is `--batchsize` multiplied by the number of GPUs.

   ```bash
   cd Train/scripts
   sh train.sh
   ```
4. Test the network on a benchmark. We provide sample code for testing on NYU. Please download the NYU testing data `test.mat` for evaluation and move it to `Train/datasets/test.mat` (a loader sketch follows this list). If you want to test on other benchmarks, you can follow this sample code.

   ```bash
   cd Train/scripts
   sh test.sh
   ```
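For step 2, here is a quick sanity check that the datasets were unpacked where the training scripts expect them; the folder names follow the tree above, and the script itself is illustrative rather than part of the repository:

```python
import os

# Check that each dataset folder from the tree above exists under Train/datasets.
root = "Train/datasets"
for name in ["DiverseDepth", "taskonomy", "HRWSI", "Holopix50k"]:
    path = os.path.join(root, name)
    print(f"{path}: {'ok' if os.path.isdir(path) else 'MISSING'}")
```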
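For step 4, how `test.mat` opens depends on how it was exported: classic MATLAB files load with SciPy, while v7.3 files are HDF5 and need h5py. The keys inside are not documented here, so this hedged sketch simply lists them:

```python
from scipy.io import loadmat

path = "Train/datasets/test.mat"
try:
    data = loadmat(path)        # classic MATLAB format
except NotImplementedError:     # SciPy raises this for v7.3 (HDF5) files
    import h5py
    data = h5py.File(path, "r")
print(list(data.keys()))
```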
## Citation

```BibTeX
@article{yin2022towards,
  title   = {Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image},
  author  = {Yin, Wei and Zhang, Jianming and Wang, Oliver and Niklaus, Simon and Chen, Simon and Liu, Yifan and Shen, Chunhua},
  journal = {TPAMI},
  year    = {2022}
}

@inproceedings{Wei2021CVPR,
  title     = {Learning to Recover 3D Scene Shape from a Single Image},
  author    = {Wei Yin and Jianming Zhang and Oliver Wang and Simon Niklaus and Long Mai and Simon Chen and Chunhua Shen},
  booktitle = {Proc. IEEE Conf. Comp. Vis. Patt. Recogn. (CVPR)},
  year      = {2021}
}
```
## License

This project is under a non-commercial license from Adobe Research. See the LICENSE file for details.