This repository contains the code for the paper "Tame a Wild Camera: In-the-Wild Monocular Camera Calibration", published at NeurIPS 2023.
Authors: Shengjie Zhu, Abhinav Kumar, Masa Hu, and Xiaoming Liu
[arXiv preprint] [Project Page] [Poster]
- 4 DoF Camera Calibration (Zero-Shot)
- Camera Calibration:
zoom.mp4
- DollyZoom-Demo1:
dollyzoom1mp4.mp4
- DollyZoom-Demo2:
dollyzoom2mp4.mp4
- DollyZoom-Demo3:
dollyzoom3.mp4
- Image Crop and Resize Detection and Restoration (Zero-Shot)
crop.mp4
- In-the-Wild Monocular 3D Object Detection (Omni3d)
omni3D.mp4
Pretrained models and data are hosted on Hugging Face.
WildCamera
├── model_zoo
│ ├── Release
│ │ ├── wild_camera_all.pth
│ │ ├── wild_camera_gsv.pth
│ ├── swin_transformer
│ │ ├── swin_large_patch4_window7_224_22k.pth
├── data
│ ├── MonoCalib
│ │ ├── ARKitScenes
│ │ ├── BIWIRGBDID
│ │ ├── CAD120
│ │ ├── ...
│ │ ├── Waymo
│ ├── UncalibTwoViewPoseEvaluation
│ │ ├── megadepth_test_1500
│ │ ├── scannet_test_1500
Use the scripts below to download the checkpoints and data to your preferred location. The entire dataset takes around 150 GB of disk space.
mkdir model_zoo
./asset/download_wildcamera_checkpoint.sh
ln -s your-data-location data
./asset/download_wildcamera_dataset.sh
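After the downloads finish, it can help to confirm that the folders match the directory tree above before running anything. The following is a minimal sketch, not part of the repository: the helper name `missing_paths` and the `EXPECTED` list are hypothetical, and the paths are copied from the tree shown earlier (only a subset of the `MonoCalib` datasets is listed there, so only the parent folder is checked).

```python
from pathlib import Path

# Relative paths copied from the directory tree above (assumption: the
# script is run from the repository root). Extend the list as needed.
EXPECTED = [
    "model_zoo/Release/wild_camera_all.pth",
    "model_zoo/Release/wild_camera_gsv.pth",
    "model_zoo/swin_transformer/swin_large_patch4_window7_224_22k.pth",
    "data/MonoCalib",
    "data/UncalibTwoViewPoseEvaluation/megadepth_test_1500",
    "data/UncalibTwoViewPoseEvaluation/scannet_test_1500",
]


def missing_paths(root="."):
    """Return the expected entries that do not exist under `root`.

    Path.exists() follows symlinks, so a symlinked `data` folder is fine.
    """
    root = Path(root)
    return [p for p in EXPECTED if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths(".")
    if missing:
        print("Missing entries:")
        for p in missing:
            print("  " + p)
    else:
        print("Layout looks complete.")
```

An empty result only means the files and folders exist; it does not verify that the downloads are complete or uncorrupted.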
conda create -n wildacamera
conda activate wildacamera
conda install pytorch==1.13.0 torchvision==0.14.0 torchaudio==0.13.0 pytorch-cuda=11.6 -c pytorch -c nvidia
pip install mmcv==2.0.0 -f https://download.openmmlab.com/mmcv/dist/cu116/torch1.13/index.html
pip install timm tensorboard loguru einops natsort h5py tabulate
You can choose a different PyTorch and CUDA version; in that case, follow this link to select the corresponding mmcv version.
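For a different PyTorch/CUDA pair, the wheel index passed to `pip install -f` changes accordingly. The sketch below builds that URL by following the pattern of the one in the install command above (`cu<major><minor>/torch<major>.<minor>`). The helper `mmcv_index_url` is hypothetical, and the pattern is an assumption inferred from that single URL, so check that the resulting index actually exists and lists your mmcv version before relying on it.

```python
def mmcv_index_url(cuda_version: str, torch_version: str) -> str:
    """Build an mmcv wheel index URL for a CUDA / PyTorch pair.

    Assumption: the index follows the same pattern as the URL used in the
    install command above, e.g. cu116/torch1.13 for CUDA 11.6 and
    PyTorch 1.13.0.
    """
    # "11.6" -> "cu116"
    cu = "cu" + "".join(cuda_version.split(".")[:2])
    # "1.13.0" or "1.13.0+cu116" -> "1.13"
    torch_mm = ".".join(torch_version.split("+")[0].split(".")[:2])
    return f"https://download.openmmlab.com/mmcv/dist/{cu}/torch{torch_mm}/index.html"


# Reproduces the index used in the install command above.
print(mmcv_index_url("11.6", "1.13.0"))
```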
# Download demo images
sh asset/download_demo_images.sh
# Estimate intrinsics for images collected from GitHub
python demo/demo_inference.py
# Demo inference on dolly zoom videos
python demo/demo_dollyzoom.py
# Demo image restoration
python demo/demo_restoration.py
- Use torch.hub to load the model (in-the-wild experiment checkpoint):
import torch
model = torch.hub.load('ShngJZ/WildCamera', "WildCamera", pretrained=True)
- Calibrate intrinsic and restore image (if needed):
import PIL.Image as Image
rgb = Image.open("path-to-image")  # replace with the path to your image
# 4 DoF intrinsic
intrinsic, _ = model.inference(rgb, wtassumption=False)
# 1 DoF intrinsic
intrinsic, _ = model.inference(rgb, wtassumption=True)
# Restore the image if needed
rgb_restored = model.restore_image(rgb, intrinsic, fixcrop=True)
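One common use of the estimated intrinsic is to read off the camera's field of view. The sketch below assumes that `intrinsic` is a 3x3 pinhole matrix `[[fx, 0, cx], [0, fy, cy], [0, 0, 1]]` in pixel units; that layout is an assumption here and is not confirmed by the snippet above. The helper `fov_from_intrinsic` is hypothetical and not part of the WildCamera API, and the matrix in the example is made up rather than produced by the model.

```python
import math


def fov_from_intrinsic(K, width, height):
    """Return (horizontal, vertical) field of view in degrees.

    Assumes K is a 3x3 pinhole matrix [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]
    in pixel units. Works with nested lists or NumPy arrays.
    """
    fx, fy = float(K[0][0]), float(K[1][1])
    cx, cy = float(K[0][2]), float(K[1][2])
    # Sum the angles on each side of the principal point so that an
    # off-center principal point (e.g. a cropped image) is handled.
    hfov = math.atan2(cx, fx) + math.atan2(width - cx, fx)
    vfov = math.atan2(cy, fy) + math.atan2(height - cy, fy)
    return math.degrees(hfov), math.degrees(vfov)


# Made-up example values, not output from the model.
K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
hfov, vfov = fov_from_intrinsic(K, width=640, height=480)
print(f"{hfov:.1f} x {vfov:.1f} degrees")
```

With a real image, `width, height = rgb.size` gives the pixel dimensions to pass alongside the estimated matrix.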
# Benchmark Tab.2 and Tab.4
python WildCamera/benchmark/benchmark_calibration.py --experiment_name in_the_wild
# Benchmark Tab.3
python WildCamera/benchmark/benchmark_calibration.py --experiment_name gsv
# Benchmark Tab.5
python WildCamera/benchmark/benchmark_crop.py
# Benchmark Tab.6
python WildCamera/benchmark/benchmark_uncalibtwoview_megadepth.py
python WildCamera/benchmark/benchmark_uncalibtwoview_scannet.py
# In-the-Wild Experiment
CUDA_VISIBLE_DEVICES=0,1 python WildCamera/train/train_calibrator.py \
--experiment_name calbr_in_the_wild \
--experiment_set in_the_wild \
--steps_per_epoch 2500
# GSV Experiment
CUDA_VISIBLE_DEVICES=0,1 python WildCamera/train/train_calibrator.py \
--experiment_name calbr_gsv \
--experiment_set gsv
Please use the following BibTeX to cite our work.
@inproceedings{zhu2023tame,
author = {Shengjie Zhu and Abhinav Kumar and Masa Hu and Xiaoming Liu},
title = {Tame a Wild Camera: In-the-Wild Monocular Camera Calibration},
booktitle = {NeurIPS},
year = {2023},
}
If you use the Tame-a-Wild-Camera benchmark, we kindly ask you to additionally cite all datasets. BibTeX entries are provided below.
Dataset BibTeX
@inproceedings{
dehghan2021arkitscenes,
title={{ARK}itScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile {RGB}-D Data},
author={Gilad Baruch and Zhuoyuan Chen and Afshin Dehghan and Tal Dimry and Yuri Feigin and Peter Fu and Thomas Gebauer and Brandon Joffe and Daniel Kurz and Arik Schwartz and Elad Shulman},
booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 1)},
year={2021},
url={https://openreview.net/forum?id=tjZjv_qh_CE}
}
@inproceedings{cordts2016cityscapes,
title={The cityscapes dataset for semantic urban scene understanding},
author={Cordts, Marius and Omran, Mohamed and Ramos, Sebastian and Rehfeld, Timo and Enzweiler, Markus and Benenson, Rodrigo and Franke, Uwe and Roth, Stefan and Schiele, Bernt},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={3213--3223},
year={2016}
}
@inproceedings{geiger2012we,
title={Are we ready for autonomous driving? the kitti vision benchmark suite},
author={Geiger, Andreas and Lenz, Philip and Urtasun, Raquel},
booktitle={2012 IEEE conference on computer vision and pattern recognition},
pages={3354--3361},
year={2012},
organization={IEEE}
}
@inproceedings{li2018megadepth,
title={Megadepth: Learning single-view depth prediction from internet photos},
author={Li, Zhengqi and Snavely, Noah},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2041--2050},
year={2018}
}
@inproceedings{yu2023mvimgnet,
title={Mvimgnet: A large-scale dataset of multi-view images},
author={Yu, Xianggang and Xu, Mutian and Zhang, Yidan and Liu, Haolin and Ye, Chongjie and Wu, Yushuang and Yan, Zizheng and Zhu, Chenming and Xiong, Zhangyang and Liang, Tianyou and others},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={9150--9161},
year={2023}
}
@article{fuhrmann2014mve,
title={Mve-a multi-view reconstruction environment.},
author={Fuhrmann, Simon and Langguth, Fabian and Goesele, Michael},
journal={GCH},
volume={3},
pages={4},
year={2014}
}
@inproceedings{caesar2020nuscenes,
title={nuscenes: A multimodal dataset for autonomous driving},
author={Caesar, Holger and Bankiti, Varun and Lang, Alex H and Vora, Sourabh and Liong, Venice Erin and Xu, Qiang and Krishnan, Anush and Pan, Yu and Baldan, Giancarlo and Beijbom, Oscar},
booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
pages={11621--11631},
year={2020}
}
@inproceedings{Silberman:ECCV12,
author = {Nathan Silberman and Derek Hoiem and Pushmeet Kohli and Rob Fergus},
title = {Indoor Segmentation and Support Inference from RGBD Images},
booktitle = {ECCV},
year = {2012}
}
@inproceedings{ahmadyan2021objectron,
title={Objectron: A large scale dataset of object-centric videos in the wild with pose annotations},
author={Ahmadyan, Adel and Zhang, Liangkai and Ablavatski, Artsiom and Wei, Jianing and Grundmann, Matthias},
booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
pages={7822--7831},
year={2021}
}
@inproceedings{sturm2012benchmark,
title={A benchmark for the evaluation of RGB-D SLAM systems},
author={Sturm, J{\"u}rgen and Engelhard, Nikolas and Endres, Felix and Burgard, Wolfram and Cremers, Daniel},
booktitle={2012 IEEE/RSJ international conference on intelligent robots and systems},
pages={573--580},
year={2012},
organization={IEEE}
}
@inproceedings{dai2017scannet,
title={Scannet: Richly-annotated 3d reconstructions of indoor scenes},
author={Dai, Angela and Chang, Angel X and Savva, Manolis and Halber, Maciej and Funkhouser, Thomas and Nie{\ss}ner, Matthias},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={5828--5839},
year={2017}
}
@article{chang2015shapenet,
title={Shapenet: An information-rich 3d model repository},
author={Chang, Angel X and Funkhouser, Thomas and Guibas, Leonidas and Hanrahan, Pat and Huang, Qixing and Li, Zimo and Savarese, Silvio and Savva, Manolis and Song, Shuran and Su, Hao and others},
journal={arXiv preprint arXiv:1512.03012},
year={2015}
}
@inproceedings{xiao2013sun3d,
title={Sun3d: A database of big spaces reconstructed using sfm and object labels},
author={Xiao, Jianxiong and Owens, Andrew and Torralba, Antonio},
booktitle={Proceedings of the IEEE international conference on computer vision},
pages={1625--1632},
year={2013}
}
@inproceedings{sun2020scalability,
title={Scalability in perception for autonomous driving: Waymo open dataset},
author={Sun, Pei and Kretzschmar, Henrik and Dotiwalla, Xerxes and Chouard, Aurelien and Patnaik, Vijaysai and Tsui, Paul and Guo, James and Zhou, Yin and Chai, Yuning and Caine, Benjamin and others},
booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
pages={2446--2454},
year={2020}
}