MonoDiffusion: Self-Supervised Monocular Depth Estimation Using Diffusion Model

Shuwei Shao¹ Zhongcai Pei¹ Weihai Chen¹ Dingchi Sun¹ Peter C. Y. Chen² Zhengguo Li³

¹Beihang University, ²National University of Singapore, ³A*STAR

• TCSVT 2024 •

Abstract

Over the past few years, self-supervised monocular depth estimation has received widespread attention. Most efforts focus on designing new network architectures and loss functions, or on handling edge cases such as occlusion and dynamic objects. In this work, we take another path and propose a novel conditional diffusion-based generative framework for self-supervised monocular depth estimation, dubbed MonoDiffusion. Because depth ground truth is unavailable in a self-supervised setting, we develop a new pseudo ground-truth diffusion process to assist the diffusion during training. Instead of diffusing at a fixed high resolution, we perform diffusion in a coarse-to-fine manner, which allows faster inference with equal or even better accuracy. Furthermore, we develop a simple yet effective contrastive depth reconstruction mechanism to enhance the denoising ability of the model. Notably, MonoDiffusion naturally provides depth uncertainty, which is essential in safety-critical applications. Extensive experiments on the KITTI, Make3D and DIML datasets indicate that MonoDiffusion outperforms prior state-of-the-art self-supervised competitors.
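
To make the idea of diffusing a pseudo ground-truth depth map concrete, here is a minimal sketch of the standard DDPM forward (noising) step applied to a tiny depth map, plus one common way to read uncertainty from several samples. It is not the code in this repository: the linear schedule, the function names (`linear_beta_schedule`, `cumulative_alphas`, `q_sample`, `pixelwise_std`) and the choice of per-pixel standard deviation as the uncertainty measure are all assumptions made for illustration, and plain Python lists stand in for tensors.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Linearly spaced noise variances (an assumed schedule, not the paper's)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(depth, t, alpha_bars, rng):
    """Noise a depth map to step t: sqrt(a_bar) * d + sqrt(1 - a_bar) * eps."""
    a_bar = alpha_bars[t]
    signal, noise = math.sqrt(a_bar), math.sqrt(1.0 - a_bar)
    return [[signal * d + noise * rng.gauss(0.0, 1.0) for d in row]
            for row in depth]


def pixelwise_std(samples):
    """Per-pixel standard deviation across several depth maps of equal size."""
    n = len(samples)
    height, width = len(samples[0]), len(samples[0][0])
    std = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            values = [s[i][j] for s in samples]
            mean = sum(values) / n
            std[i][j] = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return std


# Hypothetical 2x2 pseudo ground-truth depth map (normalized values).
rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
pseudo_depth = [[0.2, 0.4], [0.6, 0.8]]
noisy_depth = q_sample(pseudo_depth, 500, alpha_bars, rng)
```

In a real pipeline the noised map would be passed to a denoising network conditioned on the input image. Calling `pixelwise_std` on several denoised outputs drawn with different noise seeds gives one possible uncertainty map. The source does not state that MonoDiffusion computes uncertainty this way.
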
💾 Overview

⚙️ Setup

The experimental environment and command setup follow Lite-Mono.

📦 Model zoo

You can download the model weights from the following link.

✏️ 📄 Citation

If you find our work useful in your research, please consider citing our paper:

@article{shao2024monodiffusion,
  title={MonoDiffusion: self-supervised monocular depth estimation using diffusion model},
  author={Shao, Shuwei and Pei, Zhongcai and Chen, Weihai and Sun, Dingchi and Chen, Peter CY and Li, Zhengguo},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  year={2024},
  publisher={IEEE}
}

Acknowledgement

Our code is based on the implementation of Lite-Mono. We thank the authors for their excellent work and repository.

Folders and files

assets
datasets
diffusers
linear_warmup_cosine_annealing_warm_restarts_weight_decay
networks
splits
LICENSE
README.md
evaluate_depth_diffusion.py
kitti_utils.py
layers.py
options.py
train_df.py
trainer_df.py
utils.py

ShuweiShao/MonoDiffusion
