DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion
Qingcheng Zhao*,1,† · Xiang Zhang*,✉,2 · Haiyang Xu2 · Zeyuan Chen2 · Jianwen Xie3 · Yuan Gao4 · Zhuowen Tu2
1ShanghaiTech University · 2UC San Diego · 3Lambda, Inc. · 4Stanford University
ICCV 2025
* equal contribution ✉ corresponding author
† Project done while Qingcheng Zhao interned at UC San Diego.
We provide a pre-built Docker image at zx1239856/depr based on PyTorch 2.7.1 and CUDA 12.6. You can also build the image locally:
docker build -f Dockerfile . -t depr
Alternatively, you can install the dependencies manually by following the commands listed in the Dockerfile.
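For example, a container can be started from the pre-built image as sketched below (the --gpus flag assumes the NVIDIA Container Toolkit is installed, and the mount path /workspace is an assumption, not part of the official setup):
# Sketch: run the pre-built image with GPU access and the repository mounted at /workspace
docker run --gpus all -it --rm -v "$(pwd)":/workspace -w /workspace zx1239856/depr bash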
Please download the processed 3D-FRONT dataset from https://huggingface.co/datasets/zx1239856/DepR-3D-FRONT and extract the downloaded files into datasets/front3d_pifu/data (an example download command is given after the directory tree below). The resulting folder structure should look like:
data/
|-- metadata/ (Scene metadata)
| |-- 0.jsonl
| |-- ...
|-- pickled_data/ (Raw data processed by InstPIFu)
| |-- test/
| | |-- rendertask3000.pkl
| | |-- ...
|-- sdf_layout/ (GT layouts)
| |-- 10000.npy
| |-- ...
|-- 3D-FUTURE-watertight/ (GT meshes, required for evaluation)
| |-- 0004ae9a-1d27-4dbd-8416-879e9de1de8d/
| | |-- raw_watertight.obj
| |-- ...
|-- instpifu_mask/ (Instance masks provided by InstPIFu)
|-- panoptic/ (Panoptic segmentation maps we rendered)
|-- img/ (Optional, can be extracted from pickled data)
|-- depth/depth_pro/ (Optional)
`-- grounded_sam/ (Optional)
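One way to fetch the dataset (a sketch; it assumes huggingface-cli is installed, and any archives contained in the dataset repository still need to be extracted as described above):
# Sketch: download the dataset repository into the expected location
pip install -U "huggingface_hub[cli]"
huggingface-cli download zx1239856/DepR-3D-FRONT --repo-type dataset --local-dir datasets/front3d_pifu/data
# Extract any downloaded archives so the folder matches the structure shown above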
Alternatively, you may generate the depth and segmentation maps yourself by following the instructions below.
Generate Segmentation
Please prepare the Grounded SAM weights in checkpoint/grounded_sam (example download commands are given after the file list below):
grounded_sam/
|-- GroundingDINO_SwinB.py
|-- groundingdino_swinb_cogcoor.pth
|-- groundingdino_swint_ogc.pth
`-- sam_vit_h_4b8939.pth
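These weights come from the upstream projects; the commands below are a sketch (the SAM URL reflects the Segment Anything release at the time of writing and may change, and the GroundingDINO files are assumed to come from the IDEA-Research/GroundingDINO releases page):
# Sketch: fetch the SAM ViT-H checkpoint; get the GroundingDINO configs/weights from the
# IDEA-Research/GroundingDINO releases page
mkdir -p checkpoint/grounded_sam
wget -P checkpoint/grounded_sam https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth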
python -m scripts.run_grounded_sam
Generate Depth
Please put the Depth Pro weights in checkpoint/.
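For example (a sketch; the URL is taken from the get_pretrained_models.sh script in the apple/ml-depth-pro repository at the time of writing and may change, so verify it upstream):
# Sketch: fetch the Depth Pro checkpoint into checkpoint/ (URL is an assumption based on apple/ml-depth-pro)
wget -P checkpoint/ https://ml-site.cdn-apple.com/models/depthpro/depth_pro.pt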
python -m scripts.run_depth_pro --output depth_pro
Please download our weights from https://huggingface.co/zx1239856/DepR and put everything in the checkpoint folder.
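One way to do this (a sketch; assumes huggingface-cli is installed):
# Sketch: download the released DepR weights into the checkpoint/ folder
huggingface-cli download zx1239856/DepR --local-dir checkpoint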
We provide a demo.ipynb notebook with an inference demo on real-world images.
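The notebook can be opened with Jupyter inside your environment or the Docker container, for example (a sketch; it assumes Jupyter is installed, and the flags for container use are optional):
# Sketch: launch the demo notebook (expose the port if running inside a container)
jupyter lab demo.ipynb --ip 0.0.0.0 --no-browser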
Object-level Evaluation
You may change 8 in the commands below to the number of available GPUs.
bash launch.sh 8 all
(Optional) Guided Sampling
bash launch.sh 8 all --guided
Scene-level Evaluation
# Generate shapes
bash launch.sh 8 sample --metadata datasets/front3d_pifu/meta/test_scene.jsonl --use-sam
# Layout optim
bash launch.sh 8 scene --use-sam
# Prepare GT scene
python -m scripts.build_gt --out-dir output/gt
# Calculate scene-level CD/F1
accelerate launch --num_processes=8 --multi_gpu -m scripts.eval_scene --gt-pcd-dir output/gt/pcds --pred-dir output/infer/sam_3dproj_attn_dino_c9_augdep_augmask_nocfg_model_0074999/ --save-dir output/evaluation/results --method depr
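Putting the scene-level steps together (a sketch that simply chains the commands above; the wrapper script itself and the NUM_GPUS variable are hypothetical conveniences, not part of the repository):
#!/usr/bin/env bash
# Sketch: run the full scene-level evaluation end to end with a configurable GPU count
set -e
NUM_GPUS=${1:-8}
bash launch.sh "$NUM_GPUS" sample --metadata datasets/front3d_pifu/meta/test_scene.jsonl --use-sam
bash launch.sh "$NUM_GPUS" scene --use-sam
python -m scripts.build_gt --out-dir output/gt
accelerate launch --num_processes="$NUM_GPUS" --multi_gpu -m scripts.eval_scene \
    --gt-pcd-dir output/gt/pcds \
    --pred-dir output/infer/sam_3dproj_attn_dino_c9_augdep_augmask_nocfg_model_0074999/ \
    --save-dir output/evaluation/results --method depr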
This repository is released under the CC-BY-SA 4.0 license.
Our framework utilizes pre-trained models including Grounded-Segment-Anything, Depth Pro, and DINO v2.
Our code is built upon diffusers, Uni-3D, and BlockFusion.
We use physically based renderings of 3D-FRONT scenes provided by InstPIFu. Additionally, we rendered panoptic segmentation maps ourselves.
We thank all these authors for their nicely open-sourced code and datasets, and for their great contributions to the community.
If you find our work useful, please consider citing:
@misc{zhao2025deprdepthguidedsingleview,
title={DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion},
author={Qingcheng Zhao and Xiang Zhang and Haiyang Xu and Zeyuan Chen and Jianwen Xie and Yuan Gao and Zhuowen Tu},
year={2025},
eprint={2507.22825},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2507.22825},
}