
Commit c7e6a56

semjon00 and affromero committed
Support for Marigold
Co-authored-by: Andres Romero <me@afromero.co>
1 parent e8ff097 commit c7e6a56

12 files changed, with 96 additions and 30 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -1,2 +1,3 @@
11
__pycache__/
22
venv/
3+
.idea/

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -1,4 +1,6 @@
11
## Changelog
2+
### 0.4.5
3+
* Support for [Marigold](https://marigoldmonodepth.github.io). [PR #385](https://github.com/thygate/stable-diffusion-webui-depthmap-script/pull/385).
24
### 0.4.4
35
* Compatibility with stable-diffusion-webui 1.6.0
46
### 0.4.3 video processing tab

README.md

Lines changed: 14 additions & 1 deletion
@@ -1,7 +1,7 @@
11
# High Resolution Depth Maps for Stable Diffusion WebUI
22
This program is an addon for [AUTOMATIC1111's Stable Diffusion WebUI](https://github.com/AUTOMATIC1111/stable-diffusion-webui) that creates depth maps. Using either generated or custom depth maps, it can also create 3D stereo image pairs (side-by-side or anaglyph), normalmaps and 3D meshes. The outputs of the script can be viewed directly or used as an asset for a 3D engine. Please see [wiki](https://github.com/thygate/stable-diffusion-webui-depthmap-script/wiki/Viewing-Results) to learn more. The program has integration with [Rembg](https://github.com/danielgatis/rembg). It also supports batch processing, processing of videos, and can also be run in standalone mode, without Stable Diffusion WebUI.
33

4-
To generate realistic depth maps from individual images, this script uses code and models from the [MiDaS](https://github.com/isl-org/MiDaS) and [ZoeDepth](https://github.com/isl-org/ZoeDepth) repositories by Intel ISL, or LeReS from the [AdelaiDepth](https://github.com/aim-uofa/AdelaiDepth) repository by Advanced Intelligent Machines. Multi-resolution merging as implemented by [BoostingMonocularDepth](https://github.com/compphoto/BoostingMonocularDepth) is used to generate high resolution depth maps.
4+
To generate realistic depth maps from individual images, this script uses code and models from the [Marigold](https://github.com/prs-eth/Marigold/) repository, from the [MiDaS](https://github.com/isl-org/MiDaS) and [ZoeDepth](https://github.com/isl-org/ZoeDepth) repositories by Intel ISL, or LeReS from the [AdelaiDepth](https://github.com/aim-uofa/AdelaiDepth) repository by Advanced Intelligent Machines. Multi-resolution merging as implemented by [BoostingMonocularDepth](https://github.com/compphoto/BoostingMonocularDepth) is used to generate high resolution depth maps.
55

66
Stereoscopic images are created using a custom-written algorithm.
77

@@ -198,3 +198,16 @@ ZoeDepth :
198198
copyright = {arXiv.org perpetual, non-exclusive license}
199199
}
200200
```
201+
202+
Marigold - Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation:
203+
204+
```
205+
@misc{ke2023repurposing,
206+
title={Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation},
207+
author={Bingxin Ke and Anton Obukhov and Shengyu Huang and Nando Metzger and Rodrigo Caye Daudt and Konrad Schindler},
208+
year={2023},
209+
eprint={2312.02145},
210+
archivePrefix={arXiv},
211+
primaryClass={cs.CV}
212+
}
213+
```

bundled_sources.txt

Lines changed: 3 additions & 0 deletions
@@ -17,3 +17,6 @@ https://github.com/aim-uofa/AdelaiDepth/tree/main/LeReS/Minist_Test/lib/
1717

1818
pix2pix
1919
https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/
20+
21+
Marigold
22+
https://github.com/prs-eth/Marigold/tree/22437a

install.py

Lines changed: 2 additions & 0 deletions
@@ -38,6 +38,8 @@ def ensure(module_name, min_version=None):
3838
launch.run_pip('install "moviepy==1.0.2"', "moviepy requirement for depthmap script")
3939
ensure('transforms3d', '0.4.1')
4040

41+
ensure('diffusers', '0.20.1') # For Marigold
42+
4143
ensure('imageio') # 2.4.1
4244
try: # Dirty hack to not reinstall every time
4345
importlib_metadata.version('imageio-ffmpeg')

requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -16,5 +16,6 @@ transforms3d>=0.4.1
1616
imageio>=2.4.1,<3.0
1717
imageio-ffmpeg
1818
networkx>=2.5
19+
diffusers>=0.20.1 # For Marigold
1920
pyqt5; sys_platform == 'windows'
2021
pyqt6; sys_platform != 'windows'

scripts/depthmap.py

Lines changed: 5 additions & 0 deletions
@@ -85,11 +85,16 @@ def add_option(name, default_value, description, name_prefix='depthmap_script'):
8585
shared.opts.add_option(f"{name_prefix}_{name}", shared.OptionInfo(default_value, description, section=section))
8686

8787
add_option('keepmodels', False, "Do not unload depth and pix2pix models.")
88+
8889
add_option('boost_rmax', 1600, "Maximum wholesize for boost (Rmax)")
90+
add_option('marigold_ensembles', 5, "How many ensembles to use for Marigold")
91+
add_option('marigold_steps', 10, "How many denoising steps to use for Marigold")
92+
8993
add_option('save_ply', False, "Save additional PLY file with 3D inpainted mesh.")
9094
add_option('show_3d', True, "Enable showing 3D Meshes in output tab. (Experimental)")
9195
add_option('show_3d_inpaint', True, "Also show 3D Inpainted Mesh in 3D Mesh output tab. (Experimental)")
9296
add_option('mesh_maxsize', 2048, "Max size for generating simple mesh.")
97+
9398
add_option('gen_heatmap_from_ui', False, "Show an option to generate HeatMap in the UI")
9499
add_option('extra_stereomodes', False, "Enable more possible outputs for stereoimage generation")
95100

src/backbone.py

Lines changed: 13 additions & 6 deletions
@@ -4,6 +4,7 @@
44
import pathlib
55
from datetime import datetime
66
import enum
7+
import sys
78

89

910
class BackboneType(enum.Enum):
@@ -34,12 +35,13 @@ def get_cmd_opt(name, default):
3435

3536
def gather_ops():
3637
"""Parameters for depthmap generation"""
37-
from modules.shared import cmd_opts
3838
ops = {}
39-
if get_opt('depthmap_script_boost_rmax', None) is not None:
40-
ops['boost_whole_size_threshold'] = get_opt('depthmap_script_boost_rmax', None)
41-
ops['precision'] = cmd_opts.precision
42-
ops['no_half'] = cmd_opts.no_half
39+
for s in ['boost_rmax', 'precision', 'no_half', 'marigold_ensembles', 'marigold_steps']:
40+
c = get_opt('depthmap_script_' + s, None)
41+
if c is None:
42+
c = get_cmd_opt(s, None)
43+
if c is not None:
44+
ops[s] = c
4345
return ops
4446

4547

@@ -117,7 +119,12 @@ def get_opt(name, default): return default # Configuring is not supported
117119

118120
def get_cmd_opt(name, default): return default # Configuring is not supported
119121

120-
def gather_ops(): return {} # Configuring is not supported
122+
def gather_ops(): # Configuring is not supported
123+
return {'boost_rmax': 1600,
124+
'precision': 'autocast',
125+
'no_half': False,
126+
'marigold_ensembles': 5,
127+
'marigold_steps': 12}
121128

122129
def get_outpath(): return str(pathlib.Path('.', 'outputs'))
123130
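
Note on the `gather_ops` change above: the new loop reads each option from the UI settings first, falls back to the command-line flag of the same bare name, and leaves the key out when neither is set. The sketch below mirrors that lookup order with plain dictionaries. `resolve_options`, `ui` and `cmd` are hypothetical stand-ins for illustration only; the real code calls `get_opt` and `get_cmd_opt`.

```python
def resolve_options(names, ui_opts, cmd_opts, prefix="depthmap_script_"):
    """Collect settings, preferring the UI value and falling back to the command line."""
    ops = {}
    for name in names:
        value = ui_opts.get(prefix + name)  # UI settings live under a prefixed key
        if value is None:
            value = cmd_opts.get(name)      # command-line flags use the bare name
        if value is not None:
            ops[name] = value               # options set nowhere are left out
    return ops


# Hypothetical inputs: two options come from the UI, two from the command line.
ui = {"depthmap_script_boost_rmax": 1600, "depthmap_script_marigold_steps": 10}
cmd = {"precision": "autocast", "no_half": False}
names = ["boost_rmax", "precision", "no_half", "marigold_ensembles", "marigold_steps"]

ops = resolve_options(names, ui, cmd)
print(ops)
```

Because the check is `is not None` rather than truthiness, a legitimate `False` (as for `no_half`) is kept, while `marigold_ensembles`, which is set in neither place in this example, is omitted.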

src/common_ui.py

Lines changed: 2 additions & 1 deletion
@@ -37,7 +37,8 @@ def main_ui_panel(is_depth_tab):
3737
'dpt_beit_large_384 (midas 3.1)', 'dpt_large_384 (midas 3.0)',
3838
'dpt_hybrid_384 (midas 3.0)',
3939
'midas_v21', 'midas_v21_small',
40-
'zoedepth_n (indoor)', 'zoedepth_k (outdoor)', 'zoedepth_nk'],
40+
'zoedepth_n (indoor)', 'zoedepth_k (outdoor)', 'zoedepth_nk',
41+
'Marigold v1'],
4142
type="index")
4243
with gr.Box() as cur_option_root:
4344
inp -= 'depthmap_gen_row_1', cur_option_root

src/depthmap_generation.py

Lines changed: 51 additions & 21 deletions
@@ -21,6 +21,8 @@
2121
from lib.multi_depth_model_woauxi import RelDepthModel
2222
from lib.net_tools import strip_prefix_if_present
2323
from pix2pix.models.pix2pix4depth_model import Pix2Pix4DepthModel
24+
# Marigold
25+
from marigold.marigold import MarigoldPipeline
2426
# pix2pix/merge net imports
2527
from pix2pix.options.test_options import TestOptions
2628

@@ -42,18 +44,11 @@ def __init__(self):
4244
self.resize_mode = None
4345
self.normalization = None
4446

45-
# Settings (initialized to sensible values, should be updated)
46-
self.boost_whole_size_threshold = 1600 # R_max from the paper by default
47-
self.no_half = False
48-
self.precision = "autocast"
4947

50-
def update_settings(self, boost_whole_size_threshold=None, no_half=None, precision=None):
51-
if boost_whole_size_threshold is not None:
52-
self.boost_whole_size_threshold = boost_whole_size_threshold
53-
if no_half is not None:
54-
self.no_half = no_half
55-
if precision is not None:
56-
self.precision = precision
48+
def update_settings(self, **kvargs):
49+
# Opens Pandora's box: every keyword argument becomes an attribute
50+
for k, v in kvargs.items():
51+
setattr(self, k, v)
5752

5853

5954
def ensure_models(self, model_type, device: torch.device, boost: bool):
@@ -71,9 +66,11 @@ def load_models(self, model_type, device: torch.device, boost: bool):
7166
"""Ensure that the depth model is loaded"""
7267

7368
# model path and name
69+
# ZoeDepth and Marigold do not use this
7470
model_dir = "./models/midas"
7571
if model_type == 0:
7672
model_dir = "./models/leres"
73+
7774
# create paths to model if not present
7875
os.makedirs(model_dir, exist_ok=True)
7976
os.makedirs('./models/pix2pix', exist_ok=True)
@@ -194,12 +191,26 @@ def load_models(self, model_type, device: torch.device, boost: bool):
194191
conf = get_config("zoedepth_nk", "infer")
195192
model = build_model(conf)
196193

197-
model.eval() # prepare for evaluation
194+
elif model_type == 10: # Marigold v1
195+
model_path = "Bingxin/Marigold"
196+
print(model_path)
197+
dtype = torch.float32 if self.no_half else torch.float16
198+
model = MarigoldPipeline.from_pretrained(model_path, torch_dtype=dtype)
199+
try:
200+
import xformers
201+
model.enable_xformers_memory_efficient_attention()
202+
except Exception:
203+
pass # run without xformers
204+
205+
if model_type in range(0, 10):
206+
model.eval() # prepare for evaluation
198207
# optimize
199-
if device == torch.device("cuda") and model_type in [0, 1, 2, 3, 4, 5, 6]:
200-
model = model.to(memory_format=torch.channels_last) # TODO: weird
201-
if not self.no_half and model_type != 0 and not boost: # TODO: zoedepth, too?
202-
model = model.half()
208+
if device == torch.device("cuda"):
209+
if model_type in [0, 1, 2, 3, 4, 5, 6]:
210+
model = model.to(memory_format=torch.channels_last) # TODO: weird
211+
if not self.no_half:
212+
if model_type in [1, 2, 3, 4, 5, 6] and not boost: # TODO: zoedepth, too?
213+
model = model.half()
203214
model.to(device) # to correct device
204215

205216
self.depth_model = model
@@ -238,7 +249,8 @@ def get_default_net_size(model_type):
238249
6: [256, 256],
239250
7: [384, 512],
240251
8: [384, 768],
241-
9: [384, 512]
252+
9: [384, 512],
253+
10: [768, 768]
242254
}
243255
if model_type in sizes:
244256
return sizes[model_type]
@@ -288,14 +300,17 @@ def get_raw_prediction(self, input, net_width, net_height):
288300
raw_prediction = estimateleres(img, self.depth_model, net_width, net_height)
289301
elif self.depth_model_type in [7, 8, 9]:
290302
raw_prediction = estimatezoedepth(input, self.depth_model, net_width, net_height)
291-
else:
303+
elif self.depth_model_type in [1, 2, 3, 4, 5, 6]:
292304
raw_prediction = estimatemidas(img, self.depth_model, net_width, net_height,
293305
self.resize_mode, self.normalization, self.no_half,
294306
self.precision == "autocast")
307+
elif self.depth_model_type == 10:
308+
raw_prediction = estimatemarigold(img, self.depth_model, net_width, net_height,
309+
self.marigold_ensembles, self.marigold_steps)
295310
else:
296311
raw_prediction = estimateboost(img, self.depth_model, self.depth_model_type, self.pix2pix_model,
297-
self.boost_whole_size_threshold)
298-
raw_prediction_invert = self.depth_model_type in [0, 7, 8, 9]
312+
self.boost_rmax)
313+
raw_prediction_invert = self.depth_model_type in [0, 7, 8, 9, 10]
299314
return raw_prediction, raw_prediction_invert
300315

301316

@@ -395,6 +410,19 @@ def estimatemidas(img, model, w, h, resize_mode, normalization, no_half, precisi
395410
return prediction
396411

397412

413+
# TODO: correct values for BOOST
414+
# TODO: "h" is not used
415+
def estimatemarigold(image, model, w, h, marigold_ensembles=5, marigold_steps=12):
416+
# This hideous thing should be re-implemented once there is support from the upstream.
417+
img = cv2.cvtColor((image * 255.0001).astype('uint8'), cv2.COLOR_BGR2RGB)
418+
img = Image.fromarray(img)
419+
with torch.no_grad():
420+
pipe_out = model(img, processing_res=w, show_progress_bar=False,
421+
ensemble_size=marigold_ensembles, denoising_steps=marigold_steps,
422+
match_input_res=False)
423+
return cv2.resize(pipe_out.depth_np, (image.shape[:2][::-1]), interpolation=cv2.INTER_CUBIC)
424+
425+
398426
class ImageandPatchs:
399427
def __init__(self, root_dir, name, patchsinfo, rgb_image, scale=1):
400428
self.root_dir = root_dir
@@ -616,7 +644,7 @@ def estimateboost(img, model, model_type, pix2pixmodel, whole_size_threshold):
616644
elif model_type == 1: # dpt_beit_large_512
617645
net_receptive_field_size = 512
618646
patch_netsize = 2 * net_receptive_field_size
619-
else: # other midas
647+
else: # other midas # TODO Marigold support
620648
net_receptive_field_size = 384
621649
patch_netsize = 2 * net_receptive_field_size
622650

@@ -886,6 +914,8 @@ def doubleestimate(img, size1, size2, pix2pixsize, model, net_type, pix2pixmodel
886914
def singleestimate(img, msize, model, net_type):
887915
if net_type == 0:
888916
return estimateleres(img, model, msize, msize)
917+
elif net_type == 10:
918+
return estimatemarigold(img, model, msize, msize)
889919
elif net_type >= 7:
890920
# np to PIL
891921
return estimatezoedepth(Image.fromarray(np.uint8(img * 255)).convert('RGB'), model, msize, msize)
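
Note on the `update_settings(**kvargs)` change in this file: every keyword is now stored as an attribute, so the keys produced by `gather_ops` must match the attribute names read later (`self.boost_rmax`, `self.no_half`, `self.marigold_ensembles`, `self.marigold_steps`). The sketch below shows that behaviour and its main risk. `SettingsHolder` is a hypothetical stand-in, not the extension's actual class.

```python
class SettingsHolder:
    """Hypothetical stand-in for the model holder, for illustration only."""

    def update_settings(self, **kwargs):
        # Every keyword becomes an attribute, with no validation of the name.
        for key, value in kwargs.items():
            setattr(self, key, value)


holder = SettingsHolder()
holder.update_settings(boost_rmax=1600, no_half=False,
                       marigold_ensembles=5, marigold_steps=10)

# Mirrors the dtype choice in the Marigold branch of load_models.
dtype_name = "float32" if holder.no_half else "float16"

# A misspelled key is accepted silently and leaves the real setting untouched.
holder.update_settings(marigold_step=3)
print(holder.marigold_steps, holder.marigold_step, dtype_name)
```

The flexibility is what lets new options such as `marigold_steps` flow through without touching the method signature. The cost is that a misspelled name creates a new attribute instead of raising an error.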
