This is the official implementation of the paper Detail-Preserving Latent Diffusion for Stable Shadow Removal.
We propose a two-stage fine-tuning pipeline that transforms a pre-trained Stable Diffusion model into an image-conditioned shadow-free image generator, enabling robust, high-resolution shadow removal without an input shadow mask. We also introduce a shadow-aware detail injection module that uses VAE encoder features to modulate the pre-trained VAE decoder, selectively aligning per-pixel details of the input image with those of the output shadow-free image.
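At a high level, the detail injection can be pictured as a learned per-pixel blend between VAE encoder (input-image) features and decoder features. The sketch below only illustrates that idea; it is not the repository's actual module:

```python
# Illustrative sketch of per-pixel detail injection (NOT the repo's module):
# a small conv net predicts a gate that decides, pixel by pixel, how much of
# the encoder (input-image) feature to inject into the decoder feature.
import torch
import torch.nn as nn

class DetailInjection(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Sigmoid(),  # per-pixel, per-channel blend weight in [0, 1]
        )

    def forward(self, enc_feat, dec_feat):
        g = self.gate(torch.cat([enc_feat, dec_feat], dim=1))
        return g * enc_feat + (1 - g) * dec_feat
```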
For more details, please refer to our original paper.
- Python 3.10
- CUDA 11.7
```
cd StableShadowRemoval
pip install -e .
cd examples/text_to_image/
pip install -r requirements.txt
```
And initialize an Accelerate environment with:
```
accelerate config
```
Or, for a default Accelerate configuration without answering questions about your environment:
```
accelerate config default
```
Please download the corresponding pretrained model and modify `unet_path` and `vae_path` in `examples/text_to_image/inference.py`.
You can directly test the performance of the pre-trained model as follows:
- Modify the paths to the dataset and the pre-trained model. You need to modify the following paths in `examples/text_to_image/inference.py` (a sketch of the edited lines follows this list):
  ```
  unet_path     # pretrained stage-one UNet weight path   -- Line 18
  vae_path      # pretrained stage-two DIM weight path    -- Line 19
  image_folder  # input data path                         -- Line 21
  result_dir    # result output path                      -- Line 23
  ```
- Test the model:
  ```
  python inference.py
  ```
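For reference, after editing, the relevant lines of `inference.py` might look like the following; the concrete paths are placeholders, not shipped defaults:

```python
# Hypothetical excerpt of examples/text_to_image/inference.py (Lines 18-23);
# the paths below are placeholders -- point them at your own files.
unet_path = "./checkpoints/stage_one_unet"          # pretrained stage-one UNet weights
vae_path = "./checkpoints/stage_two_dim"            # pretrained stage-two DIM weights
image_folder = "./data/ISTD+_Dataset/test/origin"   # input shadow images
result_dir = "./results"                            # output directory
```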
- Download the datasets, arrange them in the following structure, and modify the dataset paths in `examples/text_to_image/my_dataset.py`:
  ```
  |-- ISTD+_Dataset
      |-- train
          |-- origin       # shadow images
          |-- shadow_free  # shadow-free GT images
          |-- train.json   # text file
      |-- test
          |-- origin       # shadow images
          |-- shadow_free  # shadow-free GT images
          |-- test.json    # text file
  ```
  ```
  text_filepath        # text file path
  image_dir            # shadow-free GT image path
  condition_image_dir  # shadow image path
  ```
  The text file can be generated by `examples/text_to_image/json_generate.py` with `is_stage_1=True`.
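The exact schema of the text file is defined by `json_generate.py`; purely as an illustration, a generator with assumed keys might look like this:

```python
# Hedged sketch: build a text file listing training samples. The real schema
# is defined by examples/text_to_image/json_generate.py; the keys below
# ("image", "text", "latent") are assumptions for illustration only.
import json
import os

def generate_json(split_dir, out_path, is_stage_1=True):
    names = sorted(os.listdir(os.path.join(split_dir, "origin")))
    entries = []
    for n in names:
        item = {"image": n, "text": ""}  # assumed keys; empty caption
        if not is_stage_1:               # stage two also records the
            item["latent"] = n.rsplit(".", 1)[0] + ".pt"  # precomputed latent
        entries.append(item)
    with open(out_path, "w") as f:
        json.dump(entries, f, indent=2)

generate_json("ISTD+_Dataset/train", "ISTD+_Dataset/train/train.json")
```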
- The training file is `examples/text_to_image/train_text_to_image.py`. Use the following command to train, and set the optional parameters:
  ```
  ./train.sh
  ```
  ```
  CUDA_VISIBLE_DEVICES="0,1"     # select GPUs
  num_processes=2                # set the number of GPUs
  mixed_precision="fp16"
  learning_rate=3e-05            # correct setting (the paper reports a slightly different value)
  pretrained_model_name_or_path  # pretrained Stable Diffusion path
  train_data_dir                 # dataset split file path
  prediction_type                # set to "sample": the diffusion model predicts the image latent instead of the noise
  ```
- Use the model trained in stage one to generate the latents of the shadow-free images, and set the optional parameters:
  ```
  python inference.py
  ```
  ```
  unet_path            # set to the UNet weights trained in stage one
  vae_path=stabilityai/stable-diffusion-2
  output_type=latent   # Line 42
  ```
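With `output_type="latent"`, a diffusers pipeline returns the latent tensor instead of decoded pixels, which is what stage two consumes. A minimal sketch of saving those latents (the repo's `inference.py` may structure this differently):

```python
# Hedged sketch: save predicted shadow-free latents for stage-two training.
# Assumes a diffusers pipeline accepting output_type="latent"; the actual
# inference.py in this repo may differ.
import os
import torch

def save_latent(pipe, image, name, out_dir="latents_sample"):
    os.makedirs(out_dir, exist_ok=True)
    with torch.no_grad():
        # .images holds the latent tensor when output_type="latent"
        latent = pipe(prompt="", image=image, output_type="latent").images
    torch.save(latent.cpu(), os.path.join(out_dir, name + ".pt"))
```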
- Arrange the dataset in the following structure, and modify the dataset paths in `examples/text_to_image/my_vae_dataset.py`:
  ```
  |-- ISTD+_Dataset
      |-- train
          |-- origin          # shadow images
          |-- shadow_free     # shadow-free GT images
          |-- latents_sample  # predicted shadow-free latents
          |-- train_vae.json  # text file
      |-- test
          |-- origin          # shadow images
          |-- shadow_free     # shadow-free GT images
          |-- latents_sample  # predicted shadow-free latents
          |-- test_vae.json   # text file
  ```
  ```
  text_filepath        # text file path
  image_dir            # shadow-free GT image path
  condition_image_dir  # shadow image path
  latent_dir           # predicted shadow-free latent path
  ```
  The text file can be generated by `examples/text_to_image/json_generate.py` with `is_stage_1=False`.
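A minimal sketch of what a stage-two dataset item could load, with an assumed file layout and key names (see `examples/text_to_image/my_vae_dataset.py` for the real implementation):

```python
# Hedged sketch of a stage-two dataset; field names and file layout are
# assumptions, not the repo's actual my_vae_dataset.py.
import json
import os
import torch
from PIL import Image
from torch.utils.data import Dataset

class VaeDataset(Dataset):
    def __init__(self, root, split="train"):
        self.root = os.path.join(root, split)
        with open(os.path.join(self.root, f"{split}_vae.json")) as f:
            self.items = json.load(f)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        name = self.items[i]["image"]  # assumed key
        origin = Image.open(os.path.join(self.root, "origin", name))
        gt = Image.open(os.path.join(self.root, "shadow_free", name))
        latent = torch.load(os.path.join(self.root, "latents_sample",
                                         name.rsplit(".", 1)[0] + ".pt"))
        return origin, gt, latent
```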
- The training file is `examples/text_to_image/train_vae_decoder.py`. Use the following command to train, and set the optional parameters:
  ```
  ./train_vae.sh
  ```
  ```
  learning_rate=5e-05  # correct setting (the paper reports a slightly different value)
  add_cfw=true         # add the detail injection module
  add_dino=true        # add the DINO feature
  ```
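Regarding the `add_dino` option above: one common way to obtain DINO features is the official torch.hub entry point. This is only a sketch of the idea; the repo may load DINO differently or use another variant:

```python
# Hedged sketch: extract a global DINO feature. The hub entry point below is
# the official facebookresearch/dino one; whether the repo uses this exact
# backbone is an assumption.
import torch

dino = torch.hub.load("facebookresearch/dino:main", "dino_vits16")
dino.eval()
with torch.no_grad():
    feat = dino(torch.randn(1, 3, 224, 224))  # (1, 384) global CLS feature
```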
Since the image latents for stage-two training need to be generated in advance, first perform data augmentation on the images and then generate the latents. Downscale the input images to W/k × H/k for training, with k = 3 for the WSRD+ dataset. Use the stage-one model to generate the latents of the downscaled images, while the VAE encoder takes the original-size images as input. Set the following optional parameters in `train_vae.sh`:
```
super_reshape=true
super_reshape_k=3  # set the reshape factor k
```
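Concretely, preparing the downscaled copies might look like this (a sketch; k = 3 for WSRD+ as above, and the resampling filter is an assumption):

```python
# Hedged sketch: downscale an input image to (W/k, H/k) before stage-one
# latent generation; the repo's own preprocessing may differ.
from PIL import Image

def downscale(path, k=3):  # k = 3 for the WSRD+ dataset
    img = Image.open(path).convert("RGB")
    w, h = img.size
    # The (W/k, H/k) copy goes to the stage-one model; the original-size
    # image still goes to the VAE encoder.
    return img.resize((w // k, h // k), Image.BICUBIC)
```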
- Use the stage-one model to generate the latents of the downscaled images:
  ```
  python inference.py
  ```
- Generate the final result by combining the latents of the downscaled images with the original-size images:
  ```
  python inference_vae.py
  ```

The results reported in the paper are computed with the MATLAB script used by previous methods; see `evaluation/measure_shadow.m` for details.
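For a quick Python-side sanity check, scikit-image's metrics can be used; note this is not the MATLAB script that produced the numbers below, so expect small discrepancies:

```python
# Hedged sketch: quick PSNR/SSIM check with scikit-image. The paper's numbers
# come from evaluation/measure_shadow.m, so values may differ slightly.
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(pred, gt):
    """pred, gt: uint8 H x W x 3 arrays of the result and the GT image."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=255)
    ssim = structural_similarity(gt, pred, channel_axis=2, data_range=255)
    return psnr, ssim
```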
| Datasets | PSNR | SSIM |
|---|---|---|
| ISTD+ | 35.19 | 0.974 |
| SRD | 33.63 | 0.968 |
| INS | 30.56 | 0.975 |
| WSRD+ | 26.26 | 0.827 |
The testing results on the ISTD+, SRD, INS, and WSRD+ datasets are available here: results.
Our implementation is based on Diffusers. We would like to thank them.
Bibtex:
```
@InProceedings{xu_2025_CVPR,
  title={Detail-Preserving Latent Diffusion for Stable Shadow Removal},
  author={Xu, Jiamin and Zheng, Yuxin and Li, Zelong and Wang, Chi and Gu, Renshu and Xu, Weiwei and Xu, Gang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}
```
If you have any questions, please contact 2451773098@qq.com.
