StableShadowRemoval

This is the official implementation of the paper Detail-Preserving Latent Diffusion for Stable Shadow Removal.

Introduction

We propose a two-stage fine-tuning pipeline that transforms a pre-trained Stable Diffusion model into an image-conditioned shadow-free image generator. This approach enables robust, high-resolution shadow removal without an input shadow mask. We also introduce a shadow-aware detail injection module that uses the VAE encoder features to modulate the pre-trained VAE decoder, selectively aligning per-pixel details from the input image with those in the output shadow-free image.
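
To make the idea concrete, below is a minimal sketch of such a modulation block. It is not the paper's exact module: the class name, the scale/shift fusion scheme, and the assumption that encoder and decoder features share a channel count are all illustrative.

import torch
import torch.nn as nn

class DetailInjection(nn.Module):
    """Illustrative sketch: modulate a VAE decoder feature map with the
    matching VAE encoder feature map (NOT the paper's exact module)."""

    def __init__(self, channels: int):
        super().__init__()
        # Hypothetical fusion: predict a per-pixel scale and shift from the
        # shadow-image encoder features.
        self.to_scale = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.to_shift = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, dec_feat: torch.Tensor, enc_feat: torch.Tensor) -> torch.Tensor:
        # Selectively re-inject per-pixel detail from the input image into the
        # decoded shadow-free image.
        return dec_feat * (1 + self.to_scale(enc_feat)) + self.to_shift(enc_feat)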

For more details, please refer to our original paper.

Requirements

  • Python 3.10
  • CUDA 11.7
Install the repository and the example dependencies:

cd StableShadowRemoval
pip install -e .
cd examples/text_to_image/
pip install -r requirements.txt
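
A quick, optional sanity check for the environment (this assumes PyTorch is installed by the requirements file):

import sys
import torch

print(sys.version)            # expect 3.10.x
print(torch.version.cuda)     # expect 11.7
print(torch.cuda.is_available())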

And initialize an Accelerate environment with:

accelerate config

Or for a default accelerate configuration without answering questions about your environment:

accelerate config default

Datasets

Pretrained models

ISTD+ | SRD | INS | WSRD+

Please download the corresponding pretrained model and modify the unet_path and vae_path in examples/text_to_image/inference.py.
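
For reference, stock Diffusers components load roughly as below; the repo's modified VAE (with the detail injection module) may use its own class, so treat the class choices and paths as assumptions:

from diffusers import AutoencoderKL, UNet2DConditionModel

# Placeholder paths for wherever the downloaded weights were saved.
unet = UNet2DConditionModel.from_pretrained("./checkpoints/unet")
vae = AutoencoderKL.from_pretrained("./checkpoints/vae")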

Test

You can directly test the performance of the pre-trained model as follows:

  1. Modify the paths to the dataset and the pre-trained model in examples/text_to_image/inference.py (see the illustrative snippet after these steps):
unet_path  # pretrained stage-one UNet weight path -- Line 18
vae_path  # pretrained stage-two DIM weight path -- Line 19
image_folder  # input data path -- Line 21
result_dir  # result output path -- Line 23
  2. Test the model:
python inference.py
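
The edits from step 1 amount to assignments like the following in inference.py; all paths are placeholders, and the line numbers are those given above:

unet_path = "./checkpoints/unet"              # Line 18: stage-one UNet weights
vae_path = "./checkpoints/vae"                # Line 19: stage-two DIM weights
image_folder = "./ISTD+_Dataset/test/origin"  # Line 21: input data
result_dir = "./results"                      # Line 23: output directory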

Train

Stage one

  1. Download the datasets, arrange them in the following structure, and modify the dataset paths in examples/text_to_image/my_dataset.py (a loading sketch follows this section):
|-- ISTD+_Dataset
    |-- train
        |-- origin  # shadow image
        |-- shadow_free  # shadow-free image GT
        |-- train.json  # text_file
    |-- test
        |-- origin  # shadow image
        |-- shadow_free  # shadow-free image GT
        |-- test.json  # text_file
text_filepath  # text_file path
image_dir  # shadow-free image GT path
condition_image_dir  # shadow image path
  • The text_file can be generated by examples/text_to_image/json_generate.py with is_stage_1=True.
  2. The training file is examples/text_to_image/train_text_to_image.py. Use the following command to train, and set the optional parameters (a note on prediction_type follows this section):
./train.sh
CUDA_VISIBLE_DEVICES="0,1"  # Select GPU
num_processes=2  # Set the number of GPUs
mixed_precision="fp16"
learning_rate=3e-05  # Correct setting (paper reports slightly different value)
pretrained_model_name_or_path  # pretrained stable diffusion path
train_data_dir  # dataset split file path
prediction_type  # Set to sample so the diffusion model predicts the image latent instead of the noise
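
The loading sketch referenced in step 1: a minimal paired dataset matching the structure above. The JSON record layout and the "file_name" key are assumptions; the repo's my_dataset.py is authoritative.

import json
import os
from PIL import Image
from torch.utils.data import Dataset

class ShadowPairDataset(Dataset):
    """Sketch of paired loading: shadow input + shadow-free ground truth."""

    def __init__(self, text_filepath, image_dir, condition_image_dir):
        with open(text_filepath) as f:
            self.items = json.load(f)  # assumed: a list of per-image records
        self.image_dir = image_dir                      # shadow-free image GT
        self.condition_image_dir = condition_image_dir  # shadow image

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        name = self.items[idx]["file_name"]  # hypothetical key
        gt = Image.open(os.path.join(self.image_dir, name)).convert("RGB")
        cond = Image.open(os.path.join(self.condition_image_dir, name)).convert("RGB")
        return {"image": gt, "condition_image": cond}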
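
On the prediction_type noted in step 2: with the default epsilon objective the UNet regresses the added noise, while with sample prediction it regresses the clean latent directly. Schematically (not the repo's exact training code):

import torch
import torch.nn.functional as F

def diffusion_loss(model_pred: torch.Tensor,
                   clean_latents: torch.Tensor,
                   noise: torch.Tensor,
                   prediction_type: str) -> torch.Tensor:
    # With "sample", the target is the clean (shadow-free) image latent;
    # with "epsilon", it is the added noise.
    target = clean_latents if prediction_type == "sample" else noise
    return F.mse_loss(model_pred, target)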

Stage two

  1. Use the model trained in stage one to generate the latents of the shadow-free images, and set the optional parameters (an export sketch follows this section):
python inference.py 
unet_path  # path to the UNet trained in stage one
vae_path=stabilityai/stable-diffusion-2
output_type=latent  # Line 42
  2. Set the dataset to the following structure, and modify the dataset paths in examples/text_to_image/my_vae_dataset.py:
|-- ISTD+_Dataset
    |-- train
        |-- origin  # shadow image
        |-- shadow_free  # shadow-free image GT
        |-- latents_sample  # predicted shadow-free latent
        |-- train_vae.json  # text_file
    |-- test
        |-- origin  # shadow image
        |-- shadow_free  # shadow-free GT
        |-- latents_sample  # predicted shadow-free latent
        |-- test_vae.json  # text_file
text_filepath  # text_file path
image_dir  # shadow-free image GT path
condition_image_dir  # shadow image path
latent_dir  # predicted shadow-free latent path
  • The text_file can be generated by examples/text_to_image/json_generate.py with is_stage_1=False.
  3. The training file is examples/text_to_image/train_vae_decoder.py. Use the following command to train and set the optional parameters:
./train_vae.sh
learning_rate=5e-05  # Correct setting (paper reports slightly different value)
add_cfw=true  # add the detail injection module
add_dino=true  # add DINO features

Since the image latents for stage-two training need to be generated in advance, first perform data augmentation on the images and then generate the latents.
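
An export sketch for step 1: in Diffusers pipelines, output_type="latent" returns latents in place of decoded images. The pipeline variable, call signature, and file naming below are assumptions:

import torch

# `pipe` stands for the stage-one pipeline configured in inference.py
# (assumption); with output_type="latent", .images holds latents, not pixels.
latents = pipe(prompt="", image=condition_image, output_type="latent").images
torch.save(latents, "latents_sample/0001.pt")  # hypothetical naming scheme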

Large-size inputs

Train

Stage one

Downscale the input images to W/k × H/k for training, with k = 3 for the WSRD+ dataset.
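
For example, the downscaling step under these assumptions (bicubic resampling, placeholder file names):

from PIL import Image

k = 3  # WSRD+ setting
img = Image.open("origin/0001.png")
small = img.resize((img.width // k, img.height // k), Image.BICUBIC)
small.save("origin_downscaled/0001.png")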

Stage two

Use the stage-one model to generate the latent of the downscaled image, while the VAE encoder takes the original-size image as input. Set the following optional parameters in train_vae.sh:

super_reshape=true
super_reshape_k=3  # set the reshape factor k

Test

  1. Use the stage-one model to generate the latent of the downscaled image:
python inference.py
  2. Generate the final result by combining the latent of the downscaled image with the original-size image:
python inference_vae.py

Evaluation

The results reported in the paper are calculated with the MATLAB script used in previous methods; see evaluation/measure_shadow.m for details.

Results

Evaluation on ISTD+, SRD, INS and WSRD+

Dataset  PSNR   SSIM
ISTD+    35.19  0.974
SRD      33.63  0.968
INS      30.56  0.975
WSRD+    26.26  0.827

Testing results

The testing results on the ISTD+, SRD, INS and WSRD+ datasets are provided in results.

References

Our implementation is based on Diffusers. We would like to thank them.

Citation

Bibtex:

@InProceedings{xu_2025_CVPR,
title={Detail-Preserving Latent Diffusion for Stable Shadow Removal},
author={Xu, Jiamin and Zheng, Yuxin and Li, Zelong and Wang, Chi and Gu, Renshu and Xu, Weiwei and Xu, Gang},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2025}
}

Contact

If you have any questions, please contact 2451773098@qq.com.
