This is the official implementation of the paper Detail-Preserving Latent Diffusion for Stable Shadow Removal.
We propose a two-stage fine-tuning pipeline that transforms a pre-trained Stable Diffusion model into an image-conditioned shadow-free image generator, enabling robust, high-resolution shadow removal without an input shadow mask. We also introduce a shadow-aware detail injection module that uses VAE encoder features to modulate the pre-trained VAE decoder, selectively aligning per-pixel details of the input image with those of the output shadow-free image.
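At a high level, the detail injection can be pictured as a learned per-pixel blend between VAE encoder (input-image) features and decoder features. The sketch below only illustrates that idea; it is not the repository's actual module:

```python
# Illustrative sketch of per-pixel detail injection (NOT the repo's module):
# a small conv net predicts a gate that decides, pixel by pixel, how much of
# the encoder (input-image) feature to inject into the decoder feature.
import torch
import torch.nn as nn

class DetailInjection(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Sigmoid(),  # per-pixel, per-channel blend weight in [0, 1]
        )

    def forward(self, enc_feat, dec_feat):
        g = self.gate(torch.cat([enc_feat, dec_feat], dim=1))
        return g * enc_feat + (1 - g) * dec_feat
```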
For more details, please refer to our original paper.
- Python 3.10
- CUDA 11.7
```
cd StableShadowRemoval
pip install -e .
cd examples/text_to_image/
pip install -r requirements.txt
```
And initialize an Accelerate environment with:
```
accelerate config
```
Or, for a default Accelerate configuration without answering questions about your environment:
```
accelerate config default
```
Please download the corresponding pretrained model and modify `unet_path` and `vae_path` in `examples/text_to_image/inference.py`.
You can directly test the performance of the pre-trained model as follows:
- Modify the paths to the dataset and the pre-trained model. You need to modify the following paths in `examples/text_to_image/inference.py` (a sketch of the edited lines follows this list):
  ```
  unet_path     # pretrained stage-one UNet weight path   -- Line 18
  vae_path      # pretrained stage-two DIM weight path    -- Line 19
  image_folder  # input data path                         -- Line 21
  result_dir    # result output path                      -- Line 23
  ```
- Test the model:
  ```
  python inference.py
  ```
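For reference, after editing, the relevant lines of `inference.py` might look like the following; the concrete paths are placeholders, not shipped defaults:

```python
# Hypothetical excerpt of examples/text_to_image/inference.py (Lines 18-23);
# the paths below are placeholders -- point them at your own files.
unet_path = "./checkpoints/stage_one_unet"          # pretrained stage-one UNet weights
vae_path = "./checkpoints/stage_two_dim"            # pretrained stage-two DIM weights
image_folder = "./data/ISTD+_Dataset/test/origin"   # input shadow images
result_dir = "./results"                            # output directory
```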
- Download the datasets, arrange them in the following structure, and modify the dataset paths in `examples/text_to_image/my_dataset.py`:
  ```
  |-- ISTD+_Dataset
      |-- train
          |-- origin       # shadow images
          |-- shadow_free  # shadow-free GT images
          |-- train.json   # text file
      |-- test
          |-- origin       # shadow images
          |-- shadow_free  # shadow-free GT images
          |-- test.json    # text file
  ```
  ```
  text_filepath        # text file path
  image_dir            # shadow-free GT image path
  condition_image_dir  # shadow image path
  ```
  The text file can be generated by `examples/text_to_image/json_generate.py` with `is_stage_1=True`.
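The exact schema of the text file is defined by `json_generate.py`; purely as an illustration, a generator with assumed keys might look like this:

```python
# Hedged sketch: build a text file listing training samples. The real schema
# is defined by examples/text_to_image/json_generate.py; the keys below
# ("image", "text", "latent") are assumptions for illustration only.
import json
import os

def generate_json(split_dir, out_path, is_stage_1=True):
    names = sorted(os.listdir(os.path.join(split_dir, "origin")))
    entries = []
    for n in names:
        item = {"image": n, "text": ""}  # assumed keys; empty caption
        if not is_stage_1:               # stage two also records the
            item["latent"] = n.rsplit(".", 1)[0] + ".pt"  # precomputed latent
        entries.append(item)
    with open(out_path, "w") as f:
        json.dump(entries, f, indent=2)

generate_json("ISTD+_Dataset/train", "ISTD+_Dataset/train/train.json")
```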
- The training file is `examples/text_to_image/train_text_to_image.py`. Use the following command to train, and set the optional parameters:
  ```
  ./train.sh
  ```
  ```
  CUDA_VISIBLE_DEVICES="0,1"     # select GPUs
  num_processes=2                # set the number of GPUs
  mixed_precision="fp16"
  learning_rate=3e-05            # correct setting (the paper reports a slightly different value)
  pretrained_model_name_or_path  # pretrained Stable Diffusion path
  train_data_dir                 # dataset split file path
  prediction_type                # set to "sample": the diffusion model predicts the image latent instead of the noise
  ```
- Use the model trained in stage one to generate the latents of the shadow-free images, and set the optional parameters:
  ```
  python inference.py
  ```
  ```
  unet_path            # set to the UNet weights trained in stage one
  vae_path=stabilityai/stable-diffusion-2
  output_type=latent   # Line 42
  ```
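With `output_type="latent"`, a diffusers pipeline returns the latent tensor instead of decoded pixels, which is what stage two consumes. A minimal sketch of saving those latents (the repo's `inference.py` may structure this differently):

```python
# Hedged sketch: save predicted shadow-free latents for stage-two training.
# Assumes a diffusers pipeline accepting output_type="latent"; the actual
# inference.py in this repo may differ.
import os
import torch

def save_latent(pipe, image, name, out_dir="latents_sample"):
    os.makedirs(out_dir, exist_ok=True)
    with torch.no_grad():
        # .images holds the latent tensor when output_type="latent"
        latent = pipe(prompt="", image=image, output_type="latent").images
    torch.save(latent.cpu(), os.path.join(out_dir, name + ".pt"))
```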
- Arrange the dataset in the following structure, and modify the dataset paths in `examples/text_to_image/my_vae_dataset.py`:
  ```
  |-- ISTD+_Dataset
      |-- train
          |-- origin          # shadow images
          |-- shadow_free     # shadow-free GT images
          |-- latents_sample  # predicted shadow-free latents
          |-- train_vae.json  # text file
      |-- test
          |-- origin          # shadow images
          |-- shadow_free     # shadow-free GT images
          |-- latents_sample  # predicted shadow-free latents
          |-- test_vae.json   # text file
  ```
  ```
  text_filepath        # text file path
  image_dir            # shadow-free GT image path
  condition_image_dir  # shadow image path
  latent_dir           # predicted shadow-free latent path
  ```
  The text file can be generated by `examples/text_to_image/json_generate.py` with `is_stage_1=False`.
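A minimal sketch of what a stage-two dataset item could load, with an assumed file layout and key names (see `examples/text_to_image/my_vae_dataset.py` for the real implementation):

```python
# Hedged sketch of a stage-two dataset; field names and file layout are
# assumptions, not the repo's actual my_vae_dataset.py.
import json
import os
import torch
from PIL import Image
from torch.utils.data import Dataset

class VaeDataset(Dataset):
    def __init__(self, root, split="train"):
        self.root = os.path.join(root, split)
        with open(os.path.join(self.root, f"{split}_vae.json")) as f:
            self.items = json.load(f)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        name = self.items[i]["image"]  # assumed key
        origin = Image.open(os.path.join(self.root, "origin", name))
        gt = Image.open(os.path.join(self.root, "shadow_free", name))
        latent = torch.load(os.path.join(self.root, "latents_sample",
                                         name.rsplit(".", 1)[0] + ".pt"))
        return origin, gt, latent
```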
- The training file is `examples/text_to_image/train_vae_decoder.py`. Use the following command to train, and set the optional parameters:
  ```
  ./train_vae.sh
  ```
  ```
  learning_rate=5e-05  # correct setting (the paper reports a slightly different value)
  add_cfw=true         # add the detail injection module
  add_dino=true        # add the DINO feature
  ```
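Regarding the `add_dino` option above: one common way to obtain DINO features is the official torch.hub entry point. This is only a sketch of the idea; the repo may load DINO differently or use another variant:

```python
# Hedged sketch: extract a global DINO feature. The hub entry point below is
# the official facebookresearch/dino one; whether the repo uses this exact
# backbone is an assumption.
import torch

dino = torch.hub.load("facebookresearch/dino:main", "dino_vits16")
dino.eval()
with torch.no_grad():
    feat = dino(torch.randn(1, 3, 224, 224))  # (1, 384) global CLS feature
```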
Since the image latents for stage-two training need to be generated in advance, first perform data augmentation on the images and then generate the latents. Downscale the input images to W/k × H/k for training, with k = 3 for the WSRD+ dataset. Use the stage-one model to generate the latents of the downscaled images, while the VAE encoder takes the original-size images as input. Set the following optional parameters in `train_vae.sh`:
```
super_reshape=true
super_reshape_k=3  # set the reshape factor k
```
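Concretely, preparing the downscaled copies might look like this (a sketch; k = 3 for WSRD+ as above, and the resampling filter is an assumption):

```python
# Hedged sketch: downscale an input image to (W/k, H/k) before stage-one
# latent generation; the repo's own preprocessing may differ.
from PIL import Image

def downscale(path, k=3):  # k = 3 for the WSRD+ dataset
    img = Image.open(path).convert("RGB")
    w, h = img.size
    # The (W/k, H/k) copy goes to the stage-one model; the original-size
    # image still goes to the VAE encoder.
    return img.resize((w // k, h // k), Image.BICUBIC)
```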
- Use the stage-one model to generate the latents of the downscaled images:
  ```
  python inference.py
  ```
- Generate the final result by combining the latents of the downscaled images with the original-size images:
  ```
  python inference_vae.py
  ```

The results reported in the paper are computed with the MATLAB script used by previous methods; see `evaluation/measure_shadow.m` for details.
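For a quick Python-side sanity check, scikit-image's metrics can be used; note this is not the MATLAB script that produced the numbers below, so expect small discrepancies:

```python
# Hedged sketch: quick PSNR/SSIM check with scikit-image. The paper's numbers
# come from evaluation/measure_shadow.m, so values may differ slightly.
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(pred, gt):
    """pred, gt: uint8 H x W x 3 arrays of the result and the GT image."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=255)
    ssim = structural_similarity(gt, pred, channel_axis=2, data_range=255)
    return psnr, ssim
```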
| Datasets | PSNR | SSIM |
|---|---|---|
| ISTD+ | 35.19 | 0.974 |
| SRD | 33.63 | 0.968 |
| INS | 30.56 | 0.975 |
| WSRD+ | 26.26 | 0.827 |
The testing results on the ISTD+, SRD, INS, and WSRD+ datasets are available here: results.
Our implementation is based on Diffusers. We would like to thank them.
Bibtex:
```
@InProceedings{xu_2025_CVPR,
  title={Detail-Preserving Latent Diffusion for Stable Shadow Removal},
  author={Xu, Jiamin and Zheng, Yuxin and Li, Zelong and Wang, Chi and Gu, Renshu and Xu, Weiwei and Xu, Gang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}
```
If you have any questions, please contact 2451773098@qq.com.
