The official repo of the ICLR 2025 paper, High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity
We propose DiffDIS, a diffusion-based segmentation model that taps into the potential of the pre-trained U-Net within diffusion models to maintain a global receptive field while precisely learning target distributions for high-resolution, fine-grained segmentation. It addresses key challenges in conventional diffusion architectures, such as stochastic generation patterns and inefficient inference, and also tackles task-specific complexities in DIS.
- DiffDIS elegantly navigates the long-standing trade-off between receptive field expansion and detail preservation that constrains traditional discriminative learning-based methods.
- We transform the recurrent nature of diffusion models into an end-to-end framework via straightforward one-step denoising, significantly accelerating inference (see the sketch after this list).
- We introduce an auxiliary edge generation task, complemented by an effective interactive module, to achieve a nuanced balance in detail representation while also enhancing the determinism of the generated masks.
- DiffDIS outperforms prior methods on almost all metrics of the DIS benchmark dataset, establishing a new SoTA.
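To make the one-step denoising idea concrete, here is a minimal sketch of a single UNet forward pass with the diffusers API. The latent size, fixed timestep, and zero conditioning tensor are illustrative assumptions, not the exact DiffDIS pipeline:

```python
# Minimal sketch of one-step denoising with a diffusers-style UNet: a single
# forward pass at a fixed timestep replaces the usual iterative sampling loop.
# Latent size, timestep, and conditioning below are placeholder assumptions.
import torch
from diffusers import AutoencoderKL, UNet2DConditionModel

device = "cuda"
unet = UNet2DConditionModel.from_pretrained("stabilityai/sd-turbo", subfolder="unet").to(device)
vae = AutoencoderKL.from_pretrained("stabilityai/sd-turbo", subfolder="vae").to(device)

with torch.no_grad():
    latent = torch.randn(1, 4, 128, 128, device=device)         # noisy input latent (1024px image / 8)
    cond = torch.zeros(1, 77, 1024, device=device)              # placeholder conditioning embeddings
    t = torch.tensor([999], device=device)                      # one fixed timestep, no loop
    pred = unet(latent, t, encoder_hidden_states=cond).sample   # single denoising pass
    mask = vae.decode(pred / vae.config.scaling_factor).sample  # decode latent to image space
```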
- Clone this repository:

```bash
git clone git@github.com:qianyu-dlut/DiffDIS.git
cd DiffDIS
```
- Install packages:

```bash
source activate base
conda create -n diffdis python==3.10
conda activate diffdis
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
pip install -e diffusers-0.30.2/
```
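A quick sanity check (our suggestion, not part of the repo) to confirm the pinned build sees your GPU before training:

```python
# Verify the pinned torch/torchvision versions and CUDA visibility.
import torch
import torchvision

print(torch.__version__, torchvision.__version__)  # expect 2.2.0 / 0.17.0 (+cu118)
print("CUDA available:", torch.cuda.is_available())
```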
- Download the pretrained model at sd-turbo and update the `--pretrained_model_path` in `./scripts/run_train.sh` with its path.
- Update the `--dataset_path` in `./scripts/run_train.sh` with the path of the training dataset DIS5K.
- Update the data path (lines 7-8) in `get_contour.py` and generate edges by running:

```bash
python get_contour.py
```
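For intuition, edge maps for the auxiliary edge task can be derived from the binary GT masks with a morphological gradient; the sketch below is a hypothetical illustration with placeholder paths, and `get_contour.py` may differ in detail:

```python
# Hypothetical edge-map generation: boundary = dilation(mask) - erosion(mask).
# Paths are placeholders; get_contour.py may implement this differently.
import cv2
import numpy as np

mask = cv2.imread("DIS5K/DIS-TR/gt/sample.png", cv2.IMREAD_GRAYSCALE)
kernel = np.ones((3, 3), np.uint8)
edge = cv2.dilate(mask, kernel) - cv2.erode(mask, kernel)  # keeps only boundary pixels
cv2.imwrite("DIS5K/DIS-TR/edge/sample.png", edge)
```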
- Then, you can start training by simply running:

```bash
cd scripts
bash run_train.sh
```

Alternatively, you can use the accelerate library by updating the `--config_file` in `./scripts/run_train_accelerator.sh` and running:

```bash
cd scripts
bash run_train_accelerator.sh
```

The model will be saved in `./saved_model/DiffDIS/`.
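`run_train_accelerator.sh` relies on the Hugging Face accelerate library. As a rough sketch of the pattern it builds on (with a toy model; the actual training script has its own models, data, and losses):

```python
# Generic accelerate training pattern: prepare() wraps model/optimizer/data
# for the devices in the config file; accelerator.backward() replaces
# loss.backward(). Toy model and data are illustrative only.
import torch
from accelerate import Accelerator
from torch.utils.data import DataLoader, TensorDataset

accelerator = Accelerator()  # picks up GPUs / precision from the accelerate config
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataloader = DataLoader(TensorDataset(torch.randn(32, 8), torch.randn(32, 1)), batch_size=4)
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for x, y in dataloader:
    loss = torch.nn.functional.mse_loss(model(x), y)
    accelerator.backward(loss)  # handles mixed precision / distributed gradients
    optimizer.step()
    optimizer.zero_grad()
```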
- Update the test dataset path in the config file `./utils/config.py` (lines 3~7).
- Copy the file `/path/to/sd-turbo/unet/config.json` and paste it under your checkpoint directory `./saved_model/DiffDIS/Model_XXX/unet/` (see the loading sketch after this list for why this file is needed). You can also download our trained unet at Google Drive or Huggingface.
- Update the `--checkpoint_path`, `--pretrained_model_path` and `--output_dir` in `./scripts/run_inference.sh`.
- Start inference by:

```bash
cd scripts
bash run_inference.sh
```
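For context on the `config.json` step above: diffusers' `from_pretrained` first rebuilds the UNet architecture from `unet/config.json` and then loads the fine-tuned weights, so the checkpoint folder must contain that file. A minimal hypothetical loading sketch:

```python
# Loading the fine-tuned UNet; from_pretrained requires unet/config.json
# alongside the weights, which is why it is copied from sd-turbo.
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "./saved_model/DiffDIS/Model_XXX", subfolder="unet"
)
```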
- Change the predicted map path in `./test_score.py` (line 17) and start testing:

```bash
python test_score.py
```
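`test_score.py` computes the standard DIS metrics. As a minimal illustration of the simplest of them, mean absolute error (MAE) between one predicted map and its ground truth (paths are placeholders):

```python
# Hypothetical single-image MAE; test_score.py evaluates the full benchmark.
import cv2

pred = cv2.imread("preds/sample.png", cv2.IMREAD_GRAYSCALE) / 255.0
gt = cv2.imread("DIS5K/DIS-VD/gt/sample.png", cv2.IMREAD_GRAYSCALE) / 255.0
print("MAE:", abs(pred - gt).mean())
```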
Our code is based on:

- Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation, Github
- GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image, Github
- Diffusers, Github
We are grateful to the authors of these projects for their pioneering work and contributions!
- Huggingface app demo on the way...
If you have any questions, please feel free to contact me (ms.yuqian AT mail DOT dlut DOT edu DOT cn).
If you find this code useful, we kindly ask you to cite our paper in your work.
```bibtex
@article{DiffDIS,
  title={High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity},
  author={Yu, Qian and Jiang, Peng-Tao and Zhang, Hao and Chen, Jinwei and Li, Bo and Zhang, Lihe and Lu, Huchuan},
  journal={arXiv preprint arXiv:2410.10105},
  year={2024}
}
```
