# [NeurIPS 2024 Spotlight] Text-DiFuse: An Interactive Multi-Modal Image Fusion Framework based on Text-modulated Diffusion Model
This repository is the official implementation of the NeurIPS 2024 paper: "Text-DiFuse: An Interactive Multi-Modal Image Fusion Framework based on Text-modulated Diffusion Model"
If our work assists your research, feel free to give us a star ⭐ or cite us using:

```bibtex
@article{zhang2024text,
  title={Text-DiFuse: An Interactive Multi-Modal Image Fusion Framework based on Text-modulated Diffusion Model},
  author={Zhang, Hao and Cao, Lei and Ma, Jiayi},
  journal={Advances in Neural Information Processing Systems},
  volume={37},
  pages={39552--39572},
  year={2024},
  url={https://proceedings.neurips.cc/paper_files/paper/2024/hash/45e409b46bebd648e9041a628a1a9964-Abstract-Conference.html}
}
```
If you have any questions or would like to discuss the work, please email us at whu.caolei@whu.edu.cn.
## Environment

```shell
conda create -n Text-DiFuse python==3.9
conda activate Text-DiFuse
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0
pip install -r requirements.txt
```
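Before running any of the scripts below, it can save time to confirm the pinned PyTorch build imported correctly and whether CUDA is visible. A minimal sanity-check sketch (not part of the repo):

```python
# Environment sanity check: reports whether torch is installed, its version,
# and whether a CUDA device is visible. Works even when torch is absent.
import importlib.util


def torch_env_summary():
    """Return (installed, version, cuda_available); the last two are None if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return False, None, None
    import torch  # imported lazily so the check also runs where torch is missing

    return True, torch.__version__, torch.cuda.is_available()


if __name__ == "__main__":
    print(torch_env_summary())
```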
## Testing

Download the public datasets MSRS, RoadScene, TNO, LLVIP, M3FD, and Harvard, and place them in `./data/test/`.
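A quick way to verify the data is in place is to list which dataset folders are still missing. This sketch assumes each dataset sits in a directory named after itself (e.g. `./data/test/MSRS`); adjust the names if your layout differs:

```python
# List the test datasets that have no directory under the data root.
# Folder names are an assumption; rename to match your local layout.
from pathlib import Path

DATASETS = ["MSRS", "RoadScene", "TNO", "LLVIP", "M3FD", "Harvard"]


def missing_datasets(root="./data/test"):
    """Return the dataset names that have no directory under `root`."""
    base = Path(root)
    return [name for name in DATASETS if not (base / name).is_dir()]


if __name__ == "__main__":
    print(missing_datasets())
```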
Download the pre-trained weights from Google Drive and place them in `./pretrained/`.
After adjusting the configurable parameters such as `task_type` and `timestep`, run:

```shell
python test.py
```
To test the modulation mode, first download the pretrained OWL-ViT and SAM weights and place them in `./modulated/checkpoint/`. Then set the `text_prompt` parameter and run:

```shell
python test_modulated.py
```
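Conceptually, the modulation stage turns the `text_prompt` into a spatial mask (OWL-ViT localizes the described object, SAM segments it), which then steers the fusion inside that region. A toy NumPy sketch of such a mask-guided blend — purely illustrative, not the repo's actual code; `fused`, `enhanced`, and `mask` are hypothetical names:

```python
import numpy as np


def mask_modulate(fused, enhanced, mask):
    """Per-pixel blend: take `enhanced` where mask == 1, keep `fused` elsewhere.

    fused, enhanced: HxWxC float arrays; mask: HxW array of {0, 1}.
    """
    m = mask.astype(fused.dtype)[..., None]  # HxW -> HxWx1, broadcasts over channels
    return enhanced * m + fused * (1.0 - m)


if __name__ == "__main__":
    fused = np.zeros((2, 2, 3))
    enhanced = np.ones((2, 2, 3))
    mask = np.array([[1, 0], [0, 1]])
    print(mask_modulate(fused, enhanced, mask))
```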
## Training

To train the diffusion model, place your own training data in `./data/train_diffusion/`, then run:

```shell
python train_diffusion.py
```

To train the FCM, place your own training data in `./data/train_FCM/`, then run:

```shell
python train_FCM.py
```
