
Implementation of paper "CC-Diff: Enhancing Contextual Coherence in Remote Sensing Image Synthesis"

CC-Diff: Enhancing Contextual Coherence in Remote Sensing Image Synthesis

🔥 News

Abstract

Generating visually realistic remote sensing (RS) images requires maintaining semantic coherence between objects and their surrounding environment. However, existing image synthesis methods prioritize foreground controllability while oversimplifying backgrounds into plain or generic textures. This oversight neglects the crucial interaction between foreground and background elements, resulting in semantic inconsistencies in RS scenarios. To address this challenge, we propose CC-Diff++, a Diffusion Model-based approach for spatially controllable RS image synthesis with enhanced Context Coherence. To capture spatial interdependence, we propose a novel module named Co-Resampler, which employs an advanced masked attention mechanism to jointly extract features from both the foreground and background while modeling their mutual relationships. Furthermore, we introduce a text-to-layout prediction module powered by Large Language Models (LLMs) and a reference image retrieval mechanism for providing rich textural guidance, which work together to enable CC-Diff++ to generate outputs that are both more diverse and more realistic. Extensive experiments demonstrate that CC-Diff++ outperforms state-of-the-art methods in visual fidelity, semantic accuracy, and positional precision on multiple RS datasets. CC-Diff++ also shows strong trainability, improving detection accuracy by 2.04 mAP on DOTA and 11.81 mAP on the HRSC dataset.

Overview

  • CC-Diff++ can generate RS images with enhanced context coherence.


  • An overview of CC-Diff++'s pipeline.


  • Detailed structure of Co-Resampler and Conditional Generation Module.

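The paper describes the Co-Resampler's masked attention only at a high level; as a generic illustration (not the authors' implementation), masked attention restricts each query to the keys inside its own region — e.g. foreground queries to foreground pixels — by setting disallowed attention scores to negative infinity before the softmax:

```python
import numpy as np

def masked_attention(q, k, v, mask):
    """Single-head attention where mask[i, j] = True lets query i attend to key j.

    q: (Nq, d); k, v: (Nk, d); mask: (Nq, Nk) boolean.
    """
    scores = (q @ k.T) / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)       # block cross-region attention
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

In a layout-to-image setting, the foreground mask would be derived from the layout boxes and the background mask from its complement, so that each resampled token aggregates features only from its own region.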

  • Gallery of synthesized samples on each dataset.

DIOR-RSVG

DOTA

HRSC

Main Results

Quantitative comparison of results on RS datasets DIOR-RSVG, DOTA and HRSC.

**DIOR-RSVG**

| Method | CLIP-L ↑ | CLIP-G ↑ | FID ↓ | mAP50 | mAP50:95 |
| --- | --- | --- | --- | --- | --- |
| Txt2Img-MHN | 18.91 | 23.46 | 123.84 | 0.43 | 0.16 |
| DiffusionSat | 19.84 | 32.68 | 78.16 | 2.87 | 0.84 |
| GLIGEN | 20.55 | 32.22 | 73.02 | 7.90 | 2.74 |
| AeroGen | 20.28 | 30.75 | 74.90 | 43.16 | 27.97 |
| LayoutDiffusion | 19.31 | 30.65 | 79.03 | 41.22 | 24.23 |
| MIGC | 21.59 | 32.36 | 79.93 | 46.92 | 30.10 |
| CC-Diff | 21.82 | 32.36 | 70.68 | 56.59 | 39.97 |
| CC-Diff++ | 21.94 | 32.52 | 65.86 | 56.94 | 39.99 |

**DOTA**

| Method | CLIP-L ↑ | CLIP-G ↑ | FID ↓ | mAP50 | mAP50:95 |
| --- | --- | --- | --- | --- | --- |
| Txt2Img-MHN | 19.58 | 25.99 | 137.76 | 0.07 | 0.02 |
| DiffusionSat | 19.78 | 31.61 | 65.19 | 0.22 | 0.05 |
| GLIGEN | 20.72 | 29.98 | 61.05 | 0.95 | 0.28 |
| AeroGen | 21.55 | 26.13 | 55.02 | 41.88 | 26.37 |
| LayoutDiffusion | 20.49 | 27.67 | 64.77 | 42.70 | 22.66 |
| MIGC | 22.21 | 30.96 | 63.95 | 57.06 | 33.66 |
| CC-Diff | 22.60 | 30.92 | 47.72 | 57.72 | 34.04 |
| CC-Diff++ | 22.66 | 30.93 | 47.02 | 58.59 | 35.66 |

**HRSC**

| Method | CLIP-L ↑ | CLIP-G ↑ | FID ↓ | mAP50 | mAP50:95 |
| --- | --- | --- | --- | --- | --- |
| Txt2Img-MHN | 17.22 | 21.82 | 91.27 | 0.10 | 0.02 |
| DiffusionSat | 17.36 | 30.09 | 83.53 | 0.13 | 0.01 |
| GLIGEN | 19.48 | 30.51 | 68.12 | 1.89 | 0.56 |
| AeroGen | 18.96 | 28.24 | 62.40 | 9.56 | 4.37 |
| LayoutDiffusion | 19.48 | 29.39 | 73.95 | 11.44 | 3.84 |
| MIGC | 21.69 | 30.91 | 66.13 | 27.05 | 16.93 |
| CC-Diff | 21.27 | 30.48 | 64.96 | 28.91 | 17.31 |
| CC-Diff++ | 23.21 | 30.96 | 61.52 | 30.48 | 17.37 |
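FID (lower is better) compares the Gaussian statistics of Inception features extracted from real and generated images. As background for the table, here is a minimal sketch of the Fréchet distance itself — the standard formula, with the feature-extraction step omitted; this is not code from this repository:

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between two Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from sqrtm
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

Identical feature distributions give a distance of zero; in practice `mu` and `sigma` are the mean and covariance of Inception-v3 activations over each image set.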

Trainability comparison on DIOR-RSVG, DOTA and HRSC.

**DIOR-RSVG**

| Method | mAP | mAP50 | mAP75 |
| --- | --- | --- | --- |
| Baseline | 50.17 | 75.84 | 54.38 |
| Txt2Img-MHN | 50.12 | 75.87 | 54.74 |
| DiffusionSat | 49.95 | 75.59 | 55.26 |
| AeroGen | 51.39 | 76.85 | 56.75 |
| LayoutDiffusion | 51.96 | 77.31 | 56.82 |
| MIGC | 51.87 | 76.65 | 57.20 |
| CC-Diff | 52.18 | 77.39 | 57.59 |
| CC-Diff++ | 52.62 | 77.51 | 58.09 |

**DOTA**

| Method | mAP | mAP50 | mAP75 |
| --- | --- | --- | --- |
| Baseline | 35.53 | 62.10 | 35.83 |
| Txt2Img-MHN | 35.91 | 62.53 | 36.43 |
| DiffusionSat | 36.15 | 62.50 | 36.76 |
| AeroGen | 36.65 | 63.15 | 36.96 |
| LayoutDiffusion | 35.15 | 61.54 | 35.18 |
| MIGC | 35.93 | 62.36 | 36.32 |
| CC-Diff | 37.36 | 63.18 | 38.55 |
| CC-Diff++ | 37.57 | 63.18 | 38.05 |

**HRSC**

| Method | mAP | mAP50 | mAP75 |
| --- | --- | --- | --- |
| Baseline | 34.88 | 51.45 | 40.90 |
| Txt2Img-MHN | 33.45 | 49.65 | 38.67 |
| DiffusionSat | 29.67 | 45.62 | 33.51 |
| AeroGen | 44.43 | 60.38 | 50.60 |
| LayoutDiffusion | 43.74 | 60.27 | 51.34 |
| MIGC | 42.18 | 60.01 | 49.44 |
| CC-Diff | 42.15 | 61.15 | 50.74 |
| CC-Diff++ | 46.69 | 66.58 | 55.73 |
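The trainability gains quoted in the abstract (+2.04 mAP on DOTA, +11.81 mAP on HRSC) follow directly from the table above, comparing CC-Diff++ against the Baseline:

```python
# mAP values taken from the trainability table: Baseline vs. CC-Diff++
baseline = {"DOTA": 35.53, "HRSC": 34.88}
ccdiff_pp = {"DOTA": 37.57, "HRSC": 46.69}

for ds in ("DOTA", "HRSC"):
    gain = round(ccdiff_pp[ds] - baseline[ds], 2)
    print(f"{ds}: +{gain} mAP")
# DOTA: +2.04 mAP
# HRSC: +11.81 mAP
```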

Getting Started

1. Installation

To get started, first clone the CC-Diff repository:

```shell
git clone https://github.com/AZZMM/CC-Diff.git
cd CC-Diff
```

Environment setup:

```shell
conda create -n CC-Diff python=3.9 -y
conda activate CC-Diff
pip install -r requirement.txt
```

2. Data Preparation

2.1 Dataset processing

An example of the expected dataset layout:

```
DIOR
├── train
│   ├── 00003.jpg
│   ├── ...
│   └── metadata.jsonl
├── val
│   ├── 00011.jpg
│   ├── ...
│   └── metadata.jsonl
├── results
│   └── ...
└── dior_emb.pt
```

Dataset processing scripts are provided in data_tools; see data_process.md for details.

2.2 Checkpoint

Download the DIOR_checkpoint or DOTA_checkpoint.

3. Model Inference and Training

Controllable RS image generation:

```shell
python infer_dior.py
```

Train the CC-Diff model:

```shell
./dist_train.sh
```

4. CC-Diff++

The implementation of the enhanced version, CC-Diff++, is included in the ccdiff_pp folder.

Training: replace the script train_dior.py with train_dior_pp.py in ./dist_train.sh to train the CC-Diff++ model.

Inference:

```shell
accelerate launch \
  --main_process_port=29500 \
  --num_processes=8 \
  infer_dior_pp.py
```

5. Synthetic Images

Here we publish the synthetic images used in the trainability experiments for each dataset, enabling exact reproduction of our results.

DIOR-RSVG: link, Password: ccdp

DOTA: link, Password: ccdp

HRSC: link, Password: ccdp

Acknowledgements

Our work is based on Stable Diffusion, diffusers, and CLIP. We appreciate their outstanding contributions.

📄 BibTex

```bibtex
@ARTICLE{11187367,
  author={Zhang, Mu and Liu, Yunfan and Liu, Yue and Zhao, Yuzhong and Ye, Qixiang},
  journal={IEEE Transactions on Geoscience and Remote Sensing}, 
  title={CC-Diff++: Spatially Controllable Text-to-Image Synthesis for Remote Sensing With Enhanced Contextual Coherence}, 
  year={2025},
  volume={63},
  number={},
  pages={1-16},
  keywords={Semantics;Image synthesis;Coherence;Layout;Diffusion models;Remote sensing;Feature extraction;Visualization;Roads;Training;Diffusion model;generative model;image synthesis;layout-to-image (L2I) generation;text-to-image (T2I) generation},
  doi={10.1109/TGRS.2025.3616376}}
```
