
Implementation of paper "CC-Diff: Enhancing Contextual Coherence in Remote Sensing Image Synthesis"

CC-Diff: Enhancing Contextual Coherence in Remote Sensing Image Synthesis

🔥 News

Abstract

Generating visually realistic remote sensing (RS) images requires maintaining semantic coherence between objects and their surrounding environment. However, existing image synthesis methods prioritize foreground controllability while oversimplifying backgrounds into plain or generic textures. This oversight neglects the crucial interaction between foreground and background elements, resulting in semantic inconsistencies in RS scenarios. To address this challenge, we propose CC-Diff++, a Diffusion Model-based approach for spatially controllable RS image synthesis with enhanced Context Coherence. To capture spatial interdependence, we propose a novel module named Co-Resampler, which employs an advanced masked attention mechanism to jointly extract features from both the foreground and background while modeling their mutual relationships. Furthermore, we introduce a text-to-layout prediction module powered by Large Language Models (LLMs) and a reference image retrieval mechanism for providing rich textural guidance, which work together to enable CC-Diff++ to generate outputs that are both more diverse and more realistic. Extensive experiments demonstrate that CC-Diff++ outperforms state-of-the-art methods in visual fidelity, semantic accuracy, and positional precision on multiple RS datasets. CC-Diff++ also shows strong trainability, improving detection accuracy by 2.04 mAP on DOTA and 11.81 mAP on the HRSC dataset.

Overview

  • CC-Diff++ can generate RS images with enhanced context coherence.


  • An overview of CC-Diff++'s pipeline.


  • Detailed structure of Co-Resampler and Conditional Generation Module.

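The paper describes the Co-Resampler's masked attention only at a high level; as a generic illustration (not the authors' implementation), masked attention restricts each query to the keys inside its own region — e.g. foreground queries to foreground pixels — by setting disallowed attention scores to negative infinity before the softmax:

```python
import numpy as np

def masked_attention(q, k, v, mask):
    """Single-head attention where mask[i, j] = True lets query i attend to key j.

    q: (Nq, d); k, v: (Nk, d); mask: (Nq, Nk) boolean.
    """
    scores = (q @ k.T) / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)       # block cross-region attention
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

In a layout-to-image setting, the foreground mask would be derived from the layout boxes and the background mask from its complement, so that each resampled token aggregates features only from its own region.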

  • Gallery of synthesized samples on each dataset.

DIOR-RSVG

DOTA

HRSC

Main Results

Quantitative comparison of results on RS datasets DIOR-RSVG, DOTA and HRSC.

**DIOR-RSVG**

| Method | CLIP-L ↑ | CLIP-G ↑ | FID ↓ | mAP50 | mAP50:95 |
| --- | --- | --- | --- | --- | --- |
| Txt2Img-MHN | 18.91 | 23.46 | 123.84 | 0.43 | 0.16 |
| DiffusionSat | 19.84 | 32.68 | 78.16 | 2.87 | 0.84 |
| GLIGEN | 20.55 | 32.22 | 73.02 | 7.90 | 2.74 |
| AeroGen | 20.28 | 30.75 | 74.90 | 43.16 | 27.97 |
| LayoutDiffusion | 19.31 | 30.65 | 79.03 | 41.22 | 24.23 |
| MIGC | 21.59 | 32.36 | 79.93 | 46.92 | 30.10 |
| CC-Diff | 21.82 | 32.36 | 70.68 | 56.59 | 39.97 |
| CC-Diff++ | 21.94 | 32.52 | 65.86 | 56.94 | 39.99 |

**DOTA**

| Method | CLIP-L ↑ | CLIP-G ↑ | FID ↓ | mAP50 | mAP50:95 |
| --- | --- | --- | --- | --- | --- |
| Txt2Img-MHN | 19.58 | 25.99 | 137.76 | 0.07 | 0.02 |
| DiffusionSat | 19.78 | 31.61 | 65.19 | 0.22 | 0.05 |
| GLIGEN | 20.72 | 29.98 | 61.05 | 0.95 | 0.28 |
| AeroGen | 21.55 | 26.13 | 55.02 | 41.88 | 26.37 |
| LayoutDiffusion | 20.49 | 27.67 | 64.77 | 42.70 | 22.66 |
| MIGC | 22.21 | 30.96 | 63.95 | 57.06 | 33.66 |
| CC-Diff | 22.60 | 30.92 | 47.72 | 57.72 | 34.04 |
| CC-Diff++ | 22.66 | 30.93 | 47.02 | 58.59 | 35.66 |

**HRSC**

| Method | CLIP-L ↑ | CLIP-G ↑ | FID ↓ | mAP50 | mAP50:95 |
| --- | --- | --- | --- | --- | --- |
| Txt2Img-MHN | 17.22 | 21.82 | 91.27 | 0.10 | 0.02 |
| DiffusionSat | 17.36 | 30.09 | 83.53 | 0.13 | 0.01 |
| GLIGEN | 19.48 | 30.51 | 68.12 | 1.89 | 0.56 |
| AeroGen | 18.96 | 28.24 | 62.40 | 9.56 | 4.37 |
| LayoutDiffusion | 19.48 | 29.39 | 73.95 | 11.44 | 3.84 |
| MIGC | 21.69 | 30.91 | 66.13 | 27.05 | 16.93 |
| CC-Diff | 21.27 | 30.48 | 64.96 | 28.91 | 17.31 |
| CC-Diff++ | 23.21 | 30.96 | 61.52 | 30.48 | 17.37 |
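FID (lower is better) compares the Gaussian statistics of Inception features extracted from real and generated images. As background for the table, here is a minimal sketch of the Fréchet distance itself — the standard formula, with the feature-extraction step omitted; this is not code from this repository:

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between two Gaussians N(mu1, sigma1) and N(mu2, sigma2)."""
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from sqrtm
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

Identical feature distributions give a distance of zero; in practice `mu` and `sigma` are the mean and covariance of Inception-v3 activations over each image set.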

Trainability comparison on DIOR-RSVG, DOTA and HRSC.

**DIOR-RSVG**

| Method | mAP | mAP50 | mAP75 |
| --- | --- | --- | --- |
| Baseline | 50.17 | 75.84 | 54.38 |
| Txt2Img-MHN | 50.12 | 75.87 | 54.74 |
| DiffusionSat | 49.95 | 75.59 | 55.26 |
| AeroGen | 51.39 | 76.85 | 56.75 |
| LayoutDiffusion | 51.96 | 77.31 | 56.82 |
| MIGC | 51.87 | 76.65 | 57.20 |
| CC-Diff | 52.18 | 77.39 | 57.59 |
| CC-Diff++ | 52.62 | 77.51 | 58.09 |

**DOTA**

| Method | mAP | mAP50 | mAP75 |
| --- | --- | --- | --- |
| Baseline | 35.53 | 62.10 | 35.83 |
| Txt2Img-MHN | 35.91 | 62.53 | 36.43 |
| DiffusionSat | 36.15 | 62.50 | 36.76 |
| AeroGen | 36.65 | 63.15 | 36.96 |
| LayoutDiffusion | 35.15 | 61.54 | 35.18 |
| MIGC | 35.93 | 62.36 | 36.32 |
| CC-Diff | 37.36 | 63.18 | 38.55 |
| CC-Diff++ | 37.57 | 63.18 | 38.05 |

**HRSC**

| Method | mAP | mAP50 | mAP75 |
| --- | --- | --- | --- |
| Baseline | 34.88 | 51.45 | 40.90 |
| Txt2Img-MHN | 33.45 | 49.65 | 38.67 |
| DiffusionSat | 29.67 | 45.62 | 33.51 |
| AeroGen | 44.43 | 60.38 | 50.60 |
| LayoutDiffusion | 43.74 | 60.27 | 51.34 |
| MIGC | 42.18 | 60.01 | 49.44 |
| CC-Diff | 42.15 | 61.15 | 50.74 |
| CC-Diff++ | 46.69 | 66.58 | 55.73 |
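The trainability gains quoted in the abstract (+2.04 mAP on DOTA, +11.81 mAP on HRSC) follow directly from the table above, comparing CC-Diff++ against the Baseline:

```python
# mAP values taken from the trainability table: Baseline vs. CC-Diff++
baseline = {"DOTA": 35.53, "HRSC": 34.88}
ccdiff_pp = {"DOTA": 37.57, "HRSC": 46.69}

for ds in ("DOTA", "HRSC"):
    gain = round(ccdiff_pp[ds] - baseline[ds], 2)
    print(f"{ds}: +{gain} mAP")
# DOTA: +2.04 mAP
# HRSC: +11.81 mAP
```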

Getting Started

1. Installation

To get started, first clone the CC-Diff repository:

```shell
git clone https://github.com/AZZMM/CC-Diff.git
cd CC-Diff
```

Environment setup:

```shell
conda create -n CC-Diff python=3.9 -y
conda activate CC-Diff
pip install -r requirement.txt
```

2. Data Preparation

2.1 Dataset processing

An example of the expected dataset layout:

```
DIOR
├── train
│   ├── 00003.jpg
│   ├── ...
│   └── metadata.jsonl
├── val
│   ├── 00011.jpg
│   ├── ...
│   └── metadata.jsonl
├── results
│   └── ...
└── dior_emb.pt
```

Dataset processing scripts are provided in data_tools; see data_process.md for details.

2.2 Checkpoint

Download the DIOR_checkpoint or DOTA_checkpoint.

3. Model Inference and Training

Controllable RS image generation:

```shell
python infer_dior.py
```

Train the CC-Diff model:

```shell
./dist_train.sh
```

4. CC-Diff++

The implementation of the enhanced version, CC-Diff++, is included in the ccdiff_pp folder.

Training: replace the script train_dior.py with train_dior_pp.py in ./dist_train.sh to train the CC-Diff++ model.

Inference:

```shell
accelerate launch \
  --main_process_port=29500 \
  --num_processes=8 \
  infer_dior_pp.py
```

5. Synthetic Images

Here we publish the synthetic images used in the trainability experiments for each dataset, enabling exact reproduction of our results.

DIOR-RSVG: link, Password: ccdp

DOTA: link, Password: ccdp

HRSC: link, Password: ccdp

Acknowledgements

Our work is based on Stable Diffusion, diffusers, and CLIP. We appreciate their outstanding contributions.

📄 BibTex

```bibtex
@ARTICLE{11187367,
  author={Zhang, Mu and Liu, Yunfan and Liu, Yue and Zhao, Yuzhong and Ye, Qixiang},
  journal={IEEE Transactions on Geoscience and Remote Sensing}, 
  title={CC-Diff++: Spatially Controllable Text-to-Image Synthesis for Remote Sensing With Enhanced Contextual Coherence}, 
  year={2025},
  volume={63},
  number={},
  pages={1-16},
  keywords={Semantics;Image synthesis;Coherence;Layout;Diffusion models;Remote sensing;Feature extraction;Visualization;Roads;Training;Diffusion model;generative model;image synthesis;layout-to-image (L2I) generation;text-to-image (T2I) generation},
  doi={10.1109/TGRS.2025.3616376}}
```
