Diffusion Models with ConvNeXt (FCDM)
Official PyTorch Implementation

This repository contains the PyTorch implementation (training, sampling, and model definitions) of FCDM, a diffusion model built on the ConvNeXt architecture. It supports both local and distributed (cluster) environments.

[CVPR 2026] Reviving ConvNeXt for Efficient Convolutional Diffusion Models
Taesung Kwon, Lorenzo Bianchi, Lennart Wittke, Felix Watine, Fabio Carrara, Jong Chul Ye, Romann M. Weber, Vinicius C. Azevedo
KAIST, ETH Zürich, ISTI-CNR, University of Pisa

We introduce the fully convolutional diffusion model (FCDM), whose backbone resembles ConvNeXt but is designed for conditional diffusion modeling. Using only 50% of the FLOPs of DiT-XL/2, FCDM-XL achieves competitive performance with 7× and 7.5× fewer training steps at 256×256 and 512×512 resolutions, respectively. Remarkably, FCDM-XL can be trained on a 4-GPU system, highlighting the training efficiency of our architecture. Our results demonstrate that modern convolutional designs provide a competitive and highly efficient alternative for scaling diffusion models, reviving ConvNeXt as a simple yet powerful building block for efficient generative modeling.

🛠 1. Installation & Setup

1.1. Environment Setup

First, create and activate the Conda environment:

conda env create -f environment.yml
conda activate fcdm

1.2. (Optional) Manual VAE Download for Offline Clusters

💡 Note: Some cluster environments restrict outgoing internet connections due to security policies, preventing automatic model downloads from Hugging Face. If you are in such an environment, follow these steps to manually download and transfer the SD-VAE (or EQVAE) snapshot.

Manual download and transfer steps:

1. Download the snapshot locally: Run the script below on your local machine. This creates a folder (e.g., sd-vae-ft-ema/) containing the necessary model files (config.json, pytorch_model.bin, etc.).

python preparation/download_snapshot.py

2. Transfer the files to your cluster: Upload the downloaded folder to your remote cluster.

scp -r sd-vae-ft-ema/ user@your_cluster_address:/path/to/your_cluster_dir/

3. Specify the snapshot path: When launching any training or sampling script, add the --hf-model-dir argument to point to your transferred snapshot directory. The default is None.

python your_script.py --hf-model-dir /path/to/your_cluster_dir/sd-vae-ft-ema
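The fallback behavior described above can be sketched as follows. This is a minimal illustration, not the repository's code: the helper name resolve_vae_source and the Hub ID stabilityai/sd-vae-ft-ema are assumptions, and the check for config.json simply mirrors the file list mentioned in step 1.

```python
from pathlib import Path
from typing import Optional

# Assumed Hub ID for the SD-VAE; the repository's own default may differ.
DEFAULT_VAE_REPO = "stabilityai/sd-vae-ft-ema"


def resolve_vae_source(hf_model_dir: Optional[str]) -> str:
    """Return the local snapshot path if one is given, else the Hub repo ID.

    Hypothetical helper: it only illustrates how a --hf-model-dir value of
    None could fall back to an online download.
    """
    if hf_model_dir is None:
        return DEFAULT_VAE_REPO
    path = Path(hf_model_dir)
    # A transferred snapshot should at least contain config.json.
    if not (path / "config.json").is_file():
        raise FileNotFoundError(
            f"No config.json found in {path}; check the transferred snapshot."
        )
    return str(path)
```

Validating the directory before loading makes a failed transfer visible immediately, rather than as a download timeout on a node without internet access.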

🗂 2. Data Preparation

2.1. Extract Latent Representations

Extract the encoded features and corresponding labels from your dataset. These paths will be provided as --feature-path and --label-path during training.

torchrun --nnodes=1 --nproc_per_node=4 preparation/extract_features.py \
    --data-path /path/to/imagenet \
    --features-path /path/to/latents
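To sanity-check the extracted latents, it helps to know the shape to expect. The sketch below assumes the standard SD-VAE configuration (8× spatial downsampling, 4 latent channels); latent_shape is a hypothetical helper, and a different VAE such as EQVAE may use other values.

```python
from typing import Tuple


def latent_shape(
    image_size: int, downsample: int = 8, channels: int = 4
) -> Tuple[int, int, int]:
    """Expected (C, H, W) of one latent for a square input image.

    The defaults assume the SD-VAE (8x downsampling, 4 channels).
    """
    if image_size % downsample != 0:
        raise ValueError("image_size must be divisible by the downsampling factor")
    side = image_size // downsample
    return (channels, side, side)


print(latent_shape(256))  # (4, 32, 32)
print(latent_shape(512))  # (4, 64, 64)
```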

2.2. (Optional) Cache Latents to Zarr Format

If your cluster limits the number of files you can store, we recommend storing the latents in Zarr format. First, manually set the feature, label, and output paths in the script, then run:

python preparation/save_zarr.py

Once cached, you can pass the Zarr path to any script using: --zarr-path /path/to/cache.zarr

🚀 3. Training

Run the following command to train FCDM on your precomputed latents:

accelerate launch --mixed_precision bf16 --num_processes 4 train_gen/train.py \
  --model FCDM-XL \
  --label-path /path/to/labels \
  --feature-path /path/to/features \
  --results-dir results

Optional Arguments: --hf-model-dir, --zarr-path

💡 Note: We found that --mixed_precision bf16 is more stable than fp16, especially when resuming training.
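One commonly cited reason for this kind of difference is numeric range: fp16 overflows above 65504, while bf16 keeps the exponent range of fp32 at lower precision. This is general background rather than an analysis from the authors. The sketch below uses only the standard library and emulates bf16 by truncating a float32 to its top 16 bits (real hardware typically rounds to nearest instead).

```python
import struct


def to_fp16(x: float) -> float:
    """Round-trip x through IEEE half precision; return +/-inf on overflow."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        return float("inf") if x > 0 else float("-inf")


def to_bf16(x: float) -> float:
    """Emulate bfloat16 by keeping only the top 16 bits of a float32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


print(to_fp16(70000.0))  # inf
print(to_bf16(70000.0))  # 69632.0
```

The fp16 value overflows, whereas the bf16 value stays finite but is coarser.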

🎨 4. Sampling & Evaluation

4.1. Sample Images for Evaluation

Generate 50,000 samples for evaluation. This script will automatically save the generated images and a corresponding .npz file.

torchrun --nnodes=1 --nproc_per_node=4 train_gen/sample_ddp.py \
  --model FCDM-XL \
  --num-fid-samples 50000 \
  --ckpt /path/to/checkpoint \
  --sample-dir samples \
  --ddpm True \
  --cfg-scale 1.0

Optional Arguments: --hf-model-dir
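The value --cfg-scale 1.0 corresponds to sampling without classifier-free guidance under the usual convention, which DiT also follows. The sketch below assumes that convention; apply_cfg is a hypothetical helper operating on plain lists, and the actual logic in sample_ddp.py may differ.

```python
from typing import List


def apply_cfg(cond: List[float], uncond: List[float], scale: float) -> List[float]:
    """Combine conditional and unconditional predictions.

    Uses the common form: uncond + scale * (cond - uncond).
    With scale == 1.0 this returns the conditional prediction unchanged.
    """
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


cond = [0.5, -1.0]
uncond = [0.25, 0.5]
print(apply_cfg(cond, uncond, 1.0))  # [0.5, -1.0]
print(apply_cfg(cond, uncond, 4.0))  # [1.25, -5.5]
```

Scales above 1.0 push the prediction further in the direction of the class condition.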

4.2. Evaluation

Before running evaluation, download the reference .npz file from ADM's TensorFlow evaluation suite and place it in the appropriate directory.

💡 Note: Evaluation requires a TensorFlow environment. We strongly recommend setting up a separate Conda environment for this to avoid potential dependency conflicts with PyTorch.

Run the evaluation:

python train_gen/evaluate.py \
  --num-fid-samples 50000 \
  --sample-dir samples \
  --sample-batch /path/to/sample/sample.npz \
  --ref-batch /path/to/ref.npz \
  --output_fid /path/to/fid_output.txt
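The --output_fid flag suggests that the script reports the Fréchet Inception Distance (FID). For reference, the standard definition is given below; it is not taken from this repository. Here \mu_r, \Sigma_r and \mu_g, \Sigma_g denote the mean and covariance of Inception features for the reference and generated batches, respectively.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values indicate that the generated feature statistics are closer to those of the reference batch.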

Acknowledgments

We thank Jakob Buhmann, Farnood Salehi, and Jingwei Tang for helpful discussions. Taesung Kwon is supported by the NRF Sejong Science Fellowship.

This codebase borrows from existing diffusion repositories, most notably Meta's DiT and its improved implementation fastDiT, and OpenAI's ADM.

License

The code is licensed under CC-BY-NC. See LICENSE.txt for details.

