Generative modeling of regulatory DNA sequences with diffusion probabilistic models.
Documentation: https://pinellolab.github.io/DNA-Diffusion
Source Code: https://github.com/pinellolab/DNA-Diffusion
DNA-Diffusion is a diffusion-based model for generating 200 bp cell type-specific synthetic regulatory elements.
Our preferred package and project manager is uv. Please follow its recommended installation instructions.
To clone the repository and install the necessary packages, run:
git clone https://github.com/pinellolab/DNA-Diffusion.git
cd DNA-Diffusion
uv sync
This will create a virtual environment in .venv and install all dependencies listed in the pyproject.toml file. The setup works on both CPU and GPU, but the preferred environment is Linux with a recent GPU (e.g. an A100).
To train the DNA-Diffusion model, we provide a basic config file for training the diffusion model on the same subset of chromatin accessible regions from the DHS Index dataset used in our main manuscript (K562, GM12878, HepG2, hESC cell lines).
To train the model, call:
uv run train.py
We also provide a base config for debugging that uses a single sequence for training. You can switch the training script to this debugging config by calling:
uv run train.py -cn train_debug
We provide a basic config file for generating sequences with the diffusion model, which produces 1000 sequences per cell type. Base generation uses a guidance scale of 1.0; this can be tuned in sample.py with the cond_weight_to_metric parameter. To generate sequences, call:
uv run sample.py
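For readers new to guidance scales, the sketch below shows one common classifier-free guidance formulation in plain Python. It is an illustration only: the helper name `apply_guidance`, the toy numbers, and the formula itself are assumptions, and sample.py may parameterize cond_weight_to_metric differently.

```python
def apply_guidance(cond_pred, uncond_pred, guidance_scale=1.0):
    """Blend conditional and unconditional noise predictions.

    Hypothetical helper, not taken from the DNA-Diffusion code base.
    Assumes the common form: uncond + scale * (cond - uncond).
    """
    return [u + guidance_scale * (c - u) for c, u in zip(cond_pred, uncond_pred)]


# Toy per-position noise predictions (made-up numbers for illustration).
cond = [0.5, -1.0, 2.0]
uncond = [0.0, 0.0, 1.0]

# Under this formulation, a scale of 1.0 returns the conditional prediction.
print(apply_guidance(cond, uncond, 1.0))
# Larger scales move the result further away from the unconditional prediction.
print(apply_guidance(cond, uncond, 2.0))
```

Under this assumed formulation, raising the scale strengthens the influence of the cell type label, usually at some cost to sample diversity.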
The default sampling setup generates 1000 sequences per cell type. You can override these defaults to generate one sequence per cell type with the following CLI flags:
uv run sample.py sampling.number_of_samples=1 sampling.sample_batch_size=1
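The dotted `key=value` arguments above resemble Hydra-style overrides, although that reading is an inference from the syntax. As a rough mental model, the sketch below applies such overrides to a nested dictionary. The helper `apply_overrides` and the default batch size are hypothetical, and the code does not use the project's actual configuration machinery.

```python
def apply_overrides(config, overrides):
    """Apply dotted `key=value` overrides to a nested dict, in place.

    Hypothetical helper for illustration; not part of DNA-Diffusion.
    """
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            # Walk down the nesting, creating sections that do not exist yet.
            node = node.setdefault(name, {})
        # Store integers as int; keep everything else as the raw string.
        node[leaf] = int(raw_value) if raw_value.lstrip("-").isdigit() else raw_value
    return config


# 1000 samples matches the documented default; the batch size is a made-up placeholder.
defaults = {"sampling": {"number_of_samples": 1000, "sample_batch_size": 10}}
updated = apply_overrides(
    defaults,
    ["sampling.number_of_samples=1", "sampling.sample_batch_size=1"],
)
print(updated)
```

Each override names a path into the config (`sampling`, then `number_of_samples`) and replaces only that leaf, leaving every other setting at its default.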
Thanks go to these wonderful people (emoji key):
Lucas Ferreira da Silva 🤔 💻
Luca Pinello 🤔
Simon 🤔 💻
This project follows the all-contributors specification. Contributions of any kind welcome!