DynPerturb is an advanced deep learning model designed to infer gene regulatory networks (GRNs) and analyze the effects of perturbations on cellular states using single-cell RNA-seq data. By incorporating both temporal and spatial information, DynPerturb enhances the understanding of gene interactions during cellular development, disease progression, and response to perturbations, making it an invaluable tool for biologists and researchers in drug discovery, genetic studies, and disease modeling.
All training data and model parameters used in this study are available at https://bgipan.genomics.cn/#/link/t2YuR3VHmS0Jaozwqlvk (access code: p2Qv).
Benchmark gene pairs for the mESC and hESC datasets are available from https://github.com/xiaoyeye/TDL.
- **Adult Human Kidney Single-Cell RNA-seq (Version 1.5)**
  - Source: CellxGene Single-Cell Data
  - This dataset includes single-cell gene expression profiles from different cell types of the human kidney.
- **Human Bone Marrow Hematopoietic Development (Balanced Reference Map)**
  - Source: CellxGene Bone Marrow Data
  - This dataset helps explore the differentiation process of blood cells from human bone marrow.
- **Murine Cardiac Development Spatiotemporal Transcriptome Sequencing**
  - Source: GigaScience article
  - Provides a detailed spatial transcriptomic map of murine heart development, useful for understanding heart tissue differentiation and development.
| Component | Version |
|---|---|
| Operating System | Kylin Linux Advanced Server V10 (Sword) |
| Python | 3.10.16 |
| CUDA | 12.2 |
| NVIDIA Driver | 535.104.12 |
| Core Dependencies | Refer to requirements.txt |
| Hardware Item | Specification/Model |
|---|---|
| CPU Architecture | aarch64 (ARM) |
| CPU Model | HiSilicon Kunpeng-920 |
| Total RAM | 256 GB (266414208 kB) |
| GPU | NVIDIA A100-PCIE-40GB |
The Python dependencies for this project are listed in the requirements.txt file.
- Download the requirements.txt file from this repository.
- Create a conda environment (if you don't have one already):

  ```bash
  conda create --name DynPerturb python=3.10
  ```

- Activate the conda environment:

  ```bash
  conda activate DynPerturb
  ```

- Install the dependencies by running the following command:

  ```bash
  pip install -r requirements.txt
  ```
This will install all the Python dependencies needed for the project.
**Installation Time:** Approximately 45 minutes.
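To confirm the environment was set up correctly, a quick sanity check such as the following can help (a minimal sketch; it assumes only that PyTorch is among the installed dependencies):

```python
# Sanity check: verify that PyTorch installed correctly and can see the GPU.
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```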
Training Command
This script is used to train a self-supervised model for link prediction in graph-based data. The training process is designed to handle large-scale datasets and support distributed training using PyTorch's DistributedDataParallel (DDP).
```bash
python train_main_link.py -d aPT-B --use_memory --memory_updater rnn --message_function mlp > log.log 2>&1
```

- `--use_memory`: Enables memory augmentation for nodes during training. This can enhance the model's ability to remember historical interactions or patterns in the data, which is particularly useful for temporal graph models.
- `--memory_updater rnn`: Specifies the memory update mechanism. The `rnn` option uses a Recurrent Neural Network (RNN) to update and manage node memory over time, making it suitable for tasks that require temporal memory updates.
- `--message_function mlp`: Sets the message function used to process information between nodes. The `mlp` option uses a Multi-Layer Perceptron (MLP) to aggregate and transform messages exchanged between nodes during computation, allowing the model to learn complex relationships between nodes.
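For intuition, an RNN memory updater treats each node's stored memory as the hidden state of a recurrent cell, with the aggregated message as its input. The sketch below illustrates this idea only; it is not the code behind `train_main_link.py`, and all names and dimensions in it are hypothetical:

```python
# Illustrative sketch of an RNN-style node memory update with an MLP message
# function (hypothetical names and dimensions, not DynPerturb's implementation).
import torch
import torch.nn as nn

num_nodes, memory_dim, message_dim = 1000, 100, 100

memory = torch.zeros(num_nodes, memory_dim)    # one memory vector per node
updater = nn.GRUCell(message_dim, memory_dim)  # the "rnn" memory updater
message_fn = nn.Sequential(                    # the "mlp" message function
    nn.Linear(2 * memory_dim, message_dim),
    nn.ReLU(),
    nn.Linear(message_dim, message_dim),
)

# For a batch of interacting node pairs, build a message from both endpoints'
# memories, then update the source nodes' memories with the recurrent cell.
src = torch.randint(0, num_nodes, (64,))
dst = torch.randint(0, num_nodes, (64,))
messages = message_fn(torch.cat([memory[src], memory[dst]], dim=1))
# Detach to keep the stored memory out of the autograd graph, as is typical
# for temporal-graph-style memory modules.
memory[src] = updater(messages, memory[src]).detach()
```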
Perturbation and Extraction of Node Features
This script is designed to perform perturbation and extraction of node features in a link prediction task. Specifically, it involves the process of generating embeddings for nodes, extracting their features over time, and saving these embeddings for future use.
```bash
python train_ChangeNodeFeat_SaveEmbeddings_link.py --data HumanBone --bs 64 --n_epoch 100 --n_layer 1
```

Parameters:

- `--data`: Dataset name, e.g., "HumanBone".
- `--bs`: Batch size for training.
- `--n_epoch`: Number of epochs.
- `--n_layer`: Number of network layers.
- `--lr`: Learning rate.
**Runtime:** The total computational runtime for training and extracting embeddings using 9 clusters is 184 hours (160 hours for training plus 24 hours for embedding extraction).
Expected Results:
- Node Temporal Embeddings File
  - `embeddings_.json`: The file contains a chronologically ordered series of high-dimensional state vectors. Each vector documents the state of a specific biological entity (such as a gene or cell type) at a distinct point in time during hematopoietic development in the bone marrow.
- Execution Log
  - `train.log`: A text file for logging and debugging.
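To take a first look at the saved embeddings, the file can be loaded with the standard library. This is a minimal sketch: the exact file name comes from your run, and the assumed JSON layout (node IDs mapped to per-timestep vectors) is an assumption, not a documented format:

```python
# Sketch: inspect the saved embeddings (assumed layout: a JSON object mapping
# node IDs to lists of per-timestep embedding vectors).
import json

with open("embeddings_.json") as f:  # use the file name produced by your run
    embeddings = json.load(f)

node_id, vectors = next(iter(embeddings.items()))
print(f"{len(embeddings)} nodes; node {node_id} has {len(vectors)} time points, "
      f"each of dimension {len(vectors[0])}")
```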
Training Command
This script is used for self-supervised node classification training with distributed data parallelism (DDP) using PyTorch. The code supports multi-GPU and multi-node training environments to scale efficiently.
```bash
python train_main_ddp.py -d HumanBone --memory_dim 1000 --use_memory --num_classes > log.log 2>&1
```

- `--memory_dim`: Sets the dimension of the memory space for the model. The `memory_dim` controls how much memory each node holds, which can influence model performance.
- `--use_memory`: Enables node memory augmentation, which helps the model retain and utilize information from previous steps or nodes. This is particularly helpful for tasks requiring historical context.
- `--num_classes`: Specifies the number of classes for node classification. It defines the total number of distinct categories or labels each node can be classified into during training, which is essential for multi-class classification tasks, where the model predicts a class label for each node in the graph.
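The DDP setup follows the standard PyTorch pattern: one process per GPU, each wrapping the model in `DistributedDataParallel`. The skeleton below illustrates that pattern in isolation (placeholder model, launched via `torchrun`); it is not DynPerturb's actual training loop:

```python
# Standard single-node DDP skeleton (illustrative only; launch with
# `torchrun --nproc_per_node=<num_gpus> this_script.py`).
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")  # torchrun provides rank/world size
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1000, 10).cuda(local_rank)  # placeholder model
    model = DDP(model, device_ids=[local_rank])  # syncs gradients across GPUs

    # ... training loop: each rank processes its own shard of the data ...

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```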
Perturbation and Extraction of Node Features
This script is designed for distributed inference on a temporal graph, focusing on perturbation and extraction of node features (i.e. embeddings), using a pretrained model. It computes temporal node embeddings across time and saves them for downstream tasks such as analysis or visualization.
```bash
python train_ChangeNodeFeat_SaveEmbeddings_ddp.py --data HumanBone --bs 64 --n_epoch 100 --n_layer 1
```

Parameters:

- `--data`: The dataset name, for example, "HumanBone".
- `--bs`: Batch size used during training.
- `--n_epoch`: Number of epochs to train the model.
- `--n_layer`: Number of layers in the neural network.
- `--lr`: Learning rate for optimization.
**Runtime:** The total computational runtime for training and extracting embeddings is 13 hours (8 hours for training plus 5 hours for embedding extraction).
Expected Results:
- Node Temporal Embeddings File
  - `embeddings_.json`: The file contains a chronologically ordered series of high-dimensional state vectors, which quantify the impact of lineage-specific transcription factor perturbations on hematopoietic trajectories by documenting the state of each biological entity (e.g., a transcription factor or cell type) at distinct points in developmental time.
- Execution Log
  - `train.log`: A text file for logging and debugging.
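A common way to summarize these perturbation outputs is to compare the perturbed embeddings with a baseline run at each time point, for example by cosine distance. The sketch below assumes two embedding files with the layout described above; the file names and layout are assumptions:

```python
# Sketch: per-timestep cosine distance between baseline and perturbed
# embeddings for one node (file names and JSON layout are assumptions).
import json

import numpy as np

def load(path):
    with open(path) as f:
        return json.load(f)

baseline = load("embeddings_baseline.json")    # hypothetical file names
perturbed = load("embeddings_perturbed.json")

node = next(iter(baseline))
for t, (b, p) in enumerate(zip(baseline[node], perturbed[node])):
    b, p = np.asarray(b), np.asarray(p)
    cosine = b @ p / (np.linalg.norm(b) * np.linalg.norm(p))
    print(f"node {node}, t={t}: cosine distance = {1.0 - cosine:.4f}")
```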
Training Command
This script is used for self-supervised node classification training with distributed data parallelism (DDP) using PyTorch. The code supports multi-GPU and multi-node training environments to scale efficiently.
```bash
python train_main_ddp.py -d mouse --memory_dim 1000 --use_memory > log.log 2>&1
```

- `--memory_dim`: Sets the dimension of the memory space for the model. The `memory_dim` controls how much memory each node holds, which can influence model performance.
- `--use_memory`: Enables node memory augmentation, which helps the model retain and utilize information from previous steps or nodes. This is particularly helpful for tasks requiring historical context.
- `--num_classes`: Specifies the number of classes for node classification. It defines the total number of distinct categories or labels each node can be classified into during training.
Perturbation and Extraction of Node Features
This script is designed for distributed inference on a temporal graph, focusing on perturbation and extraction of node features (i.e. embeddings), using a pretrained model. It computes temporal node embeddings across time and saves them for downstream tasks such as analysis or visualization.
```bash
python train_ChangeNodeFeat_SaveEmbeddings_ddp.py --data mouse --bs 64 --n_epoch 100 --n_layer 1
```

Parameters:

- `--data`: The dataset name, for example, "mouse".
- `--bs`: Batch size used during training.
- `--n_epoch`: Number of epochs to train the model.
- `--n_layer`: Number of layers in the neural network.
- `--lr`: Learning rate for optimization.
**Runtime:** The total computational runtime for training and extracting embeddings is 6 hours (4 hours for training plus 2 hours for embedding extraction).
Expected Results:
- Node Spatiotemporal Embeddings File
  - `embeddings_.json`: The file provides a quantitative, spatiotemporal atlas delineating cardiac development at the molecular level.
- Execution Log
  - `train.log`: A text file for logging and debugging.
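For a first look at such a spatiotemporal atlas, projecting all per-timestep vectors into two dimensions is a common starting point. A minimal sketch with scikit-learn follows; the file name, the JSON layout, and the choice of PCA are all assumptions rather than part of the pipeline:

```python
# Sketch: project every (node, timestep) embedding into 2D with PCA
# (file name, layout, and the use of PCA are assumptions).
import json

import numpy as np
from sklearn.decomposition import PCA

with open("embeddings_.json") as f:  # use the file name produced by your run
    embeddings = json.load(f)

# Stack every per-timestep vector of every node into one matrix, then fit PCA.
matrix = np.vstack([np.asarray(vecs) for vecs in embeddings.values()])
coords = PCA(n_components=2).fit_transform(matrix)
print("2D projection shape:", coords.shape)  # (num_nodes * num_timesteps, 2)
```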
This project is licensed under the MIT License. See the LICENSE file for details.
