Read the pre-print! A deep-learning tool for species-agnostic integration of cancer cell states
scVital is a tool for integrating and analyzing single-cell RNA sequencing (scRNA-seq) data from multiple species. It uses deep learning to integrate datasets, enabling comparative analyses of conserved and species-specific cell states.
- Cross-Species Data Integration: Seamlessly integrate scRNA-seq data from different species.
- Deep Learning Models: Utilizes autoencoders and discriminators for effective data integration.
- Comprehensive Evaluation Metrics: Evaluate model performance using latent space similarity, UMAP visualization, cluster identification, and more.
- User-Friendly Interface: Easy to use with detailed documentation and examples.
To install scVital from PyPI, use the following command:

    pip install scVital

To download and install scVital from a GitHub release, follow these steps:
- Navigate to the GitHub Repository:
  - Go to the scVital GitHub repository.
- Go to the Releases Section:
  - Click on the "Releases" tab, which is usually found on the right side of the repository's main page.
- Find the Desired Release:
  - Browse through the list of releases and find the one you want to download. Releases are typically tagged with version numbers.
- Download the Release:
  - Under the desired release, you will find assets such as .zip or .tar.gz files. Click on the appropriate file to download it.
- Install the Package:
  - Once downloaded, you can install the package using pip. Navigate to the directory where the downloaded file is located and run the following command in your terminal:

        pip install path/to/downloaded/file.zip

    Replace path/to/downloaded/file.zip with the actual path to the downloaded file.
Let's say you want to download and install a package from a GitHub release:
- Navigate to the repository: https://github.com/j-rub/scVital
- Go to the Releases section: https://github.com/j-rub/scVital/releases
- Download the release: Click on scVital-v1.0.1.zip to download it.
- Install the package:

      pip install ~/Downloads/scVital-1.0.1.zip
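After either install route, it can be useful to confirm that the package is visible to the Python environment you intend to use. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of scVital, and the distribution name `scVital` is taken from the `pip install scVital` command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "scVital" is the distribution name used in the pip command above.
    version = installed_version("scVital")
    if version is None:
        print("scVital is not installed in this environment")
    else:
        print(f"scVital {version} is installed")
```

If the script reports that scVital is missing, check that `pip` and `python` point at the same environment.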
import numpy as np
import pandas as pd
import scanpy as sc
import scVital as sv

# Paths to the quality-controlled AnnData files, one per species
writeDir = "../data"
tissue = "muscle"
species1 = "human"
adataFile1 = f'{writeDir}/{tissue}Human_DeM_QC.h5ad'
species2 = "mouse"
adataFile2 = f'{writeDir}/{tissue}Mouse_DeM_QC.h5ad'

# Load both datasets and merge them into a single AnnData object
adata1 = sc.read_h5ad(adataFile1)
adata2 = sc.read_h5ad(adataFile2)
adata = sv.mg.mergeAdatas([adata1, adata2])

# Select highly variable genes per species and drop mitochondrial genes
# (assumes adata.var has a boolean "mt" column from earlier QC)
batchKey = "species"
sc.pp.highly_variable_genes(adata, batch_key=batchKey, n_top_genes=2000)
adata = adata[:, np.logical_and(adata.var.highly_variable, np.logical_not(adata.var.mt))].copy()
# Model and training settings
setupData = {
    'adata': adata,
    'batchLabel': 'species',
    'miniBatchSize': 128,
    'numEpoch': 36,
    'learningRate': 1e-3,
    'hid1': 1024,
    'hid2': 128,
    'latentSize': 12,
    'discHid': 6,
    'reconCoef': 1e1,
    'klCoef': 5e-1,
    'discCoef': 1e0,
    'discIter': 5,
    'earlyStop': 1e-2,
    'train': False,
    'seed': 18,
    'verbose': True
}

# Build the model, train it, and retrieve the AnnData object with the results
scVitalModel = sv.makeScVital(
    setupData['adata'], setupData['batchLabel'], setupData['miniBatchSize'],
    setupData['numEpoch'], setupData['learningRate'],
    setupData['hid1'], setupData['hid2'], setupData['latentSize'], setupData['discHid'],
    setupData['reconCoef'], setupData['klCoef'], setupData['discCoef'], setupData['discIter'],
    setupData['earlyStop'], setupData['train'],
    seed=setupData['seed'], verbose=setupData['verbose']
)
scVitalModel.runTrainScVital()
adata = scVitalModel.getAdata()
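# Keys under which the neighbor graph and the Leiden labels are stored
umapKey = "scVitalModel"
neighborsKey = "scVitalModel"
pcaRep = "X_scVital"

# Standard PCA (stored in adata.obsm["X_pca"])
sc.pp.pca(adata, svd_solver="arpack")

# Build the neighbor graph on the X_scVital representation, then embed and cluster
sc.pp.neighbors(adata, n_pcs=scVitalModel.getLatentSize(), use_rep=pcaRep, key_added=umapKey)
sc.tl.umap(adata, neighbors_key=umapKey)
sc.tl.leiden(adata, resolution=0.1, key_added=neighborsKey, neighbors_key=umapKey)  # , flavor="igraph", n_iterations=2, directed=False)

# Plot the UMAP colored by species, Leiden cluster, and cell annotation
sc.pl.umap(adata, color=["species", neighborsKey, "cell_annotation"], ncols=2)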
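The example above unpacks `setupData` by hand, so a misspelled or missing key only surfaces as a `KeyError` in the middle of the long `sv.makeScVital(...)` call. The sketch below shows one way to check the dictionary first and then split it into positional and keyword arguments. The helper `split_setup` and the two key lists are hypothetical and not part of scVital; the positional order is copied from the call shown above and is otherwise an assumption about the function's signature.

```python
from typing import Any, Dict, List, Tuple

# Positional order copied from the sv.makeScVital(...) call in the example above.
POSITIONAL_KEYS = [
    "adata", "batchLabel", "miniBatchSize", "numEpoch", "learningRate",
    "hid1", "hid2", "latentSize", "discHid",
    "reconCoef", "klCoef", "discCoef", "discIter",
    "earlyStop", "train",
]
# Passed by keyword in the example above.
KEYWORD_KEYS = ["seed", "verbose"]


def split_setup(setup: Dict[str, Any]) -> Tuple[List[Any], Dict[str, Any]]:
    """Split a setup dict into (positional args, keyword args).

    Raises KeyError naming any missing or unexpected keys, so typos fail early.
    """
    expected = set(POSITIONAL_KEYS) | set(KEYWORD_KEYS)
    missing = sorted(expected - set(setup))
    unknown = sorted(set(setup) - expected)
    if missing or unknown:
        raise KeyError(f"missing keys: {missing}; unexpected keys: {unknown}")
    args = [setup[key] for key in POSITIONAL_KEYS]
    kwargs = {key: setup[key] for key in KEYWORD_KEYS}
    return args, kwargs


if __name__ == "__main__":
    # Placeholder values only; in practice pass the real setupData dictionary.
    demo = {key: None for key in POSITIONAL_KEYS}
    demo.update(seed=18, verbose=True)
    args, kwargs = split_setup(demo)
    print(len(args), kwargs)
```

With the real dictionary, `args, kwargs = split_setup(setupData)` followed by `sv.makeScVital(*args, **kwargs)` passes the same values in the same order as the explicit call in the example.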
Future directions:
- Additional Discriminator:
  - Address inter-patient heterogeneity to reduce batch effects when integrating human data.
    - One discriminator to remove patient batch effect.
    - One discriminator to remove species batch effect.
  - Enhance integration when multiple human patients are involved.
- Use Reconstructed Output Data as Imputed Data:
  - Utilize imputed gene expression data for further downstream analysis, including differential gene expression.
    - Overcome gene dropout in scRNA-seq data.
- Expand to Perform Cell Clustering:
  - Add an output softmax clustering layer to the latent space.
    - Identify cell states without needing Leiden clustering after training.
- Utilize GPUs to Speed Up the Process:
  - Implement GPU acceleration to enhance processing speed and efficiency.
    - Possibly use NVIDIA RAPIDS.
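To make the softmax clustering idea listed above concrete, here is a small standard-library sketch of how a softmax layer on top of a latent vector could yield a cluster assignment directly. It is an illustration of the general technique only, not scVital's implementation: the function names, the toy weights, and the two-cluster setup are all hypothetical, and in a real model the weights would be learned during training.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cluster_logits(latent: Sequence[float],
                   weights: Sequence[Sequence[float]],
                   biases: Sequence[float]) -> List[float]:
    """Linear layer producing one logit per cluster from a latent vector."""
    return [sum(w * z for w, z in zip(row, latent)) + b
            for row, b in zip(weights, biases)]


def assign_cluster(latent: Sequence[float],
                   weights: Sequence[Sequence[float]],
                   biases: Sequence[float]) -> int:
    """Return the index of the most probable cluster for one cell."""
    probs = softmax(cluster_logits(latent, weights, biases))
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    # Toy example: 2-dimensional latent space, 2 clusters, hand-picked weights.
    weights = [[1.0, 0.0], [0.0, 1.0]]
    biases = [0.0, 0.0]
    for cell in ([2.0, 0.5], [0.1, 1.5]):
        print(cell, "->", assign_cluster(cell, weights, biases))
```

Each cell receives a probability per cluster, and the highest-probability cluster serves as its label, which is how such a layer could replace a separate Leiden step after training.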
README made with the help of Copilot.
