This repository contains the Python package for training and using the scConcept (Single-cell contrastive cell pre-training) method for single-cell transcriptomics.
You need to have Python 3.10 or newer installed on your system. If you don't have Python installed, we recommend installing uv.
scConcept also uses Flash Attention, which requires CUDA.
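The two prerequisites above (Python 3.10 or newer, and CUDA for Flash Attention) can be checked before installing. The following is a minimal sketch using only the standard library. The helper name `check_prerequisites` is hypothetical, and looking for `nvcc` on the PATH or a `CUDA_HOME`/`CUDA_PATH` variable is only a heuristic assumed here, not a check the package itself documents.

```python
import os
import shutil
import sys


def check_prerequisites(min_python=(3, 10)):
    """Return a dict of simple environment checks (heuristics only)."""
    # CUDA_HOME / CUDA_PATH are common conventions, assumed here for illustration.
    cuda_home = os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH")
    return {
        # Tuple comparison: True when the running interpreter is new enough.
        "python_ok": sys.version_info >= min_python,
        # The CUDA compiler being on PATH suggests a CUDA toolkit is installed.
        "nvcc_on_path": shutil.which("nvcc") is not None,
        "cuda_home_set": bool(cuda_home),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

A "no" for the CUDA entries does not prove that installation will fail; it only suggests checking the CUDA setup before building flash-attn.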
- Install the latest development version:
pip install git+https://github.com/theislab/scConcept.git@main
- Flash Attention (required) - CUDA is required for installing flash-attn:
pip install flash-attn==2.7.* --no-build-isolation
- Install lamin-dataloader from GitHub (required):
pip install git+https://github.com/theislab/lamin_dataloader.git
scConcept provides a simple API to load and adapt pre-trained models and extract embeddings from scRNA-seq data. Here's a basic example:
from concept.scConcept import scConcept
import scanpy as sc
# Load your single-cell data
adata = sc.read_h5ad("your_data.h5ad")
# Initialize scConcept and load a pretrained model
concept = scConcept(cache_dir='./cache/')
concept.load_config_and_model(model_name='Corpus-30M')
# Extract embeddings --> adata.var['gene_id']: ENSGXXXXXXXXXXX
result = concept.extract_embeddings(adata=adata, gene_id_column='gene_id')
# Use embeddings for downstream analysis
adata.obsm['X_scConcept'] = result['cls_cell_emb']
# Adapt a pre-trained model on your own data
concept.train(adata, max_steps=10000, batch_size=128)
# Important: For multiple datasets pass them separately
concept.train([adata1, adata2, ...], max_steps=20000, batch_size=128)
result = concept.extract_embeddings(adata=adata, gene_id_column='gene_id')
adata.obsm['X_scConcept_adapted'] = result['cls_cell_emb']
For a more detailed example, see the example notebook.
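The comment in the example above indicates that the column passed as `gene_id_column` holds Ensembl-style gene IDs of the form `ENSG` followed by 11 characters. As a sketch, the hypothetical helper below lists entries that do not match that pattern, which may help spot gene symbols or other identifiers before extracting embeddings. The strict pattern (exactly 11 digits, no version suffix) is an assumption drawn from that comment, not a documented requirement of scConcept.

```python
import re

# Assumed pattern, based on the "ENSGXXXXXXXXXXX" comment: "ENSG" + 11 digits.
ENSG_PATTERN = re.compile(r"ENSG[0-9]{11}")


def find_non_ensembl_ids(gene_ids):
    """Return the entries that do not fully match the assumed ENSG pattern."""
    return [g for g in gene_ids if not ENSG_PATTERN.fullmatch(str(g))]


ids = ["ENSG00000141510", "ENSG00000012048", "TP53"]
print(find_non_ensembl_ids(ids))  # ['TP53']
```

With an AnnData object, the IDs could be passed as `adata.var['gene_id'].tolist()`, assuming the column is named as in the example.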
If you encounter an error when loading a pre-trained model, try the following:
- Remove the repository and clone the most recent version
- Remove the cache directory (cache/ by default)
- Run again
This will force a fresh download of the pre-trained model and should resolve most loading issues.
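The cache-removal step above can also be done from Python. This is a minimal sketch: the helper name `clear_model_cache` is hypothetical, and the default path assumes the same `cache_dir='./cache/'` used in the basic example.

```python
import shutil
from pathlib import Path


def clear_model_cache(cache_dir="./cache/"):
    """Delete cache_dir if it exists; return True if a directory was removed."""
    path = Path(cache_dir)
    if not path.is_dir():
        # Nothing to remove (the directory is missing or is not a directory).
        return False
    shutil.rmtree(path)  # Removes the directory and everything inside it.
    return True
```

After clearing the cache, calling `concept.load_config_and_model(model_name='Corpus-30M')` again should download the model afresh. Note that `shutil.rmtree` deletes everything under the given path, so the path should be checked before running it.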
Bahrami, M., Tejada-Lapuerta, A., Becker, S., Hashemi G, F.S. and Theis, F.J., 2025. scConcept: Contrastive pretraining for technology-agnostic single-cell representations beyond reconstruction. bioRxiv, pp.2025-10. doi: https://doi.org/10.1101/2025.10.14.682419