scConcept

This repository contains the Python package for training and using scConcept (Single-cell contrastive cell pre-training), a pre-training method for single-cell transcriptomics.

Installation

You need to have Python 3.10 or newer installed on your system. If you don't have Python installed, we recommend installing uv.

scConcept also uses Flash Attention, which requires CUDA.

Install the latest development version:

pip install git+https://github.com/theislab/scConcept.git@main

Install Flash Attention (required; CUDA is needed to install flash-attn):

pip install flash-attn==2.7.* --no-build-isolation

Install lamin-dataloader from GitHub (required):

pip install git+https://github.com/theislab/lamin_dataloader.git
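After running the commands above, a quick sanity check can catch a missing prerequisite before you try to load a model. The sketch below is not part of scConcept; it only uses the Python standard library. It assumes the import names `concept` (from the usage example), `flash_attn` for flash-attn, and `lamin_dataloader` for lamin-dataloader (the last one is a guess), and it treats an `nvcc` compiler on `PATH` as a rough, incomplete proxy for a CUDA toolkit.

```python
import importlib.util
import shutil
import sys

# Assumed import names: `concept` (scConcept), `flash_attn` (flash-attn),
# and `lamin_dataloader` (lamin-dataloader; this name is an assumption).
REQUIRED_MODULES = ("concept", "flash_attn", "lamin_dataloader")


def check_environment(min_python=(3, 10)):
    """Summarize whether basic scConcept prerequisites appear to be present."""
    report = {
        # scConcept needs Python 3.10 or newer.
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        # nvcc on PATH is only a rough hint that a CUDA toolkit is installed.
        "nvcc_found": shutil.which("nvcc") is not None,
        "modules": {},
    }
    for name in REQUIRED_MODULES:
        # find_spec returns None for a missing top-level module without importing it.
        report["modules"][name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A `False` entry under `modules` usually means the corresponding install step above did not complete.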

How to use

scConcept provides a simple API to load and adapt pre-trained models and extract embeddings from scRNA-seq data. Here's a basic example:

from concept.scConcept import scConcept
import scanpy as sc

# Load your single-cell data
adata = sc.read_h5ad("your_data.h5ad")

# Initialize scConcept and load a pretrained model
concept = scConcept(cache_dir='./cache/')
concept.load_config_and_model(model_name='Corpus-30M')

# Extract embeddings; adata.var['gene_id'] must contain Ensembl gene IDs (ENSGXXXXXXXXXXX)
result = concept.extract_embeddings(adata=adata, gene_id_column='gene_id')

# Use embeddings for downstream analysis
adata.obsm['X_scConcept'] = result['cls_cell_emb']

# Adapt a pre-trained model to your own data
concept.train(adata, max_steps=10000, batch_size=128)

# Important: to train on multiple datasets, pass them as separate AnnData objects in a list
concept.train([adata1, adata2, ...], max_steps=20000, batch_size=128)

result = concept.extract_embeddings(adata=adata, gene_id_column='gene_id')
adata.obsm['X_scConcept_adapted'] = result['cls_cell_emb']
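The example expects `adata.var['gene_id']` to hold Ensembl gene IDs of the form ENSGXXXXXXXXXXX. Before calling `extract_embeddings`, it can help to check how many of your IDs actually look like that. The helpers below are a hypothetical sketch, not part of the scConcept API. They assume unversioned human IDs ("ENSG" followed by 11 digits) and strip a trailing version suffix such as `.16`; whether scConcept needs versions removed is an assumption based only on the format shown above.

```python
import re

# Unversioned human Ensembl gene ID: "ENSG" followed by 11 digits (assumed format).
ENSG_PATTERN = re.compile(r"^ENSG\d{11}$")


def strip_version(gene_id):
    """Drop an Ensembl version suffix, e.g. 'ENSG00000141510.16' -> 'ENSG00000141510'."""
    return gene_id.split(".", 1)[0]


def ensembl_fraction(gene_ids, strip_versions=True):
    """Return the fraction of IDs that match the unversioned human Ensembl pattern."""
    ids = [str(g) for g in gene_ids]
    if not ids:
        return 0.0
    if strip_versions:
        ids = [strip_version(g) for g in ids]
    matches = sum(1 for g in ids if ENSG_PATTERN.match(g))
    return matches / len(ids)
```

For example, `ensembl_fraction(adata.var['gene_id'])` close to 1.0 suggests the column is in the expected format; a low value often means the column holds gene symbols instead.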

For a more detailed example, see the example notebook.
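Once `adata.obsm['X_scConcept']` is populated, a simple downstream use is finding each cell's most similar cells. The function below is a hypothetical sketch using NumPy, not an scConcept function, and it assumes cosine similarity is a reasonable metric for these contrastively trained embeddings (the README does not specify one).

```python
import numpy as np


def cosine_knn(embeddings, k=5):
    """Return indices of the k most cosine-similar cells for every cell, excluding itself."""
    x = np.asarray(embeddings, dtype=np.float64)
    # L2-normalize rows so dot products become cosine similarities.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x = x / np.clip(norms, 1e-12, None)
    sim = x @ x.T
    # A cell should not be its own neighbor.
    np.fill_diagonal(sim, -np.inf)
    # Sort each row by descending similarity and keep the first k columns.
    return np.argsort(-sim, axis=1)[:, :k]
```

Usage: `neighbors = cosine_knn(adata.obsm['X_scConcept'], k=15)`. The dense similarity matrix is quadratic in the number of cells, so for large datasets the usual route is scanpy, e.g. `sc.pp.neighbors(adata, use_rep='X_scConcept')` followed by `sc.tl.umap(adata)`.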

Troubleshooting

If you encounter an error when loading a pre-trained model, try the following:

1. Remove your local copy of the repository and clone the most recent version.
2. Remove the cache directory (cache/ by default).
3. Run your code again.

This will force a fresh download of the pre-trained model and should resolve most loading issues.
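Step 2 can also be done from Python. The helper below is a hypothetical sketch that simply deletes a directory with the standard library; it is not an scConcept function. Its default path mirrors `cache_dir='./cache/'` from the usage example, so pass your own path if you chose a different `cache_dir`.

```python
import shutil
from pathlib import Path


def clear_model_cache(cache_dir="./cache/"):
    """Delete the cache directory so pre-trained models are downloaded again.

    Returns True if a directory was removed, False if none existed.
    """
    path = Path(cache_dir)
    if not path.is_dir():
        return False
    # Recursively deletes the directory and all of its contents.
    shutil.rmtree(path)
    return True
```

Double-check the path before calling it, since `shutil.rmtree` removes everything under it.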

Citation

Bahrami, M., Tejada-Lapuerta, A., Becker, S., Hashemi G, F.S. and Theis, F.J., 2025. scConcept: Contrastive pretraining for technology-agnostic single-cell representations beyond reconstruction. bioRxiv, pp.2025-10. doi: https://doi.org/10.1101/2025.10.14.682419
