A multimodal foundation model for single-cell immune profiling
TCRfoundation integrates gene expression and TCR sequences (α and β chains) from paired single-cell measurements through self-supervised pretraining with masked reconstruction and cross-modal contrastive learning.
Gene expression profiles are encoded through feed-forward layers with multi-head attention, while TCR sequences are tokenized and processed through transformer blocks. The fused representations are learned via three objectives: masked gene expression reconstruction, masked TCR sequence reconstruction, and cross-modal alignment.
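To make the objectives above more concrete, here is a minimal, dependency-free sketch of two of them: a masked reconstruction loss and a symmetric InfoNCE-style loss for cross-modal alignment. It is a generic illustration, not TCRfoundation's implementation. The function names (`mask_values`, `masked_mse`, `contrastive_alignment_loss`), the masking rate, the mask value, and the temperature are all assumptions made for this example, and the real model may use different loss forms or weightings.

```python
import math
import random


def mask_values(values, mask_rate, rng, mask_value=0.0):
    """Hide a random subset of entries; return the corrupted copy and hidden indices."""
    n_masked = max(1, round(mask_rate * len(values)))
    masked_idx = sorted(rng.sample(range(len(values)), n_masked))
    corrupted = list(values)
    for i in masked_idx:
        corrupted[i] = mask_value  # hypothetical mask value, chosen for illustration
    return corrupted, masked_idx


def masked_mse(predicted, target, masked_idx):
    """Mean squared error computed only over the masked positions."""
    return sum((predicted[i] - target[i]) ** 2 for i in masked_idx) / len(masked_idx)


def _normalize(vec):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _cross_entropy_diagonal(logits):
    """Average cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_alignment_loss(gex_emb, tcr_emb, temperature=0.1):
    """Symmetric InfoNCE: embeddings from the same cell are positives, all others negatives."""
    g = [_normalize(v) for v in gex_emb]
    t = [_normalize(v) for v in tcr_emb]
    # Cosine similarity between every gene-expression and TCR embedding in the batch.
    logits = [[sum(a * b for a, b in zip(gi, tj)) / temperature for tj in t] for gi in g]
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (_cross_entropy_diagonal(logits) + _cross_entropy_diagonal(logits_t))


# Toy usage: two cells with 2-dimensional embeddings per modality.
rng = random.Random(0)
expression = [1.2, 0.0, 3.4, 0.7, 2.1]
corrupted, hidden = mask_values(expression, mask_rate=0.4, rng=rng)
aligned = contrastive_alignment_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])
shuffled = contrastive_alignment_loss([[1, 0], [0, 1]], [[0, 1], [1, 0]])
```

In this toy batch, `aligned` is smaller than `shuffled` because the loss rewards paired embeddings from the same cell for being more similar than embeddings from different cells.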
The pretrained model supports three downstream applications:
- T-cell state classification: Predict tissue origin, disease state, and cellular phenotype
- Binding specificity detection: Identify TCR-antigen interactions and quantify binding avidity
- Cross-modal prediction: Infer gene expression from TCR sequences
Install with pip:
pip install tcrfoundation
Or install from source:
git clone https://github.com/Liao-Xu/TCRfoundation.git
cd TCRfoundation
pip install -e .
Requirements: Python 3.8+, PyTorch 1.10.0+
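As a convenience, the stated requirements can be checked before installing. The sketch below is not part of TCRfoundation; `version_tuple`, `meets_minimum`, and `check_environment` are hypothetical helpers written for this example. It uses only the standard library and assumes PyTorch is installed under its usual distribution name, `torch`.

```python
import re
import sys
from importlib import metadata


def version_tuple(version):
    """Parse the leading numeric parts, e.g. '2.1.0+cu118' -> (2, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version, minimum):
    """Compare versions numerically, so that 1.9.1 counts as older than 1.10.0."""
    return version_tuple(version) >= version_tuple(minimum)


def check_environment(min_python=(3, 8), min_torch="1.10.0"):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ is required")
    try:
        torch_version = metadata.version("torch")
    except metadata.PackageNotFoundError:
        problems.append("PyTorch is not installed")
    else:
        if not meets_minimum(torch_version, min_torch):
            problems.append(f"PyTorch {min_torch}+ is required, found {torch_version}")
    return problems


for problem in check_environment():
    print(problem)
```

Versions are compared as integer tuples rather than strings because a plain string comparison would rank `1.9.1` above `1.10.0`.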
import tcrfoundation as tcrf
import scanpy as sc
# Load your data
adata = sc.read("your_data.h5ad")
# Pretrain the foundation model
model, history = tcrf.pretrain.train(
    adata,
    epochs=500,
    batch_size=2048,
    save_dir='models/'
)
# Fine-tune for classification
results, adata_new = tcrf.finetune.classification.train_classifier(
    adata,
    label_column="cell_type",
    checkpoint_path="models/foundation_model_best.pt",
    num_epochs=50
)
- Full Documentation: https://tcrfoundation.readthedocs.io
Complete Jupyter notebook tutorials are available:
- Pretraining - Train the foundation model
- Classification - T-cell state classification
- Specificity - Antigen specificity prediction
- Avidity - Binding avidity regression
- Cross-modal - TCR-to-gene prediction
- Author: Xu Liao
- Email: xl3514@cumc.columbia.edu
- GitHub: https://github.com/Liao-Xu/TCRfoundation

