Repository for the paper "Delineating the Effective Use of Self-Supervised Learning in Single-Cell Genomics".
- Python 3.10
- Dependencies listed in requirements.txt
Create a conda environment:
conda env create -f environment.yml
Activate the environment:
conda activate ssl
Install the package in development mode:
cd directory_where_you_have_your_git_repos/ssl_in_scg
pip install -e .
Create symlink to the storage folder for experiments:
cd directory_where_you_have_your_git_repos/ssl_in_scg
ln -s folder_for_experiment_storage project_folder
Large Dataset:
For large datasets, use the store-creation notebooks in the scTab repository to create a Merlin datamodule for efficient data loading.
Small Dataset or Single AnnData Object:
For small datasets or a single AnnData object, a simple PyTorch dataloader suffices. Refer to our multiomics application; a minimal example of masked pre-training on a smaller AnnData object is available in sc_mae.
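The masked pre-training setup described above can be sketched as a plain PyTorch dataset that randomly masks genes in each cell. This is a minimal illustration, not the repository's actual API: the class name, the masking scheme, and the random matrix standing in for `adata.X` are all assumptions.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class MaskedExpressionDataset(Dataset):
    """Toy dataset for masked pre-training on a dense cells x genes matrix.

    `matrix` stands in for `adata.X`; all names here are illustrative.
    """

    def __init__(self, matrix, mask_fraction=0.5):
        self.matrix = torch.as_tensor(matrix, dtype=torch.float32)
        self.mask_fraction = mask_fraction

    def __len__(self):
        return self.matrix.shape[0]

    def __getitem__(self, idx):
        x = self.matrix[idx]
        # Randomly mask a fraction of genes; the model learns to reconstruct them.
        mask = torch.rand_like(x) < self.mask_fraction
        x_masked = x.clone()
        x_masked[mask] = 0.0
        return x_masked, x, mask

# Random data as a stand-in for a real expression matrix: 100 cells x 2000 genes
dataset = MaskedExpressionDataset(torch.randn(100, 2000))
loader = DataLoader(dataset, batch_size=16, shuffle=True)
x_masked, x, mask = next(iter(loader))
```

The reconstruction loss would then be computed only on the masked positions, which is the standard masked-autoencoder recipe.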
Expected output:
Running the models will generate a checkpoint file with trained model parameters, saved using PyTorch Lightning's checkpointing functionality. This file can be used for inference, further training, or reproducibility.
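As a sketch of what such a checkpoint contains, the snippet below saves and reloads a toy model the way a Lightning checkpoint is structured; the module and file name are illustrative, not the repository's actual model.

```python
import torch

# Toy module standing in for a trained LightningModule (illustrative only)
model = torch.nn.Linear(4, 2)

# A Lightning checkpoint is a torch-serialized dict: "state_dict" holds the
# model weights, alongside training state such as the epoch and optimizer states.
torch.save({"state_dict": model.state_dict(), "epoch": 0}, "toy.ckpt")

# Reload the checkpoint and restore the weights for inference or further training
ckpt = torch.load("toy.ckpt", map_location="cpu")
model.load_state_dict(ckpt["state_dict"])
```

With an actual Lightning module, `YourLightningModule.load_from_checkpoint(path)` restores both the weights and the saved hyperparameters in one call.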
Expected run time:
We pre-trained on a single GPU for approximately 1-2 days and fine-tuned on a single GPU for about 12-24 hours. Run time depends, among other factors, on the underlying architecture, dataset, and hyperparameters, so convergence should be monitored.
Pre-trained model checkpoints are available on Hugging Face.
Obtain the dataset from the scTab repository, or write a Merlin store for your custom data. Then set DATA_DIR in paths.py to point to your custom dataset, or keep the default pointing to the scTab dataset. After that, follow the scripts for pre-training and fine-tuning.
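Pointing the code at a custom dataset is just a path re-assignment in paths.py; the value below is a placeholder, not a real path.

```python
# paths.py -- point DATA_DIR at your Merlin store (placeholder path)
DATA_DIR = "/path/to/your/merlin_store"
```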
If you find our work useful, please cite the following paper:
Delineating the Effective Use of Self-Supervised Learning in Single-Cell Genomics
If you use the scTab data in your research, please cite the following paper:
Scaling cross-tissue single-cell annotation models
self_supervision is licensed under the MIT License.
ssl_in_scg was written by Till Richter, Mojtaba Bahrami, Yufan Xia, and Felix Fischer.