scFoundation/GEARS at main · biomap-research/scFoundation

History

Name		Name	Last commit message	Last commit date
parent directory ..
data		data
gears		gears
modules		modules
run_sh		run_sh
Plot.ipynb		Plot.ipynb
README.md		README.md
train.py		train.py

README.md

Data

We now provide the processed datasets used in our experiments. You can download all h5ad files from Figshare https://doi.org/10.6084/m9.figshare.24049200 .

We now provide demo data in the data folder. The demo data is a subset of the Dixit dataset.

Then you need to download the go.csv.zip from https://www.dropbox.com/s/wl0uxiz5z9dbliv/go.csv.zip?dl=0 and unzip it to the data/DATASETNAME/ folder. For instance, put it into the data/demo or data/gse133344_k562gi_oe_pert227_84986_19264_withtotalcount folder.

Installation

Install PyG, and then do pip install cell-gears. Actually we do not use the gears PyPI packages but installing it will install all the dependencies.

Demo Usage

# cd to the GEARS folder
## no embedding
bash run_sh/run_singlecell_maeautobin-demo-baseline.sh

## using embedding from API call 
bash run_sh/run_singlecell_maeautobin-demo-emb.sh

## using embedding from local model
bash run_sh/run_singlecell_maeautobin-demo-train.sh

For baseline results, run run_sh/run_singlecell_maeautobin-demo-baseline.sh. For scFoundation results, choose either run_sh/run_singlecell_maeautobin-demo-emb.sh (API based) or run_sh/run_singlecell_maeautobin-demo-train.sh (local model based). We now offer two approaches for obtaining embeddings:

API-Based Embeddings: Modify gears/model.py at Line 117 to invoke the scFoundation API for embedding retrieval. Set API parameters --output_type gene_batch and --pre_normalized A. This will yield a gene context embedding NumPy array for each training batch, shaped as batch19264hidden. This array should be returned for further training.
Local Model Embeddings: For using embeddings derived from our model, use these settings in your training bash file:

model_type=maeautobin
bin_set=autobin_resolution_append
finetune_method='frozen'
singlecell_model_path=../model/models/models.ckpt

Here you can change the "finetune_method" as other type such as "finetune_lr_1" to finetune both the scFoundation and GEARS model. Due to the GPU memory limiation, we used "forzen" in our experiments.

The output text in the terminal will be redirected into the train.log file.

And the results will be saved in the results folder.

Experiment datasets

All h5ad files required for the experiments are available for download from Figshare at this link https://doi.org/10.6084/m9.figshare.24049200 . After downloading, execute the corresponding training bash scripts to start your experiments. For example, to obtain baseline results on the Norman dataset, run run_singlecell_norman.sh. For results using scFoundation embeddings on the Norman dataset, run run_singlecell_maeautobin-0.1B-res0-norman.sh. Note that upon the first execution of any training script, a folder will be automatically created based on the h5ad file.

Expected output

You will get a folder named results/DATASETNAME/0.75/xxx with the following files:

config.pkl
model.pt
params.csv
train.log

The Plot.ipynb is the jupyter notebook of reproducing the figures.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GEARS

GEARS

README.md

Data

Installation

Demo Usage

Experiment datasets

Expected output

Files

GEARS

Directory actions

More options

Directory actions

More options

Latest commit

History

GEARS

Folders and files

parent directory

README.md

Data

Installation

Demo Usage

Experiment datasets

Expected output