Single-Cell-Perturbations

Kaggle competition project: Single-Cell Perturbations

Dataset

Details

144 compounds were selected from LINCS (2 of them, dabrafenib and belinostat, as positive controls, plus DMSO as the negative control) and applied to peripheral blood mononuclear cells (PBMCs) from 3 donors.

Each plate contains 96 wells, and each well holds PBMCs from a single donor, so every well contains cells of all cell types. Of the 96 wells, 72 hold compounds, 16 hold positive controls, and 8 hold negative controls. The full dataset comprises 2 different compound plates per donor, for a total of 6 plates, with 350 cells per well.

Why include two positive controls and a negative control? One reason is cell multiplexing: all samples in a row are pooled into a single pool for sequencing, which is a source of noise. Placing two positive controls and one negative control in each row of the plate lets us account for this noise when we calculate differential expression.
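The well counts above can be checked with a little arithmetic. The sketch below assumes the standard 96-well geometry of 8 rows by 12 columns and one well per compound per plate; the function name `plate_counts` and the constants are illustrative and not taken from the competition code.

```python
# Hypothetical sanity check of the plate layout described above.
ROWS, COLS = 8, 12                # assumed standard 96-well plate geometry
POS_PER_ROW, NEG_PER_ROW = 2, 1   # controls placed in every row

def plate_counts(rows=ROWS, cols=COLS, pos=POS_PER_ROW, neg=NEG_PER_ROW):
    """Return the number of wells of each kind on one plate."""
    wells = rows * cols
    positive = rows * pos
    negative = rows * neg
    compound = wells - positive - negative
    return {"wells": wells, "positive": positive,
            "negative": negative, "compound": compound}

counts = plate_counts()
print(counts)

# Two different compound plates per donor, assuming one well per compound.
compound_wells_per_donor = 2 * counts["compound"]
# Nominal cell count: 6 plates, 96 wells each, 350 cells per well.
nominal_cells = 6 * counts["wells"] * 350
print(compound_wells_per_donor, nominal_cells)
```

Under these assumptions, 8 rows with 2 positive controls and 1 negative control each give 16 and 8 control wells, leaving 72 compound wells per plate, or 144 across the two plates for a donor.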

There is no differential expression (DE) data for the DMSO samples, because DMSO is the negative control. All DE output is calculated relative to DMSO, i.e. the DE analysis asks "how confident am I that each gene increased or decreased relative to DMSO due to the compound treatment".

Data splits

Training dataset: all compounds in T cells and NK cells, plus 10% of the compounds in B cells and myeloid cells
Testing dataset: randomly chosen compounds in B cells and myeloid cells
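The split can be illustrated with a toy sketch. It assumes 144 placeholder compound names, short cell type labels, and that the same 10% of compounds is kept for training in both B and myeloid cells; none of these details come from the competition code, and the row counts will not match the real files.

```python
import random

# Hypothetical labels; the real data may use different cell type names.
FULLY_TRAINED = ["T", "NK"]          # all compounds go to training
PARTLY_HELD_OUT = ["B", "Myeloid"]   # only a fraction goes to training

def split_pairs(compounds, train_fraction=0.10, seed=0):
    """Split (cell type, compound) pairs into training and testing lists."""
    rng = random.Random(seed)
    train = [(ct, c) for ct in FULLY_TRAINED for c in compounds]
    test = []
    n_train = round(train_fraction * len(compounds))
    kept_for_training = set(rng.sample(compounds, n_train))
    for ct in PARTLY_HELD_OUT:
        for c in compounds:
            target = train if c in kept_for_training else test
            target.append((ct, c))
    return train, test

compounds = [f"compound_{i}" for i in range(144)]  # placeholder names
train_pairs, test_pairs = split_pairs(compounds)
print(len(train_pairs), len(test_pairs))
```

The key property is that every test pair involves a B or myeloid cell type, while the model still sees every compound in T and NK cells during training.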

Main dataset

de_train.parquet

614 rows (cell type/compound pairs) and 18211 genes. The first 5 columns identify the cell type/compound pair and include a Boolean indicator of control.

adata_train.parquet

Uses a different format, a COO sparse array format. Other fields: obs_id...
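A COO (coordinate) layout stores only the non-zero entries as (row, column, value) triplets. The sketch below shows how such triplets expand into a dense matrix with NumPy; the function name `coo_to_dense` and the toy indices are illustrative, and mapping the real file's fields (such as obs_id) to integer indices is left out.

```python
import numpy as np

def coo_to_dense(row_idx, col_idx, values, shape):
    """Expand COO triplets into a dense 2-D array.

    Duplicate (row, col) entries are summed, which is the usual COO convention.
    """
    dense = np.zeros(shape, dtype=float)
    np.add.at(dense, (row_idx, col_idx), values)
    return dense

# Toy example: 3 observations by 4 genes with three non-zero counts.
rows = np.array([0, 0, 2])
cols = np.array([1, 3, 0])
vals = np.array([5.0, 1.0, 2.0])
dense = coo_to_dense(rows, cols, vals, shape=(3, 4))
print(dense)
```

For data of realistic size a dedicated sparse matrix type is usually preferable to a dense array, since most entries in single-cell count data are zero.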

Task Description

Overview

Model differential expression: predict the differential gene expression relative to the negative control (DMSO).

Evaluation Metric

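Mean Rowwise Root Mean Squared Error (MRRMSE):

$$\mathrm{MRRMSE} = \frac{1}{R}\sum_{i=1}^{R}\sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(y_{ij}-\hat{y}_{ij}\right)^{2}}$$

Here i indexes the cells (rows) and j indexes the genes; R is the number of rows, n is the number of genes, and y_ij and ŷ_ij are the actual and predicted values.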
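As the name says, the metric takes the root mean squared error over the genes of each row and then averages those values over the rows. A minimal NumPy sketch, assuming predictions and targets are arrays of the same shape with one row per cell entry and one column per gene (the function name `mrrmse` and the toy arrays are illustrative):

```python
import numpy as np

def mrrmse(y_true, y_pred):
    """Mean over rows of the per-row root mean squared error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # RMSE across genes (axis=1) for every row, then the mean over rows.
    rowwise_rmse = np.sqrt(np.mean((y_true - y_pred) ** 2, axis=1))
    return float(rowwise_rmse.mean())

# Toy example: the first row is predicted exactly, the second is not.
y_true = np.array([[0.0, 0.0], [3.0, 4.0]])
y_pred = np.array([[0.0, 0.0], [0.0, 0.0]])
score = mrrmse(y_true, y_pred)
print(score)
```

Because the square root is taken per row before averaging, a few rows with large errors weigh less than they would under a single RMSE over the whole matrix.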

Several methods have been developed for drug perturbation prediction, most of which are variations on the autoencoder architecture (Dr.VAE, scGEN, and ChemCPA).

