scHPL: Hierarchical progressive learning of cell identities in single-cell data
We present a hierarchical progressive learning method that automatically finds relationships between cell populations across multiple datasets and uses them to construct a hierarchical classification tree. At each node of the tree, a classifier is trained: a linear SVM, a kNN classifier, or a one-class SVM. The one-class SVM additionally enables the detection of unknown populations. The trained classification tree can then be used to predict the labels of a new, unlabeled dataset.
NOTE: scHPL is not a batch correction tool. We advise aligning the datasets before matching the cell populations, for example with scVI or scArches (see treeArches below).
scHPL requires Python 3.6 or higher. The easiest way to install scHPL is through the following command:
pip install scHPL
The `tutorial.ipynb` notebook explains the basics of scHPL. The vignette folder contains notebooks to reproduce the inter-dataset experiments. See the documentation for more information.
treeArches is a framework built around scHPL and scArches to automatically build and update reference atlases and the classification tree. Examples can be found in the treeArches reproducibility GitHub repository and in this notebook.
All datasets used are publicly available and can be downloaded from Zenodo. The simulated data and the aligned datasets used in the inter-dataset experiments can be downloaded from the scHPL Zenodo. The filtered PBMC-FACS and AMB2018 datasets can be downloaded from the scRNA-seq benchmark Zenodo.
For citation and further information please refer to: "Hierarchical progressive learning of cell identities in single-cell data"