Description | Installation | Basic usage | Extended usage | Output explained | Input file format | Downstream
Single cell RNA sequencing quality control package for sample-specific damaged cell detection through low dimension mitochondrial and ribosomal cluster selection.
limiric
requires the following packages to be installed in your R
environment
cowplot
Wilke, 2024. GitHub repoDropletQC
Muskovic, 2024. GitHub repodplyr
Wickham et al, 2023. GitHub repoggplot2
Wickham, 2016. GitHub repoMatrix
Bates et al, 2024. Github repopng
Urbanek, 2022. GitHubSeurat
Satija and Farrell et al, 2015. Website, GitHub repoSoupX
Young and Behjati, 2020. GitHub repo
If not already, you can install them together
packages <- c("cowplot", "dplyr", "ggplot2", "Matrix", "png", "Seurat", "SoupX")
for (pkg in packages) {
if (!requireNamespace(pkg, quietly = TRUE)) {
install.packages(pkg)
}
}
But DropletQC
must be installed as follows
devtools::install_github("powellgenomicslab/DropletQC")
Be sure to load these into your environment before continuing with limiric
installation
packages <- c("cowplot", "dplyr", "DropletQC", "ggplot2", "Matrix", "png", "Seurat", "SoupX")
for (pkg in packages) {
if (!require(pkg, character.only = TRUE)) {
library(pkg)
}
}
Please check the associated documentation if problems occur in the installation of any of the prerequisite packages
After all the prerequisites are installed, you can install the latest development version of limiric
devtools::install_github("AlicenJoyHenning/limiric")
To ensure limiric
has correctly installed, you can perform a test run using a small example dataset stored in the testrun
directory of this repository, also found here. If you need assistance setting this up, please see Test run for more details.
- Detect damaged cells in a sample using the filtered alignment files (
barcodes.tsv.gz
,features.tsv.gz
,counts.mtx.gz
). All you need to input is your sample name, the directory where your alignment files are stored, and a directory where thelimiric
output can be created.
SRR1234567 <- limiric(
ProjectName = "SRR1234567",
FilteredPath = "/home/user/alignment/SRR1234567/filtered/",
OutputPath = "/home/user/alignment/limiric/"
)
NB Please keep the following in mind:
- Your files must be zipped, see Input file format for more
- The above usage assumes the sample is of human origin (See parameter:
Organism
)- When compiling the data, all genes will be retained (See parameter:
MinCells
)- Red blood cells will automatically be removed from the sample before damaged cell detection (See parameter:
FilterRBC
)- Your output
Seurat
object will be filtered, containing only undamaged cells (See parameter:FilterOutput
)
- Alternatively, you can use a
Seurat
object as input
SRR1234567 <- limiric(
ProjectName = "SRR1234567",
SeuratInput = seurat_object,
OutputPath = "/home/user/alignment/limiric/"
)
- If you have more than one sample, you may want to define a sample list
instead of running each individually.
You can do this through the
sample_list
parameter.
sample_list <- list(
list(ProjectName = "SRR1234567",
FilteredPath = "/home/user/alignment/SRR1234567/filtered/",
OutputPath = "/home/user/alignment/limiric/"),
list(ProjectName = "SRR1234568",
FilteredPath = "/home/user/alignment/SRR1234568/filtered/",
OutputPath = "/home/user/alignment/limiric/"),
list(ProjectName = "SRR1234569",
FilteredPath = "/home/user/alignment/SRR1234569/filtered/",
OutputPath = "/home/user/alignment/limiric/")
)
GSE1234567 <- limiric(sample_list = sample_list)
NB Please note that you must define your sample list with each parameter specified, i.e.
limiric
won't know what to do if your list looks like this :sample_list <- list( list("SRR1234567", "/home/user/alignment/SRR1234567/filtered/", "/home/user/alignment/limiric/"), list("SRR1234568", "/home/user/alignment/SRR1234568/filtered/", "/home/user/alignment/limiric/"), list("SRR1234569", "/home/user/alignment/SRR1234569/filtered/", "/home/user/alignment/limiric/") )
Download the matrix.mtx
, barcodes.tsv
. and features.tsv
files from this location (files) and store them in a local directory. For demonstration purposes, we will assume this directory is /home/user/scRNA-seq/testrun/
. From here, you can run the basic limiric
function in your R
environment where the terminal output should look as indicated below. For further verification, ensure the output Testrun_CellQC.png
is identicial to that shown here.
testrun <- limiric(
ProjectName = "Testrun",
FilterRBC = FALSE,
FilteredPath = "/home/user/scRNA-seq/testrun/",
OutputPath = "/home/user/scRNA-seq/testrun/"
)
# Beginning limiric analysis for TestRun ...
# β Seurat object created
# β limiric damaged cell predictions
# β limiric analysis complete.
Note: This shouldn't take more than 10 seconds to run but this may vary depending on your machine.
Before interpretting limiric
's outputs, it may be helpful to understand a bit of what's going on in the background.
As you know, the magic of scRNA-seq revolves around the identification of distinct cell populations. In majority of scRNA-seq analysis workflows, distinct populations are identified according to the
clusters that form when dimensionality reduction is performed on the most variable genes in a sample. For instance, the typical Seurat
workflow uses the top two or three thousand variable genes.
In our case, instead of trying to isolate distinct cell-type associated populations, we are only interested in two broad populations: damaged and healthy cells. In general, most of what we know about how these two populations differ can be summarized by looking at the expression of mitochondrial and ribosomal genes. Thus, if we perform dimensionality reduction and clustering using only these genes, we expect to see a division of cells into two clusters associated with the cell's status as either damaged or healthy.
This is the basis of the low dimension mitochondrial & ribosomal clustering
, aka limiric
, algorithm.
The limiric
output will be created in the OutputPath
directory with the following structure
OutputPath/
|
βββ RBCQC
|
βββ CellQC
|
βββ Filtered
Before damaged cells can be identified, limiric
first removes red blood cells, or cells that are highly contaminated with haemoglobin, from your data. This is done under the assumption that these cells will not be informative to your study. If this assumption should not be true, you can avoid this filtering using FilterRBC = FALSE
.
NB Given many scRNA-seq protocols, such as the
10X Genomics
protocol, advise for globin treatment, we hope for the contamination percentage to be as low as possible.
The output here comes in the form of a scatter plot where the red blood cells are coloured in blue. The percentage of the total cells that were removed is also given in the plot. |
This directory contains core diagnostic plots for the limiric
damaged cell detection algorithm. As shown below, you will see four tSNE plots. In this example dataset, you can see the cells divide into two distinct clusters; but which contain damaged cells and which contain healthy cells?
From literature we know that damaged cells have high mitochondrial expression and low ribosomal expression. This is largely because damaged cells, such as those undergoing apoptosis, are characterised by compromised cell membranes. In this case, free cytoplasmic RNA- like ribosomal RNA- is more likely to have escaped before being captured and sequenced, meaning its expression will be low. While the RNA enclosed within the mitochondria, which has its own membrane, is retained, meaning its expression will be high. But many other factors likely contribute to this phenomenon, including impaired ribosomal biogenesis in damaged cells.
Now answering the question of which cluster is which is easy with the limiric
tSNEs where the mitochondrial and ribosomal gene expression of each cell is made visible.
Together, these metrics are used by the limiric
algorithm to annotate the damaged cells, as shown in the fourth and final tSNE. In summary, by looking at the limiric
tSNE plots, we can immediately tell that we have a small cluster of cells that are likley damaged and should be removed from the sample. This makes sense as, unless something went drastically wrong during sample preparations, you expect there to be far more healthy than damaged cells in your scRNA-seq data.
The main output of the limiric
function comes in the form of a barcodes.csv
containing annotations for each cell barcode of the input data. This can easily be incorporated into an existing scRNA-seq analysis workflow for filtering (see example).
Barcode | Limiric |
---|---|
AAACCTGAGATAGGAG | cell |
AAACCTGAGCTATGCT | cell |
AAACCTGAGCTGTTCA | damaged |
AAACCTGCACATTAGC | damaged |
... | ... |
Additionally, limiric
will output a Seurat
object with ready-filtered barcodes for seamless integration at the start of a Seurat
pre-processing workflow.
Standard good practice highly advises to correct for ambient RNA in your scRNAseq data. If you haven't already performed some kind of correction,
the SoupX
parameter allows you to do so. This will occur before anything else in the limiric
workflow.
SRR1234567 <- limiric(
ProjectName = "SRR1234567",
FilteredPath = "/home/user/alignment/SRR1234567/filtered/",
SoupX = TRUE,
RawPath = "/home/user/alignment/SRR1234567/raw/",
OutputPath = "/home/user/alignment/limiric/"
)
If you have a sample where only the immune cells are of interest, include the following IsolateCD45
parameter. This will isolate the immune cells present in the sample, then identify damaged cells.
SRR1234567 <- limiric(
ProjectName = "SRR1234567",
FilteredPath = "/home/user/alignment/SRR1234567/filtered/",
IsolateCD45 = TRUE,
OutputPath = "/home/user/alignment/limiric/"
)
NB This will change your output directory structure by adding a new
IMCQC
layerOutputPath/ βββ CellQC βββ IMCQC βββ RBCQC βββ Filtered
This output, like RBCQC, contains a scatter plot showing the removed cell in blue and the retained cells in grey. |
Detect damaged cells and compare results with those from DropletQC
.
SRR1234567 <- limiric(
ProjectName = "SRR1234567",
FilteredPath = "/home/user/alignment/SRR1234567/filtered/",
DropletQC = TRUE,
VelocytoPath = "/home/user/alignment/velocyto/",
OutputPath = "/home/user/alignment/limiric/"
)
NB This will change your output directory structure by adding a new
DropletQC
layerOutputPath/ βββ CellQC βββ DropletQC βββ RBCQC βββ Filtered
This will output a scatter plot and tSNE showing the cells annotated as damaged by both limiric and DropletQC .
|
Perform ambient RNA correction with SoupX
, filter red blood cells, isolate immune cells, detect damaged cells, and compare against DropletQC
.
SRR1234567 <- limiric(
ProjectName = "SRR1234567",
FilteredPath = "/home/user/alignment/SRR1234567/filtered/",
SoupX = TRUE,
RawPath = "/home/user/alignment/SRR1234567/raw/",
DropletQC = TRUE,
IsolateCD45 = TRUE,
VelocytoPath = "/home/user/alignment/velocyto/",
OutputPath = "/home/user/alignment/limiric/"
)
NB This will mean your output directory looks like this
OutputPath/ βββ CellQC βββ DropletQC βββ IMCQC βββ RBCQC βββ Filtered
sample_list <- list(
list(ProjectName = "SRR1234567",
FilteredPath = "/home/user/alignment/SRR1234567/filtered/",
SoupX = TRUE,
RawPath = "/home/user/alignment/SRR1234567/raw/",
DropletQC = TRUE,
IsolateCD45 = TRUE,
VelocytoPath = "/home/user/alignment/velocyto/",
OutputPath = "/home/user/alignment/limiric/"),
list(ProjectName = "SRR1234568",
FilteredPath = "/home/user/alignment/SRR1234568/filtered/",
SoupX = TRUE,
RawPath = "/home/user/alignment/SRR1234568/raw/",
DropletQC = TRUE,
IsolateCD45 = TRUE,
VelocytoPath = "/home/user/alignment/velocyto/",
OutputPath = "/home/user/alignment/limiric/"),
list(ProjectName = "SRR1234569",
FilteredPath = "/home/user/alignment/SRR1234569/filtered/",
SoupX = TRUE,
RawPath = "/home/user/alignment/SRR1234569/raw/",
DropletQC = TRUE,
IsolateCD45 = TRUE,
VelocytoPath = "/home/user/alignment/velocyto/",
OutputPath = "/home/user/alignment/limiric/")
)
GSE1234567 <- limiric(sample_list = sample_list)
If your alignment output files are not zipped (end with a .gz
extension), you will need to find a way to do this
before using limiric
. If you have a Linux
or mac
machine you can open the terminal in the directory where your files are stored and use gzip
cd path/to/directory
gzip *
This assumes that your files are the only items inside the directory. As with the standard output of many alignment algorithms such as
STARsolo
andCellRanger
, each sample should be stored in its own directory with the following file naming convention :path/ | βββ matrix.mtx | βββ barcodes.tsv | βββ features.tsv
If you have a Windows
machine, the simplest solution is to install Windows Subsystem for Linux
π§ which creates a new terminal environment for you with Linux
capabilites. From there, you can do the same as above. Note that WSL
uses a different path structure to access file systems where Windows
directories are mounted under /mnt/
cd /mnt/c/Users/path/to/directory
gzip *
# Add output to pre-existing Seurat object
limiric_annotations <- read.csv2("path/to/ProjectName_barcodes.csv",
sep = ",",
col.names = c("barcodes", "limiric"))
seurat@meta.data$limiric <- limiric_annotations$limiric[match(rownames(seurat@meta.data), limiric_annotations$barcodes)]
# Using pre-existing dimensionality reductions, visualise the limiric annotations
limiric_visual <- DimPlot(seurat, group.by = limiric)
# Filter cells marked as damaged by limiric
seurat_filtered <- subset(seurat, limiric == "cell")
seurat_filtered$limiric <- NULL # once used, remove the column