The package can be installed from this GitHub repository:
# Install devtools for GitHub installation if not present
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
# Install package from GitHub repo
devtools::install_github("https://github.com/JamesOpz/splitRtools")
The splitRtools package is a collection of tools used to process
SPLiT-seq scRNA-seq data (Rosenberg et al., 2019).
The splitRtools package is designed to take as input the output of the
zUMIs package (paper).
The zUMIs package takes raw FASTQ output, assigns reads to barcodes,
filters them, and maps reads to the reference genome, producing a
cell-by-gene count matrix as well as some reporting about the pipeline outputs.
A sample zUMIs pipeline with configuration to work with the
Rosenberg-2019 barcode setup is available
here.
The splitRtools pipeline is run through the run_split_pipe() function,
which acts as a wrapper that executes the whole pipeline. A basic setup
for the pipeline is as follows:
# Load splitRtools
library(splitRtools)
# Run the splitRtools pipeline
# You must always point to two parent folders containing sublibrary raw FASTQ folders
# Each sublibrary is within this folder and must contain zUMIs output
run_split_pipe(
  mode = 'merge', # Merge sublibraries or process separately
  n_sublibs = 2, # How many sublibraries are present
  data_folder = "./../test_data_sp_5_miseq/", # Location of zUMIs data directory
  output_folder = "../test_data_sp_5_miseq_outputs/", # Output folder path
  filtering_mode = "knee", # Filter by knee (standard) or by a manual transcript cutoff (default 1000)
  fastq_path = "../fastq_single/", # Path to folder containing sublibrary raw FASTQ files
  rt_bc = "../test_data_sp_5_miseq/barcodes_v1.csv", # RT barcode map
  lig_bc = "../test_data_sp_5_miseq/barcodes_v1.csv", # Ligation barcode map
  sample_map = "../test_data_sp_5_miseq/cell_metadata.xlsx" # RT plate layout file
)
The first stage of the pipeline converts the cell count matrix into a
SingleCellExperiment (SCE) object and labels each cell, in its colData,
with a series of well IDs, one for each stage of the barcoding process,
together with the sample assigned to its RT well in the sample_map Excel
file provided. This data is then stored as an SCE or an AnnData object in
the unfiltered/ output folder for each sublibrary.
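The labelling idea can be illustrated with a minimal base R sketch. It is not the package's implementation. The barcode sequences, the column names of the two maps, and the assumption that a cell barcode is three 4-base barcodes concatenated in the order ligation round 2, ligation round 1, RT are all hypothetical and chosen only to keep the example small.

```r
# Hypothetical barcode-to-well map (column names are assumptions, not the
# actual layout of barcodes_v1.csv)
bc_map <- data.frame(
  barcode = c("AAAA", "CCCC", "GGGG"),
  well    = c("A1", "A2", "A3"),
  stringsAsFactors = FALSE
)

# Hypothetical sample map: which sample was loaded into each RT well
sample_map <- data.frame(
  well   = c("A1", "A2", "A3"),
  sample = c("control", "control", "treated"),
  stringsAsFactors = FALSE
)

# Toy cell barcodes, assumed to be [ligation 2][ligation 1][RT], 4 bases each
cell_bc <- c("AAAACCCCGGGG", "GGGGAAAACCCC", "CCCCGGGGAAAA")

# Split each cell barcode into its three per-round barcodes
lig2_bc <- substr(cell_bc, 1, 4)
lig1_bc <- substr(cell_bc, 5, 8)
rt_bc   <- substr(cell_bc, 9, 12)

# Translate a vector of barcodes into well IDs using the barcode map
lookup_well <- function(bc) bc_map$well[match(bc, bc_map$barcode)]

# One row per cell, with a well ID for every barcoding round
cell_labels <- data.frame(
  cell_bc   = cell_bc,
  rt_well   = lookup_well(rt_bc),
  lig1_well = lookup_well(lig1_bc),
  lig2_well = lookup_well(lig2_bc),
  stringsAsFactors = FALSE
)

# The sample label follows from the RT well only
cell_labels$sample <- sample_map$sample[match(cell_labels$rt_well, sample_map$well)]

print(cell_labels)
```

A table shaped like cell_labels is the kind of per-cell annotation that would be attached as colData to an SCE object.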
The splitRtools pipeline will generate a set of diagnostic plots in order
to evaluate the initial quality of the SPLiT-seq scRNA-seq data.
After labeling, the data is filtered using either the spline-fitting
functionality of the DropletUtils package or a user-specified transcript
cutoff. This produces the following waterfall plot along with
quantification of the cell types recovered by sample:
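The manual-cutoff variant of this filtering step can be sketched in base R on simulated data. This is an illustration under stated assumptions, not the package's code: the UMI totals are simulated, the variable names are hypothetical, and whether the real cutoff is inclusive is not stated in this document. In knee mode the threshold would instead come from a spline fit to the same ranked curve, for example the knee point reported by barcodeRanks() in DropletUtils.

```r
set.seed(1)

# Simulated per-barcode UMI totals: 50 cell-like barcodes and 500 background barcodes
umi_total <- c(rpois(50, lambda = 5000), rpois(500, lambda = 50))
names(umi_total) <- paste0("bc_", seq_along(umi_total))

# Waterfall plot coordinates: barcodes ranked by decreasing UMI total
ord <- order(umi_total, decreasing = TRUE)
waterfall <- data.frame(
  barcode = names(umi_total)[ord],
  rank    = seq_along(ord),
  total   = as.numeric(umi_total[ord]),
  stringsAsFactors = FALSE
)

# Manual filtering: keep barcodes above a fixed cutoff
# (1000 mirrors the default mentioned above; the strict ">" is an assumption)
manual_cutoff <- 1000
kept <- waterfall[waterfall$total > manual_cutoff, ]

# Number of barcodes retained as cells
nrow(kept)
```

Plotting waterfall$rank against waterfall$total on log-log axes, for instance with plot(..., log = "xy"), gives the familiar waterfall shape, with the cutoff separating the cell plateau from the background tail.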
The cell barcoding data is then mapped to the respective plate
locations across the 3 barcoding rounds to provide a series of heatmaps
displaying cells recovered per well and median UMI per cell across all
wells:
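The per-well summaries behind such heatmaps can be sketched in base R as follows. This is a toy example, not the package's plotting code. It assumes a 96-well plate (rows A to H, columns 1 to 12) and uses a hypothetical per-cell table with one well ID and one UMI total per cell for a single barcoding round.

```r
# Toy per-cell table: the well each cell occupied in one round, and its UMI total
cells <- data.frame(
  well = c("A1", "A1", "A2", "B1", "B1", "B1"),
  umi  = c(1200, 1800, 2500, 900, 1500, 3000),
  stringsAsFactors = FALSE
)

# Plate geometry (assumed 96-well layout)
plate_rows <- LETTERS[1:8]
plate_cols <- as.character(1:12)

# Per-well summaries: number of cells and median UMI per cell
cells_per_well <- tapply(cells$umi, cells$well, length)
median_umi     <- tapply(cells$umi, cells$well, median)

# Place a named vector of per-well values onto an 8 x 12 plate matrix;
# wells with no cells stay NA
fill_plate <- function(values) {
  plate <- matrix(NA_real_, nrow = 8, ncol = 12,
                  dimnames = list(plate_rows, plate_cols))
  row_idx <- match(substr(names(values), 1, 1), plate_rows)
  col_idx <- match(substring(names(values), 2), plate_cols)
  plate[cbind(row_idx, col_idx)] <- as.numeric(values)
  plate
}

cells_plate  <- fill_plate(cells_per_well)
median_plate <- fill_plate(median_umi)

# Top-left corner of each plate matrix
print(cells_plate[1:2, 1:3])
print(median_plate[1:2, 1:3])
```

Each matrix can then be passed to any heatmap function, and the same summary is repeated for each of the 3 barcoding rounds.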