(formerly known as snco)
coelsch is a set of tools for identifying recombination events at the single nucleus/cell level, using read alignments and/or SNP matrices from single nucleus sequencing experiments of recombinant haploid gametes. Reads alignments or SNPs are summarised into a per-cell barcode "marker" dataset which can then be used to perform crossover analysis with a hidden Markov model. coelsch also provides commands for generating per-barcode summary statistics, plotting marker and crossover profiles, identifying segregation distortions, and simulating ground truth datasets using marker distributions from real data.
coelsch is still under active development, so expect some bugs and changes! If in doubt about the behaviour of coelsch or how it might change, feel free to reach out by opening an issue!
The easiest way to install coelsch is using the conda yaml provided:
git clone https://github.com/schneebergerlab/coelsch.git
cd coelsch
conda env create -f coelsch.yml
alternatively, you can install it using pip:
pip install git+https://github.com/schneebergerlab/coelsch.git
coelsch requires torch and pomegranate>=1.0 for performing crossover detection, pysam and joblib for bam file manipulation and numpy, pandas, scipy and matplotlib for general marker analysis and plotting. The command line interface is build using click. For some of the provided helper scripts, parasail is also required.
coelsch divides the analysis into several subcommands for modular and flexible workflows. There are currently two different methods provided for loading markers into the format used for crossover prediction. These are:
coelsch loadbam: read a bam file aligned with cell barcode and haplotype alignment tags. The best way to generate this is using thecoelsch_mapping_pipeline, a haplotype and single-cell aware alignment pipeline usingSTARconsensus.coelsch loadcsl: read a matrix market + vcf file describing biallelic SNP counts for individual cells, generated bycellsnp-lite. The reference allele is assumed to derive from haplotype 1, and the alternative allele from haplotype 2.
The output of both load commands is a json file containing marker information, which can be fed into the downstream clean, predict, stats and plot commands.
If you would like to perform the full analysis in a single step, then the core analysis pipeline, load(+clean)+predict, can be run together as a single command using coelsch bam2pred or coelsch csl2pred.
coelsch also provides an API with some helpful classes for working with output marker and predictions datasets in python. These are the MarkerRecords and PredictionRecords classes:
from coelsch import MarkerRecords, PredictionRecords
co_markers = MarkerRecords.read_json('markers.json')
co_preds = PredictionRecords.read_json('pred.json')
co_markers.barcodes[:5]
['TGGTTAGGTAGATTGA',
'ACGTAGTTCATCAGTG',
'GACCAATCAACAAAGT',
'CTATAGGGTTACCTTT',
'AGAGAATCAGACAATA']
Plotting functions can also accessed both from the command line using coelsch plot and also in python using the coelsch.plot module or built in methods of MarkerRecords/PredictionRecords
co_markers.plot_barcode('TGGTTAGGTAGATTGA', co_preds=co_preds, max_yheight=10);
