Skip to content

crossover estimation, localisation, and single cell haplotyping 🤡

Notifications You must be signed in to change notification settings

schneebergerlab/coelsch

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

207 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

coelsch: Crossover estimation, localisation and single cell haplotyping

(formerly known as snco)

coelsch is a set of tools for identifying recombination events at the single nucleus/cell level, using read alignments and/or SNP matrices from single nucleus sequencing experiments of recombinant haploid gametes. Reads alignments or SNPs are summarised into a per-cell barcode "marker" dataset which can then be used to perform crossover analysis with a hidden Markov model. coelsch also provides commands for generating per-barcode summary statistics, plotting marker and crossover profiles, identifying segregation distortions, and simulating ground truth datasets using marker distributions from real data.

coelsch is still under active development, so expect some bugs and changes! If in doubt about the behaviour of coelsch or how it might change, feel free to reach out by opening an issue!

Installation:

The easiest way to install coelsch is using the conda yaml provided:

git clone https://github.com/schneebergerlab/coelsch.git
cd coelsch
conda env create -f coelsch.yml

alternatively, you can install it using pip:

pip install git+https://github.com/schneebergerlab/coelsch.git

coelsch requires torch and pomegranate>=1.0 for performing crossover detection, pysam and joblib for bam file manipulation and numpy, pandas, scipy and matplotlib for general marker analysis and plotting. The command line interface is build using click. For some of the provided helper scripts, parasail is also required.

Quickstart:

coelsch divides the analysis into several subcommands for modular and flexible workflows. There are currently two different methods provided for loading markers into the format used for crossover prediction. These are:

  • coelsch loadbam: read a bam file aligned with cell barcode and haplotype alignment tags. The best way to generate this is using the coelsch_mapping_pipeline, a haplotype and single-cell aware alignment pipeline using STAR consensus.
  • coelsch loadcsl: read a matrix market + vcf file describing biallelic SNP counts for individual cells, generated by cellsnp-lite. The reference allele is assumed to derive from haplotype 1, and the alternative allele from haplotype 2.

The output of both load commands is a json file containing marker information, which can be fed into the downstream clean, predict, stats and plot commands.

If you would like to perform the full analysis in a single step, then the core analysis pipeline, load(+clean)+predict, can be run together as a single command using coelsch bam2pred or coelsch csl2pred.

Python API

coelsch also provides an API with some helpful classes for working with output marker and predictions datasets in python. These are the MarkerRecords and PredictionRecords classes:

from coelsch import MarkerRecords, PredictionRecords

co_markers = MarkerRecords.read_json('markers.json')
co_preds = PredictionRecords.read_json('pred.json')

co_markers.barcodes[:5]

    ['TGGTTAGGTAGATTGA',
     'ACGTAGTTCATCAGTG',
     'GACCAATCAACAAAGT',
     'CTATAGGGTTACCTTT',
     'AGAGAATCAGACAATA']

Plotting functions can also accessed both from the command line using coelsch plot and also in python using the coelsch.plot module or built in methods of MarkerRecords/PredictionRecords

co_markers.plot_barcode('TGGTTAGGTAGATTGA', co_preds=co_preds, max_yheight=10);

single nucleus marker plot

About

crossover estimation, localisation, and single cell haplotyping 🤡

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages