A suggested workflow, and accompanying scripts, to assemble Single Amplified Genomes (SAGs) from MDA-derived Hi/MiSeq (overlapped) paired-end Illumina libraries.

GenomicsNX/single_cell_workflow

Single Amplified Genome Assembly Workflow

This is a suggested workflow for assembling MDA-derived Single Amplified Genomes from Illumina Hi/MiSeq libraries.

SAG Assembly Workflow Diagram

[Figure: SAGA Workflow]

SAG Assembly Workflow

  1. Read Normalisation (optional): BBNORM
  2. Overlapping of Read Libraries: PEAR or BBMERGE
  3. Trimming and Adaptor Cleaning: Trim Galore!
  • Remove any Illumina sequencing adaptors, poly-A tails, bases with a quality score below 20, etc.
  4. Assembly of Prepared Reads: SPAdes
  • We find that SPAdes gives good results in single-cell (SC) mode. You may also like to try IDBA-UD and Velvet-SC, both of which can assemble single cells; these are not implemented here.
  5. Assembly Statistics: QUAST
  • e.g. N50, contig/scaffold lengths and counts, etc.
  6. Read Mapping: BWA
  • This provides coverage information, but here it is used mostly for "blobology" (see step 8).
  7. BLAST Report
  • Top hits against NCBI's nt database using megablast; as with any BLAST search, beware of false top hits.
  8. BLOBology (GC/taxonomy maps): blobtools
  • GC/coverage plots annotated with taxonomy information, used to look for contamination.
  9. Genome 'Completeness' Tests: CEGMA & BUSCO v1 (v2.0 coming soon!)
  10. MultiQC: aggregate results from bioinformatics analyses across many samples into a single report.
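
To illustrate what overlapping (step 2) does, here is a toy sketch in POSIX sh/awk. `merge_pair` is a hypothetical helper, not part of this repository: it reverse-complements R2 and joins the pair at the longest exact overlap of at least `min_olap` bases. PEAR and BBMERGE are far more robust (they tolerate mismatches and use quality-aware scoring), so treat this only as an explanation of the idea.

```sh
# Hypothetical toy sketch (not the pipeline's code): merge one read pair by exact
# overlap. R2 is reverse-complemented, then the longest suffix of R1 (>= min_olap
# bases) that equals a prefix of revcomp(R2) is taken as the overlap.
# Prints the merged sequence, or nothing if no overlap is found.
merge_pair() {
  awk -v r1="$1" -v r2="$2" -v min_olap="${3:-10}" '
    # Reverse complement; anything other than A/C/G/T becomes N.
    function revcomp(s,    i, out, c) {
      out = ""
      for (i = length(s); i > 0; i--) {
        c = substr(s, i, 1)
        out = out (c == "A" ? "T" : c == "T" ? "A" : c == "C" ? "G" : c == "G" ? "C" : "N")
      }
      return out
    }
    BEGIN {
      rc = revcomp(toupper(r2)); r1 = toupper(r1)
      max = (length(r1) < length(rc)) ? length(r1) : length(rc)
      # Try the longest possible overlap first, shrinking down to min_olap.
      for (olap = max; olap >= min_olap; olap--) {
        if (substr(r1, length(r1) - olap + 1) == substr(rc, 1, olap)) {
          print r1 substr(rc, olap + 1)   # R1 plus the non-overlapping tail of revcomp(R2)
          exit
        }
      }
    }
  '
}
```

For example, `merge_pair ACGTTGCAAGGC GTTAAGCCTTGC 5` joins the two 12 bp reads into the 17 bp fragment `ACGTTGCAAGGCTTAAC` via a 7 bp overlap.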
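
The quality threshold in step 3 can be pictured with a simplified sketch. `trim_3prime` is a hypothetical helper that reads 4-line, Phred+33 FASTQ on stdin and drops 3' bases while their quality is below the cutoff (20 by default). It does not remove adaptors or poly-A tails, and Trim Galore! (via Cutadapt) uses a more robust BWA-style trimming algorithm, so this is illustrative only.

```sh
# Hypothetical sketch: trim bases from the 3' end of each FASTQ read while their
# Phred+33 quality is below a cutoff (default 20). Assumes unwrapped 4-line records.
# Simplified: no adaptor removal, and not the algorithm Trim Galore!/Cutadapt use.
trim_3prime() {
  awk -v cutoff="${1:-20}" '
    # Lookup table: printable ASCII character -> code point.
    BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 {
      qual = $0; keep = length(qual)
      # Walk back from the 3-prime end while the base quality (code - 33) is below cutoff.
      while (keep > 0 && ord[substr(qual, keep, 1)] - 33 < cutoff) keep--
      print header; print substr(seq, 1, keep); print plus; print substr(qual, 1, keep)
    }
  '
}
```

Reads trimmed to zero length are still printed by this sketch; real trimmers discard reads below a minimum length.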
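
QUAST reports N50 among its statistics (step 5). As a quick cross-check, N50 can also be computed straight from a FASTA file. `n50` below is a hypothetical shell function, not part of this repository's scripts: it sums the length of each (possibly line-wrapped) record, sorts the lengths in descending order, and prints the first length at which the running total reaches half of all assembled bases.

```sh
# Hypothetical helper: N50 of a FASTA assembly, read from file arguments or stdin.
# N50 = length L such that contigs of length >= L hold at least half of all bases.
n50() {
  awk '
    /^>/ { if (len > 0) print len; len = 0; next }   # header: flush previous contig length
    { len += length($0) }                            # sequence line (may be wrapped)
    END { if (len > 0) print len }
  ' "$@" |
  sort -nr |
  awk '
    { lens[NR] = $1; total += $1 }
    END {
      half = total / 2; run = 0
      for (i = 1; i <= NR; i++) {
        run += lens[i]
        if (run >= half) { print lens[i]; exit }
      }
    }
  '
}
```

For example, `n50 contigs.fasta` on the SPAdes output should agree with the N50 that QUAST reports for contigs above its minimum-length threshold only if that threshold is applied first, since this sketch counts every record.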
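
The coverage information from step 6 can be summarised per contig. This sketch assumes the BWA alignments have been sorted into a BAM file and passed through `samtools depth -a` (samtools is not one of the listed steps), which prints three tab-separated columns: contig, position, depth. `mean_depth_per_contig` is a hypothetical helper.

```sh
# Hypothetical helper: mean read depth per contig from `samtools depth -a` output
# (contig <TAB> position <TAB> depth). Without -a, zero-depth positions are
# omitted and these means would be inflated.
mean_depth_per_contig() {
  awk -F '\t' '
    !($1 in n) { order[++k] = $1 }   # remember the order contigs are first seen
    { n[$1]++; sum[$1] += $3 }       # count positions and total depth per contig
    END {
      for (i = 1; i <= k; i++) printf "%s\t%.2f\n", order[i], sum[order[i]] / n[order[i]]
    }
  ' "$@"
}
```

Usage, under the assumptions above: `samtools depth -a mapped.sorted.bam | mean_depth_per_contig`.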
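
For the BLAST report (step 7), a common way to get one top hit per contig is to keep the highest-bitscore line per query. This sketch assumes BLAST tabular output with the 12 default `-outfmt 6` columns, where bitscore is column 12; `top_hit_per_query` is a hypothetical helper and relies on `sort -g` (available in GNU, BSD and BusyBox sort, though not required by POSIX).

```sh
# Hypothetical helper: keep only the highest-bitscore hit per query from BLAST
# tabular output (-outfmt 6, default 12 columns; bitscore is column 12).
top_hit_per_query() {
  # Group by query ID, then order each group by bitscore, highest first.
  sort -t "$(printf '\t')" -k1,1 -k12,12gr "$@" |
  awk -F '\t' '!seen[$1]++'   # print only the first (best) line for each query
}
```

This picks the best-scoring hit that BLAST returned; it cannot fix a misleading top hit, so the caveat in step 7 still applies.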
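
The GC axis of the blob plots in step 8 is simply per-contig GC content. blobtools computes this itself; the hypothetical `gc_per_contig` helper below is only a way to spot-check individual contigs. It prints the contig ID (first word of the header), length and GC percentage. Ambiguous bases such as N are counted in the length here; other tools may exclude them.

```sh
# Hypothetical helper: print "<contig id>\t<length>\t<GC %>" for each FASTA record.
gc_per_contig() {
  awk '
    function flush() {
      if (id != "") printf "%s\t%d\t%.2f\n", id, len, (len > 0 ? 100 * gc / len : 0)
    }
    /^>/ {
      flush()
      id = substr($1, 2); len = 0; gc = 0   # ID = first word of the header, minus ">"
      next
    }
    {
      seq = toupper($0)
      len += length(seq)
      gc += gsub(/[GC]/, "", seq)           # gsub returns the number of G/C bases removed
    }
    END { flush() }
  ' "$@"
}
```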

Example Usage / Help

Program parameters are order-based: -n must come before -a, -p before -t, etc.

Single Amplified Genome Assembly Pipeline
Basic Usage:
Required Parameters:
  -f <forward.fastq>
  -r <reverse.fastq>
  -o <./output_dir>
Pipeline Parameters:
  -n                  Read Normalisation (optional)
  -a                  Run All Options Below (ptsqcbBm)
  -p <pear|bbmerge>   Overlap Reads (pear is default)
  -t                  Trim Overlapped Reads
  -s                  Assemble Trimmed Reads
Reports:
  -q                  Run QUAST
  -c                  Run CEGMA
  -b                  Run BUSCO
  -B                  Run BlobTools
  -m                  Run MultiQC

Example: run_single_cell_assemblies.sh -f r1.fastq -r r2.fastq -o output_dir -n -a
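
The order-based note makes sense if the script acts on each flag as soon as `getopts` reads it. The sketch below assumes exactly that (an assumption; check run_single_cell_assemblies.sh for the real logic). `parse_and_run` and `run_step` are hypothetical names, and the steps only print their names.

```sh
# Hypothetical sketch (not the repository's script): each option runs its step as
# soon as getopts parses it, so steps execute in command-line order.
run_step() { echo "$1"; }   # stand-in for a real pipeline step; only logs its name

parse_and_run() {
  OPTIND=1   # reset so the function can be called more than once
  while getopts "f:r:o:nap:tsqcbBm" opt "$@"; do
    case "$opt" in
      f) fwd="$OPTARG" ;;      # inputs/outputs stored for the (not shown) steps
      r) rev="$OPTARG" ;;
      o) outdir="$OPTARG" ;;
      n) run_step normalise ;;
      a) # -a behaves like -p (default pear) -t -s -q -c -b -B -m
         for step in overlap:pear trim assemble quast cegma busco blobtools multiqc; do
           run_step "$step"
         done ;;
      p) run_step "overlap:$OPTARG" ;;
      t) run_step trim ;;
      s) run_step assemble ;;
      q) run_step quast ;;
      c) run_step cegma ;;
      b) run_step busco ;;
      B) run_step blobtools ;;
      m) run_step multiqc ;;
      *) echo "unknown option" >&2; return 1 ;;
    esac
  done
}
```

Under this assumption, `-n -a` logs normalisation before every -a step, while `-t -p bbmerge` would trim before overlapping, which is why -p has to precede -t.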

Gene Prediction Workflow

A suggested workflow for predicting genes from your assembly.

[Figure: gene prediction workflow diagram (image alt text: "SAGA Workflow")]

Other Thoughts on Assembly & Downstream/Other Analyses

  1. ESOM (Emergent Self-Organising Maps)?
  2. Contig Integrator for Sequence Assembly - CISA
  3. Protocol for fully automated Decontamination of Genomes - ProDeGe
  4. CheckM: provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes.
  • Looks like a nice tool, although it has a huge selection of options and is oriented towards bacteria/archaea; it may work roughly for eukaryotes.

Initial Paper Citation

This work was initially started for [insert paper here], for which the original scripts are available as a release with the DOI below. The repository and scripts have since changed significantly, although the workflow remains much the same.

DOI
