This is a suggested workflow for assembling Single Amplified Genomes (SAGs) that were amplified by MDA and sequenced as Illumina HiSeq/MiSeq libraries.
- Read Normalisation (optional): BBNorm
- Overlapping of Read Libraries: PEAR or BBMerge
- Trimming and Adaptor Cleaning: Trim Galore!
  - Remove any Illumina sequencing adaptors, poly-A tails, bases with a quality score below 20, etc.
- Assembly of Prepared Reads: SPAdes
  - We find that SPAdes gives good results in single-cell (SC) mode. You may also like to try IDBA-UD and Velvet-SC, both of which can assemble single-cell data; neither is implemented here.
- Assembly Statistics: QUAST
  - e.g. N50, contig/scaffold lengths and counts.
- Read Mapping: BWA
  - This can help with coverage information, but we will mostly be using it for "blobology" (see the BLOBology step below).
- BLAST Report
  - Top hits to NCBI's nt database using megablast. Beware false top hits, as with any BLAST search.
- BLOBology (GC/Taxonomy Maps): blobtools
  - GC/coverage plots with taxonomy information, used to look for contamination.
- Genome 'Completeness' Tests: CEGMA and BUSCO v1 (BUSCO v2.0 support coming soon)
- MultiQC
  - Aggregates results from bioinformatics analyses across many samples into a single report.
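The core of the workflow above (overlap, trim, assemble, summarise) can be sketched as a dry run that only prints the commands it would execute. This is a minimal illustration, not the pipeline's own code: the function name `plan_assembly`, the output directory layout, and the intermediate file names are all assumptions, and a real run would probably also need to handle the reads PEAR could not merge.

```shell
#!/bin/sh
# Hypothetical dry-run sketch: prints commands instead of running them.
# Directory layout and intermediate file names are assumptions for illustration.

plan_assembly() {
    fwd=$1
    rev=$2
    out=$3

    # Overlap paired reads with PEAR (-f forward, -r reverse, -o output prefix).
    echo "pear -f $fwd -r $rev -o $out/overlapped/pear"

    # Trim adaptors and bases below Q20 from the merged reads with Trim Galore!.
    echo "trim_galore -q 20 -o $out/trimmed $out/overlapped/pear.assembled.fastq"

    # Assemble in single-cell mode (--sc), passing the trimmed reads as unpaired (-s).
    echo "spades.py --sc -s $out/trimmed/pear.assembled_trimmed.fq -o $out/spades"

    # Summarise the assembly (N50, contig counts, etc.) with QUAST.
    echo "quast.py $out/spades/scaffolds.fasta -o $out/quast"
}

plan_assembly r1.fastq r2.fastq output_dir
```

Printing the commands first makes it easy to check paths and flags before committing to a long SPAdes run.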
Program parameters are order based: -n must come before -a, -p before -t, and so on. Supply the options in the order in which they are listed in the usage text.
Single Amplified Genome Assembly Pipeline
Basic Usage:
Required Parameters:
-f <forward.fastq>
-r <reverse.fastq>
-o <./output_dir>
Pipeline Parameters:
-n Read Normalisation (optional)
-a Run All Options Below (ptsqcbBm)
-p <pear|bbmerge> Overlap Reads (pear is default)
-t Trim Overlapped Reads
-s Assemble Trimmed Reads
Reports:
-q Run QUAST
-c Run CEGMA
-b Run BUSCO
-B Run BlobTools
-m Run MultiQC
Example: run_single_cell_assemblies.sh -f r1.fastq -r r2.fastq -o output_dir -n -a
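One plausible reason the options are order based is that each flag is acted on as it is parsed. The sketch below illustrates that behaviour with POSIX `getopts`; it is an assumption about the mechanism, not the script's actual parser. The function name `parse_steps` and the step labels are hypothetical, and for simplicity `-p` always takes an argument here.

```shell
#!/bin/sh
# Hypothetical sketch of order-dependent option handling; not the pipeline's actual parser.
# Assumption: -p always takes an argument here (the real script also allows a default).

parse_steps() {
    OPTIND=1      # reset so the function can be called more than once
    steps=""
    while getopts ":f:r:o:nap:tsqcbBm" opt "$@"; do
        case "$opt" in
            f|r|o) ;;  # read and output paths are not needed for this illustration
            n) steps="$steps normalise" ;;
            a) steps="$steps overlap trim assemble quast cegma busco blobtools multiqc" ;;
            p) steps="$steps overlap:$OPTARG" ;;
            t) steps="$steps trim" ;;
            s) steps="$steps assemble" ;;
            q) steps="$steps quast" ;;
            c) steps="$steps cegma" ;;
            b) steps="$steps busco" ;;
            B) steps="$steps blobtools" ;;
            m) steps="$steps multiqc" ;;
            :) echo "option -$OPTARG needs an argument" >&2; return 1 ;;
            \?) echo "unknown option: -$OPTARG" >&2; return 1 ;;
        esac
    done
    # Steps are reported in the order their flags appeared on the command line.
    echo "${steps# }"
}

parse_steps -f r1.fastq -r r2.fastq -o output_dir -n -a
```

With the flags from the example, normalisation is recorded before the remaining steps. Swapping them to `-a -n` would record it last, which is why the order on the command line matters in a parser of this kind.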
A suggested workflow for predicting genes from your assembly.
- ESOM?
- Contig Integrator for Sequence Assembly - CISA
- Protocol for fully automated Decontamination of Genomes - ProDeGe
- CheckM - CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes.
  - Looks like a nice tool, though it has a huge selection of options and is oriented towards bacteria and archaea; it can handle eukaryotes to some extent.
This work initially started from [insert paper here], for which the original scripts are available as a release with the DOI below. The repository and scripts have since changed quite significantly, although the workflow remains much the same.