Assembler differences

Ryan Wick edited this page Sep 4, 2015 · 11 revisions

Velvet

Graphs produced by Velvet are usually saved in a file named 'LastGraph' and are built with a specified k-mer size. The sequence of a node consists of the final base of each of the k-mers in that node.
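As a minimal sketch of that rule (not Velvet's own code), the snippet below builds a node sequence from the last base of each k-mer. The input sequence `GATTACAGT` and the helper names `kmers_of` and `velvet_node_sequence` are made up for illustration.

```python
def kmers_of(sequence, k):
    """Return every k-mer of `sequence`, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def velvet_node_sequence(kmers):
    """Velvet-style node sequence: the final base of each k-mer."""
    return "".join(kmer[-1] for kmer in kmers)


# Hypothetical 9-base stretch covered by a single node, with k = 5.
kmers = kmers_of("GATTACAGT", 5)
node_sequence = velvet_node_sequence(kmers)

print(kmers)          # ['GATTA', 'ATTAC', 'TTACA', 'TACAG', 'ACAGT']
print(node_sequence)  # ACAGT
```

Five k-mers give a five-base node sequence, even though together they span nine bases of the underlying sequence.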

Velvet graph

This figure (adapted from Zerbino and Birney, Genome Research, 2008) shows three nodes in a de Bruijn graph with a k-mer size of 5. The node sequences are in the blue rectangles, and the k-mer sequences are shown next to the nodes. The k-mers in each node are reverse complements of the k-mers in the opposite node. The node sequences, however, are not exact reverse complements of each other: they are offset by k-1 bases (4 in this case).
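The k-1 shift can be checked with a small sketch. It assumes a hypothetical 9-base region (`GATTACAGT`, not the sequence in the figure) and illustrative helper names. The forward node holds the region minus its first k-1 bases, while the reverse complement of the opposite node holds the region minus its last k-1 bases.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Reverse complement of a DNA string made of A, C, G and T."""
    return sequence.translate(COMPLEMENT)[::-1]


def velvet_node_sequence(sequence, k):
    """Final base of each k-mer found along `sequence` (Velvet-style)."""
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    return "".join(kmer[-1] for kmer in kmers)


k = 5
region = "GATTACAGT"  # hypothetical sequence spanned by one node pair

forward = velvet_node_sequence(region, k)
opposite = velvet_node_sequence(reverse_complement(region), k)

print(forward)                       # ACAGT (region without its first k-1 bases)
print(reverse_complement(opposite))  # GATTA (region without its last k-1 bases)
```

The two printed strings differ, which is why a Velvet node and its opposite node are not exact reverse complements of each other.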

Because each node in a Velvet graph contains only one base per k-mer, it is common to see complex areas of the graph with very short nodes, sometimes just 1 or 2 bases long.

SPAdes/MEGAHIT

SPAdes and MEGAHIT use a de Bruijn graph like Velvet, but they generate the node sequences differently. They include the entirety of the first k-mer in each node.

SPAdes graph

This has important consequences. First, in a SPAdes/MEGAHIT graph, a node's sequence is the exact reverse complement of its opposite node's sequence. Second, nodes in a SPAdes/MEGAHIT graph are never shorter than the k-mer size. Third, connected nodes overlap by k-1 bases. In the illustrated example, the sequences TAGACTGATTG and ATTGACCA overlap by 4 bases.
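These three properties can be illustrated with the two sequences from the example, TAGACTGATTG and ATTGACCA, and k = 5. This is a sketch only: `spades_node_sequence` and the other helper names are hypothetical and are not taken from SPAdes or MEGAHIT.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Reverse complement of a DNA string made of A, C, G and T."""
    return sequence.translate(COMPLEMENT)[::-1]


def kmers_of(sequence, k):
    """Return every k-mer of `sequence`, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def spades_node_sequence(kmers):
    """Whole first k-mer, then the final base of each later k-mer."""
    return kmers[0] + "".join(kmer[-1] for kmer in kmers[1:])


k = 5
first, second = "TAGACTGATTG", "ATTGACCA"

node = spades_node_sequence(kmers_of(first, k))
opposite = spades_node_sequence(kmers_of(reverse_complement(first), k))

print(node == reverse_complement(opposite))  # True: exact reverse complements
print(len(node) >= k)                        # True: never shorter than k
print(first[-(k - 1):], second[:k - 1])      # ATTG ATTG: overlap of k-1 bases
```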

When Bandage loads a SPAdes/MEGAHIT graph, it automatically detects these overlaps so they can be removed in path sequences (see graph paths). Overlaps can also become apparent when viewing BLAST hits. If a hit extends to the end of one node of a SPAdes/MEGAHIT graph, the connected nodes may also show small hits in the overlap region.
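To show why overlaps must be removed from path sequences, here is a naive sketch. It is not Bandage's implementation: `detect_overlap` and `path_sequence` are hypothetical helpers, and the detection simply takes the longest suffix of one node that matches a prefix of the next, which could overestimate the overlap if a longer match occurred by chance.

```python
def detect_overlap(left, right):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def path_sequence(nodes, overlap):
    """Join node sequences, dropping the overlap from every node after the first."""
    return nodes[0] + "".join(node[overlap:] for node in nodes[1:])


nodes = ["TAGACTGATTG", "ATTGACCA"]  # the two overlapping nodes from the example
overlap = detect_overlap(nodes[0], nodes[1])

print(overlap)                        # 4
print(path_sequence(nodes, overlap))  # TAGACTGATTGACCA
print("".join(nodes))                 # TAGACTGATTGATTGACCA (ATTG duplicated)
```

Concatenating the nodes without trimming would repeat the shared ATTG, so the path would be k-1 bases too long at every junction.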

BLAST hit overlap

Trinity

To be written