Assembler differences
Graphs produced by Velvet are usually saved in a file named 'LastGraph' and are built with a specified k-mer size. The sequence in a node is made of the final base of each of the k-mers in that node.
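A minimal Python sketch of that rule follows. It assumes a node can be represented simply as its ordered list of k-mers, which is a simplification: Velvet's own data structures and the LastGraph file format are not modelled. The helper names `kmers_of` and `velvet_node_sequence` are hypothetical, and the example sequence is illustrative rather than taken from the figure.

```python
def kmers_of(seq, k):
    """All overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def velvet_node_sequence(kmers):
    """Velvet-style node sequence: the final base of each k-mer in the node."""
    return "".join(kmer[-1] for kmer in kmers)


# Illustrative node: the 5-mers spelled out by one stretch of sequence.
kmers = kmers_of("TAGACTGATTG", 5)
print(kmers)                        # ['TAGAC', 'AGACT', 'GACTG', 'ACTGA', 'CTGAT', 'TGATT', 'GATTG']
print(velvet_node_sequence(kmers))  # CTGATTG
```

The node sequence has exactly one base per k-mer, so a node holding only one or two k-mers is only one or two bases long.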
This figure (adapted from Zerbino and Birney, Genome Research, 2008) shows 3 nodes in a de Bruijn graph with a k-mer size of 5. The node sequences are in the blue rectangles and the k-mer sequences are shown next to the nodes. It illustrates that the k-mers in each node are reverse complements of the k-mers in the opposite node. However, the node sequences are not exact reverse complements, but are shifted by a distance of k-1 (4 in this case).
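The k-1 shift can be checked with a small sketch under the same assumptions: a node is an ordered list of k-mers, the helper names are hypothetical, and the sequence is illustrative. The opposite node holds the reverse complements of the same k-mers in reverse order. Taking the final base of each gives a sequence that is offset by k-1 from the plain reverse complement of the original node sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (ACGT only)."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers_of(seq, k):
    """All overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def velvet_node_sequence(kmers):
    """Velvet-style node sequence: the final base of each k-mer."""
    return "".join(kmer[-1] for kmer in kmers)


k = 5
kmers = kmers_of("TAGACTGATTG", k)  # illustrative k-mers of one node
node = velvet_node_sequence(kmers)

# Opposite node: reverse complement of each k-mer, in reverse order.
opposite_kmers = [reverse_complement(kmer) for kmer in reversed(kmers)]
opposite = velvet_node_sequence(opposite_kmers)

print(node)                      # CTGATTG
print(reverse_complement(node))  # CAATCAG
print(opposite)                  # CAGTCTA
# The two differ, but agree once shifted by k-1 bases.
print(reverse_complement(node)[k - 1:] == opposite[:-(k - 1)])  # True
```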
Because each node in a Velvet graph only contains one letter per k-mer, it is common to see complex areas of the graph with very short nodes, sometimes just 1 or 2 bases.
SPAdes and MEGAHIT use a de Bruijn graph like Velvet, but they generate the node sequences differently. Each node's sequence includes the entire first k-mer, followed by the final base of each subsequent k-mer.
This has important consequences. First, in a SPAdes/MEGAHIT graph, a node's sequence is the exact reverse complement of its opposite node. Second, nodes in a SPAdes/MEGAHIT graph are never shorter than the k-mer size. Third, connected nodes overlap by an amount equal to k-1. In the illustrated example, the sequences TAGACTGATTG and ATTGACCA overlap by 4.
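Here is a sketch of the SPAdes/MEGAHIT-style rule, again treating a node as an ordered list of k-mers. `spades_node_sequence` is a hypothetical name, not part of either assembler. It reuses the sequences from the example above (k = 5) and checks the three consequences: exact reverse complements, a minimum length of k, and a k-1 overlap between connected nodes.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (ACGT only)."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers_of(seq, k):
    """All overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def spades_node_sequence(kmers):
    """Whole first k-mer, then one new base per subsequent k-mer."""
    return kmers[0] + "".join(kmer[-1] for kmer in kmers[1:])


k = 5
kmers = kmers_of("TAGACTGATTG", k)  # illustrative k-mers of one node
node = spades_node_sequence(kmers)
opposite_kmers = [reverse_complement(kmer) for kmer in reversed(kmers)]
opposite = spades_node_sequence(opposite_kmers)

print(node)                                  # TAGACTGATTG
print(opposite == reverse_complement(node))  # True
print(len(node) >= k)                        # True

next_node = "ATTGACCA"  # connected node from the example
print(node[-(k - 1):] == next_node[:k - 1])  # True (both are ATTG)
```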
When Bandage loads a SPAdes/MEGAHIT graph, it automatically detects these overlaps so they can be removed in path sequences (see graph paths). Overlaps can also become apparent when viewing BLAST hits. If a hit extends to the end of one node of a SPAdes/MEGAHIT graph, the connected nodes may also show small hits in the overlap region.
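Bandage's actual detection code is not described here, so the following is only a hypothetical sketch of the idea. It finds the largest overlap length that every edge in the graph agrees on, then trims that many bases from each subsequent node when building a path sequence. The node names `1` and `2`, the edge list and the `max_len` cap are illustrative assumptions.

```python
def detect_graph_overlap(sequences, edges, max_len=200):
    """Largest n such that, for every edge a -> b, the last n bases of a
    equal the first n bases of b. Returns 0 if no such n exists."""
    if not edges:
        return 0
    limit = min([max_len] + [min(len(sequences[a]), len(sequences[b]))
                             for a, b in edges])
    for n in range(limit, 0, -1):
        if all(sequences[a][-n:] == sequences[b][:n] for a, b in edges):
            return n
    return 0


def path_sequence(path, sequences, overlap):
    """Join node sequences along a path, dropping the overlap from each
    node after the first so shared bases are not duplicated."""
    seq = sequences[path[0]]
    for name in path[1:]:
        seq += sequences[name][overlap:]
    return seq


sequences = {"1": "TAGACTGATTG", "2": "ATTGACCA"}  # hypothetical nodes
edges = [("1", "2")]

overlap = detect_graph_overlap(sequences, edges)
print(overlap)                                      # 4
print(path_sequence(["1", "2"], sequences, overlap))  # TAGACTGATTGACCA
```

Here the single edge 1 → 2 yields an overlap of 4 (k-1 for k = 5), so the path sequence is 11 + 8 - 4 = 15 bases long. The shared ATTG is also why a BLAST hit that runs to the end of node 1 can reappear as a small hit at the start of node 2.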
To be written