Assembler differences
Graphs produced by Velvet are usually called 'LastGraph' and are built with a specified k-mer size. The sequence in a node is made of the final base of each of the k-mers in that node.
This figure (adapted from Zerbino and Birney, _Genome Research_, 2008) shows 3 nodes in a de Bruijn graph with a k-mer size of 5. The node sequences are in the blue rectangles and the k-mer sequences are shown next to the nodes. It illustrates that the k-mers in each node are reverse complements of the k-mers in the opposite node. However, the node sequences are not exact reverse complements, but are shifted by a distance of k-1 (4 in this case).
Because each node in a Velvet graph only contains one letter per k-mer, it is common to see complex areas of the graph with very short nodes, sometimes just 1 or 2 bases.
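As a minimal sketch of this construction, the following Python builds a Velvet-style node sequence from the final base of each k-mer and shows the k-1 shift between a node and its opposite node. The fragment `GATTACAGT` and the helper names are hypothetical; they are not taken from Velvet or from the figure.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmers(seq, k):
    """List every k-mer of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def velvet_style_node_sequence(node_kmers):
    """Keep only the final base of each k-mer, as described for Velvet nodes."""
    return "".join(kmer[-1] for kmer in node_kmers)


k = 5
fragment = "GATTACAGT"                      # hypothetical stretch covered by one node
opposite = reverse_complement(fragment)     # strand covered by the opposite node

node = velvet_style_node_sequence(kmers(fragment, k))
twin = velvet_style_node_sequence(kmers(opposite, k))

print(node)                      # ACAGT
print(twin)                      # TAATC
print(reverse_complement(node))  # ACTGT, which differs from the twin sequence

# The two strings are windows on the same strand, offset by k-1 bases.
print(opposite[:len(node)])      # ACTGT
print(opposite[k - 1:])          # TAATC
```

A node holding a single k-mer would yield a one-base sequence here, which is why very short nodes appear in complex regions.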
SPAdes uses a de Bruijn graph like Velvet, but it generates the node sequences differently. Each node sequence includes the entirety of the first k-mer in that node, rather than only its final base.
This has important consequences. First, in a SPAdes graph, a node's sequence is the exact reverse complement of its opposite node. Second, nodes in a SPAdes graph are never shorter than the k-mer size. Third, connected nodes overlap by k-1 bases. In the illustrated example (k-mer size of 5), the sequences TAGACTGATTG and ATTGACCA overlap by 4 bases (ATTG).
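These three properties can be checked with a short Python sketch. It assumes a k-mer size of 5, as in the figure, and reuses the two sequences quoted above; the helper names are hypothetical and this is not SPAdes code.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmers(seq, k):
    """List every k-mer of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def spades_style_node_sequence(node_kmers):
    """Whole first k-mer, then the final base of each later k-mer."""
    return node_kmers[0] + "".join(kmer[-1] for kmer in node_kmers[1:])


k = 5
upstream = "TAGACTGATTG"
downstream = "ATTGACCA"

# 1. A node and its opposite node are exact reverse complements.
node = spades_style_node_sequence(kmers(upstream, k))
twin = spades_style_node_sequence(kmers(reverse_complement(upstream), k))
print(twin == reverse_complement(node))  # True

# 2. Even a node holding a single k-mer is k bases long.
print(len(spades_style_node_sequence(["ATTGA"])))  # 5

# 3. Connected nodes share k-1 bases at the junction.
print(upstream[-(k - 1):], downstream[:k - 1])  # ATTG ATTG
```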
When Bandage loads a SPAdes graph, it automatically detects these overlaps so they can be removed in path sequences (see graph paths). Overlaps can also become apparent when viewing BLAST hits. If a hit extends to the end of one node of a SPAdes graph, the connected nodes may also show small hits in the overlap region.
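To illustrate what removing overlaps from a path sequence means, here is a minimal Python sketch. The function `path_sequence` is hypothetical and is not Bandage's implementation; it assumes every pair of connected nodes shares the same fixed overlap.

```python
def path_sequence(node_sequences, overlap):
    """Join node sequences along a path, counting each shared overlap once.

    node_sequences: node sequences in path order.
    overlap: number of bases shared by consecutive nodes (k-1 for the
             overlapping graphs described above, 0 for graphs without overlap).
    """
    if not node_sequences:
        return ""
    merged = node_sequences[0]
    for seq in node_sequences[1:]:
        # Guard against nodes that do not actually overlap as expected.
        if overlap and merged[-overlap:] != seq[:overlap]:
            raise ValueError(f"nodes do not share a {overlap}-base overlap")
        merged += seq[overlap:]
    return merged


# The two example nodes overlap by 4 bases (ATTG), so the merged
# path is 11 + 8 - 4 = 15 bases long.
print(path_sequence(["TAGACTGATTG", "ATTGACCA"], 4))  # TAGACTGATTGACCA
```

Simply concatenating the two nodes would duplicate ATTG and produce a 19-base sequence that does not exist in the assembly.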
Note that prior to version 3.5.0, SPAdes had a bug which resulted in missing graph edges. Therefore, when using Bandage, be sure to use SPAdes v3.5.0 or later.
MEGAHIT uses the same graph format as SPAdes. MEGAHIT graphs also have node overlaps, so the above notes regarding SPAdes graphs apply to MEGAHIT graphs as well.
MEGAHIT has had the ability to generate graph files since version 0.3.0. They are not made automatically but must be generated by running `megahit_toolkit contig2fastg`. See visualizing MEGAHIT's contig graph for more information.
Trinity graphs are unique in their node naming scheme. Assembled Trinity sequences can be grouped at multiple levels: transcript, component, gene and isoform. For this reason, node names in Bandage for a Trinity graph have prefixes that mirror the headers in the Trinity.fasta file.
These node name prefixes can be useful when finding nodes and specifying graph scope using the 'Partial' match option (see graph scope).
Trinity graphs do not have the overlap present in SPAdes/MEGAHIT graphs.