Working with very large graphs

Ryan Wick edited this page Jul 4, 2016 · 6 revisions

Using Bandage to examine very large graphs, such as from a metagenome or eukaryote assembly, presents challenges:

  • Bandage uses more RAM with large graphs, which can limit the size of graph that can reasonably be viewed on a given computer.
  • Loading the graph file will take longer, possibly several minutes.
  • Laying out the graph nodes will take much longer, possibly tens of minutes.
  • The graphical performance of Bandage (zooming, panning, etc.) will be slow.

As an example, I loaded a metagenome assembly with 1,154,476 nodes, 1,277,345 edges and 170,211,366 bases of sequence on a Mid 2012 MacBook Pro (2.5 GHz Intel Core i5 Ivy Bridge, 16 GB of RAM). Those values count positive nodes only, so they are doubled for the entire graph (see single vs double node style). Loading the graph took about 2.5 minutes, and Bandage used about 1.8 GB of RAM before drawing it. With the 'Graph layout iterations' setting at its minimum (see graph layout and appearance), it took 14.5 minutes to display the entire graph, and Bandage's RAM usage peaked at 12 GB.
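The arithmetic behind those figures can be sketched in a few lines of Python. The doubling follows directly from the numbers above. The RAM estimate is only a rough guess: it assumes peak RAM scales linearly with node count, which is a hypothetical simplification and not a measured property of Bandage. The function names are made up for illustration.

```python
# Figures reported above for the example assembly (positive nodes only).
POSITIVE_NODES = 1_154_476
POSITIVE_EDGES = 1_277_345
POSITIVE_BASES = 170_211_366
PEAK_RAM_GB = 12.0  # peak RAM while drawing the entire graph


def whole_graph_totals(nodes, edges, bases):
    """Double the positive-node counts to cover the entire graph."""
    return 2 * nodes, 2 * edges, 2 * bases


def estimate_peak_ram_gb(positive_nodes):
    """Scale the example's peak RAM by node count.

    ASSUMPTION: linear scaling is a guess for illustration only;
    real usage depends on the graph and on Bandage's settings.
    """
    return PEAK_RAM_GB * positive_nodes / POSITIVE_NODES


if __name__ == "__main__":
    nodes, edges, bases = whole_graph_totals(
        POSITIVE_NODES, POSITIVE_EDGES, POSITIVE_BASES
    )
    print(f"Entire graph: {nodes:,} nodes, {edges:,} edges, {bases:,} bases")
    print(f"Rough peak RAM for 500,000 positive nodes: "
          f"{estimate_peak_ram_gb(500_000):.1f} GB")
```

Treat the estimate as an order-of-magnitude check on whether a graph is likely to fit in memory, not as a prediction.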

[Figure: Large graph]

For these reasons, it is best to use a reduced graph scope with large graphs, such as 'Around nodes' or 'Around BLAST hits'.
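A reduced scope can also be requested when launching Bandage from a script. The sketch below only builds the argument list. It assumes a `Bandage` executable on the PATH and uses the `load` command with `--draw`, `--scope aroundnodes`, `--nodes` and `--distance` as I recall them from Bandage's command-line help; check them against `Bandage --helpall` for your version. The file name, node names and function name are hypothetical.

```python
import shlex


def around_nodes_command(graph_path, node_names, distance=2):
    """Build a Bandage command that draws only the region around some nodes.

    ASSUMPTION: option names are taken from memory of Bandage's
    command-line help; verify them with `Bandage --helpall`.
    """
    return [
        "Bandage", "load", graph_path,
        "--draw",                          # draw the graph after loading
        "--scope", "aroundnodes",          # 'Around nodes' scope
        "--nodes", ",".join(node_names),   # comma-separated starting nodes
        "--distance", str(distance),       # how many steps out to include
    ]


if __name__ == "__main__":
    # Hypothetical graph file and node names, for illustration only.
    command = around_nodes_command("assembly.gfa", ["12", "345"], distance=3)
    print(shlex.join(command))
    # To launch Bandage, pass `command` to subprocess.run(command, check=True).
```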

If you do view large sections (or the entirety) of the graph, consider these suggestions for improving graphical performance:

  • Reduce the 'Graph layout iterations' setting to its minimum to minimise the time spent laying out the graph, which can be considerable for large graphs.
  • Decrease the 'Node length per megabase' setting to make nodes shorter. This can shorten the layout time and improve visualisation performance.
  • Increase the 'Node segment length' setting to make node shapes simpler and less smooth, so each node is built from fewer segments.
  • The responsiveness of Bandage is largely dependent on the number of nodes visible at once. Therefore, a zoomed-in view of a large graph (showing only part of the graph) will perform better than a zoomed-out view (showing the whole graph).
  • Drawing text on the graph slows down graphical performance, so turn off labels for large visualisations.
  • Drawing node outlines slows down graphical performance. Node outlines are off by default (i.e. set to 0.0), so keep this default for large visualisations.
  • Antialiasing (on by default) slows down graphical performance. Turning this setting off can help with large visualisations.
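Several of the suggestions above have command-line counterparts, which is convenient when rendering a large graph to an image without the GUI. The sketch below builds such a command. It assumes a `Bandage` executable on the PATH and uses the `image` command with `--iter`, `--nodelen`, `--nodseglen`, `--outline` and `--noaa` as I recall them from Bandage's command-line help; confirm the names and accepted ranges with `Bandage --helpall`. The numeric values, file names and function name are illustrative placeholders, not recommended settings.

```python
import shlex


def large_graph_image_command(graph_path, image_path, iterations=0,
                              node_length_per_megabase=100.0,
                              node_segment_length=50.0):
    """Build a Bandage command tuned for drawing a very large graph.

    ASSUMPTION: option names come from memory of Bandage's command-line
    help, and the default values here are placeholders for illustration.
    """
    return [
        "Bandage", "image", graph_path, image_path,
        "--iter", str(iterations),                   # fewer layout iterations
        "--nodelen", str(node_length_per_megabase),  # shorter nodes
        "--nodseglen", str(node_segment_length),     # simpler node shapes
        "--outline", "0",                            # no node outlines
        "--noaa",                                    # antialiasing off
    ]


if __name__ == "__main__":
    # Hypothetical input and output file names.
    command = large_graph_image_command("assembly.gfa", "assembly.png")
    print(shlex.join(command))
    # To run it, pass `command` to subprocess.run(command, check=True).
```

No label options are passed, so the sketch leaves text off the drawing, in line with the advice on labels.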