Tabnetviz User Guide
Tabnetviz is a command-line tool for generating network visualizations from tabular data. The data are provided to the program as a node table and an edge table defining the network; both tables can contain an arbitrary number of node properties and edge properties. These tables can be provided as comma-separated (CSV) or tab-separated (TSV) text files or as Excel worksheets. Tabnetviz generates the network visualization by mapping the node and edge properties to visual attributes such as node sizes, shapes, colors, edge colors, width, etc. This mapping is defined in a simple configuration file, which is written manually and uses the human-readable YAML format. The output from Tabnetviz is a drawing as an SVG file or an image file.
The function of Tabnetviz is illustrated with the following flowchart (itself created with tabnetviz):
- Tabnetviz User Guide
- INSTALLATION
- COMMAND LINE
- CONFIGURATION FILE
Tabnetviz uses Python 3.2+. Once you have Python installed, you can install Tabnetviz by
pip install tabnetviz
(or pip3 if you have separate pips for Python 2 and 3).
Alternatively, you can download the source distribution from GitHub.
(Note that sometimes pip cannot install pygraphviz (required for tabnetviz) correctly because of a compilation error. In this case, you may try to install it in another way. On Debian Linux, use apt install python3-pygraphviz, or apt install libgraphviz-dev before using pip. See this discussion for more details.)
tabnetviz [-h] [-w] [-n nodetable] [-e edgetable] [-o drawingoutput] [--nodetableout nodeout] [--edgetableout edgeout] [--configtemplate] configfile
- -h: print a help message
- -w: "watch" mode: the program will not exit after generating the output; instead, it will watch the configuration file for changes, and regenerate the output whenever a change is detected. This is useful for developing and refining the configuration file. If the output is an image, using this option along with an image viewer that also reloads the image upon a file change allows one to develop the configuration file semi-interactively. qiv for Linux is an example of a suitable image viewer (use it with the -T option).
- -n | --nodetable nodetable: the node table file name can be specified here; this will override the name specified in the configuration file. This is useful in scripts if you want to generate several drawings from different inputs but the same graphical settings.
- -e | --edgetable edgetable: the edge table file name can be specified here; this will override the name specified in the configuration file.
- -o | --output drawingoutput: the output file name for the drawing (e.g. .svg file). This will override the setting in the configuration file.
- --nodetableout nodeout: file name to write out the modified node table; overrides the setting in the configuration file.
- --edgetableout edgeout: file name to write out the modified edge table; overrides the setting in the configuration file.
- --configtemplate: write a configuration file template to the specified file (the file must not exist). This can be edited to develop a configuration file for your visualization.
Below, we describe how the configuration file specifies the network and the mappings between node/edge properties and visual attributes. Please refer to the Configuration File Reference for detailed descriptions of configuration file options.
The configuration file uses the YAML format, which is essentially a hierarchical structure of keywords and values. To make it easier to refer to an element in the hierarchy, we will use a path-like notation, e.g. if the YAML contains
keyword1:
  keyword2: value
then the first keyword will be referred to as /keyword1, and the second keyword as /keyword1/keyword2, or just keyword2 when the context makes it clear that it is under /keyword1.
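To make the path-like notation concrete, the following minimal Python sketch resolves such a path against a nested dictionary, which is the kind of structure a YAML parser produces for the fragment above. The resolve helper is hypothetical and is not part of Tabnetviz.

```python
def resolve(config, path):
    """Follow a path such as '/keyword1/keyword2' through nested dicts."""
    node = config
    for key in path.strip("/").split("/"):
        node = node[key]  # raises KeyError if the keyword is absent
    return node


# Nested dict equivalent to the two-line YAML fragment shown above
config = {"keyword1": {"keyword2": "value"}}
```

Here resolve(config, "/keyword1/keyword2") returns the string value, and resolve(config, "/keyword1") returns the inner dictionary.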
A configuration file template can be written from the program using the --configtemplate option. An easy way to develop a configuration file for your visualization is to start with this template and edit it by uncommenting and editing lines as you wish.
Note that the YAML parser performs a number of type conversions. In particular, the words yes, Yes, YES, no, No, NO, true, True, TRUE, false, False, FALSE, on, On, ON, off, Off, OFF will all be parsed as Boolean. On the other hand, the node and edge table files (CSV, TSV, or Excel) will be parsed with the Pandas module of Python. This will parse the words true, True, TRUE, false, False, FALSE as Boolean, but the other words listed above will be parsed as strings.
In addition, the # character indicates comments in YAML, thus values containing it must be enclosed in (single or double) quotes. For example, write color: '#ff00ff' and not color: #ff00ff.
The type of the network (directed or undirected) is specified with the /networktype keyword. The default is undirected. If you have a network with both directed and undirected edges, define it as either directed or undirected, and then use the dir edge attribute to set the edge type based on an edge table column. This is typically done under the /edgestyles keyword using a discrete mapping type (see later). The networktype setting will affect the calculations if a network analysis is requested (see later).
A title can be given to the network using the /title keyword; for SVG output, this will appear as a mouse pointer tooltip when hovering over the background. To display text below the graph, use the label attribute under the /graphattrs keyword.
A node table and an edge table should be prepared as CSV or TSV files or Excel worksheets (xls or xlsx files). These are specified under the /nodetable and /edgetable keywords, respectively. You can either provide the filename directly (such as edgetable: edges.csv), or provide other parameters as well under the /nodetable or /edgetable keyword. In the latter case, the file keyword specifies the file name, and the optional filetype keyword specifies the file type. For Excel files, the sheet name can be specified using the sheetname keyword. For the edge table, the columns containing the source and target identifiers should be specified as sourcecolumn and targetcolumn (if not provided then the first two columns will be used). For the node table, the column containing the node identifier should be specified using the idcolumn keyword (if not provided then the first column will be used). For the node table, the skipisolated keyword can be used to omit isolated nodes from the network entirely. In edge tables exported from Cytoscape, the source and target identifiers are not in separate columns; set the fromcytoscape keyword to true to make the program handle it correctly.
The node and edge table file names can be overridden by using the -n and -e options, respectively. This is useful if you want to use the same configuration for several different networks.
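As an illustration of reusing one configuration for several networks, the sketch below builds and runs one tabnetviz command per dataset from Python. It only uses the -n, -e and -o options described above; the file naming scheme (name_nodes.csv, name_edges.csv, name.svg) and the helper names are assumptions made for this example.

```python
import subprocess


def build_command(configfile, nodetable, edgetable, output):
    """Return the tabnetviz argument list with input/output overrides."""
    return ["tabnetviz", "-n", nodetable, "-e", edgetable, "-o", output, configfile]


def render_all(configfile, datasets):
    """Run tabnetviz once per dataset, reusing a single configuration file."""
    for name in datasets:
        cmd = build_command(
            configfile,
            f"{name}_nodes.csv",  # hypothetical naming scheme
            f"{name}_edges.csv",
            f"{name}.svg",
        )
        subprocess.run(cmd, check=True)  # raises if tabnetviz exits with an error
```

Calling render_all("style.yaml", ["net1", "net2"]) would then produce net1.svg and net2.svg with identical graphical settings, provided tabnetviz is on the PATH and the input files exist.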
As Tabnetviz uses the column names as variable names in expressions for node/edge group definitions and visual style definitions, all column names will be automatically converted to valid variable names. This is done by omitting all characters other than letters, digits, and the "_" character. If a variable name would thus start with a digit, a c character will be prepended. Examples of conversions:
Column name | Variable name |
---|---|
First Name | FirstName |
Weight (kg) | Weightkg |
area (m2) | aream2 |
220V_rating | c220V_rating |
Fraction (%) | Fraction |
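The conversion rule can be sketched in a few lines of Python. This is not Tabnetviz's actual code: the function name is hypothetical, and the sketch assumes that "letters" means ASCII letters only.

```python
import re


def to_variable_name(column):
    """Drop everything except letters, digits and '_'; prefix 'c' if a digit leads."""
    name = re.sub(r"[^A-Za-z0-9_]", "", column)  # assumption: ASCII letters only
    if name and name[0].isdigit():
        name = "c" + name
    return name
```

Applied to the column names in the table above, this sketch reproduces the listed variable names, for example to_variable_name("220V_rating") gives c220V_rating.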
Output files are specified with the /outputfiles keyword.
Tabnetviz always generates a drawing file (an SVG or an image file) containing the actual visualization; this is specified with the /outputfiles/drawing keyword (if omitted then the output will go to out.svg). If only a drawing is to be generated, the drawing keyword can be omitted and the short form can be used, such as outputfiles: network.svg. The drawing output file name can be overridden on the command line using the -o option.
Other files can optionally be generated with the following keywords under /outputfiles:
- dot: A .dot file containing the layout. The dot file can be loaded in a later run to produce another drawing with the same layout.
- nodetableout, edgetableout: the modified node table and edge table file after adding new columns from network analysis, added rankings, Boolean columns defining node/edge groups, and added non-Graphviz properties. Note that the exported files will contain the converted column names. The tables can be exported as csv, tsv, or Excel files, decided by the extension of the file name. These file names can be overridden using the --nodetableout and --edgetableout command line options.
- colorbars: An SVG file containing color bars for the colormaps used in the node style and edge style mappings. These can then be used to create a legend for your visualization.
The /layout keyword can be used to set the graph layout algorithm. Any algorithm known to Graphviz can be used, i.e. neato, dot, twopi, circo, fdp, sfdp, osage, and patchwork. The default is neato. Alternatively, a .dot file name can be specified; this must contain position information, and the layout will be directly loaded from it. This file usually comes from an earlier run of the program.
The /graphattrs keyword can be used to set attributes for the whole graph. Any graph attributes known to Graphviz can be used; see the Graphviz website for a list of available visual attributes. Tabnetviz will set outputorder: edgesfirst and overlap: false by default for a nicer visual appearance of the graph. (Note: overlap must be set to true if you want to set fixed coordinates for the nodes using the pos attribute.)
Sometimes you don't want to display the whole network but only some part of it. Tabnetviz provides an easy way to remove nodes and edges based on a selection criterion. The /remove top-level keyword accepts the /remove/nodes and /remove/edges subkeywords, where Boolean expressions can be specified to select nodes and edges to remove. The expressions should use node and edge table column names, respectively. Simple arithmetic and string operations can be used. Another option, /remove/keepisolatednodes, can be used to specify whether the nodes that have become isolated after the edge removal should be kept (default: false, i.e. they will be removed).
Example:
remove:
  nodes: Age < 10
  edges: Rate > 5.0
  keepisolatednodes: true
The node and edge removal is performed before the network analysis, so the node and edge properties generated by the network analysis cannot be used in the Boolean expressions under the /remove key.
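The effect of the three /remove settings can be illustrated with a small pure-Python sketch. It is a simplified model rather than Tabnetviz's implementation: the function and data layout are hypothetical, predicates stand in for the Boolean expressions, and the sketch assumes that edges touching a removed node are dropped as well.

```python
def apply_removal(nodes, edges, drop_node, drop_edge, keepisolatednodes=False):
    """nodes: {id: properties}; edges: list of (source, target, properties)."""
    kept_nodes = {n: p for n, p in nodes.items() if not drop_node(p)}
    # Assumption: an edge cannot survive if one of its end nodes was removed
    candidates = [e for e in edges if e[0] in kept_nodes and e[1] in kept_nodes]
    kept_edges = [e for e in candidates if not drop_edge(e[2])]
    if not keepisolatednodes:
        before = {n for s, t, _ in candidates for n in (s, t)}
        after = {n for s, t, _ in kept_edges for n in (s, t)}
        newly_isolated = before - after  # had edges before edge removal, none after
        kept_nodes = {n: p for n, p in kept_nodes.items() if n not in newly_isolated}
    return kept_nodes, kept_edges


nodes = {"a": {"Age": 30}, "b": {"Age": 5}, "c": {"Age": 40}, "d": {"Age": 50}}
edges = [("a", "b", {"Rate": 1.0}), ("a", "c", {"Rate": 2.0}), ("c", "d", {"Rate": 9.0})]
kept_nodes, kept_edges = apply_removal(
    nodes, edges, lambda p: p["Age"] < 10, lambda p: p["Rate"] > 5.0
)
```

With these inputs, node b is removed by the Age criterion, the edge between c and d is removed by the Rate criterion, and node d is dropped because it became isolated; passing keepisolatednodes=True would keep d.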
The /networkanalysis keyword can be used to indicate whether a network analysis should be performed (false by default). Graph theoretical quantity names can be provided, either a single quantity, a list of quantities, or the keyword all to calculate all quantities. The calculated quantities will be added as new columns to the node table and the edge table, and can then be used to specify node styles and edge styles. The following quantities can be calculated for each node for both directed and undirected networks: AverageShortestPathLength, BetweennessCentrality, ClosenessCentrality, ClusteringCoefficient, Degree, Connectivity, Eccentricity, NeighborhoodConnectivity, SelfLoops, Stress. For edges, EdgeBetweenness can be calculated. For directed networks only, Indegree and Outdegree can be calculated. For undirected networks only, Radiality and TopologicalCoefficient can be calculated. These quantity names match those calculated by the Network Analyzer plugin of Cytoscape, except for Connectivity (not calculated by Cytoscape), which is the number of neighbors of a node, as opposed to Degree which is the number of edges connecting to a node. Also, Tabnetviz uses the same quantity definitions as the Network Analyzer plugin of Cytoscape and yields numerically the same results, except for ClusteringCoefficient in the case of directed graphs, where Tabnetviz uses the NetworkX Python module function which uses a slightly different definition.
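The difference between Degree and Connectivity only shows up when two nodes are joined by more than one edge. The following sketch, which is independent of Tabnetviz's own implementation and ignores self-loops, counts both quantities for a small undirected edge list; the function name is hypothetical.

```python
from collections import defaultdict


def degree_and_connectivity(edges):
    """Return ({node: degree}, {node: connectivity}) for an undirected edge list."""
    degree = defaultdict(int)
    neighbors = defaultdict(set)
    for source, target in edges:
        degree[source] += 1  # every edge end counts towards the degree
        degree[target] += 1
        neighbors[source].add(target)  # sets ignore repeated neighbors
        neighbors[target].add(source)
    connectivity = {node: len(nbrs) for node, nbrs in neighbors.items()}
    return dict(degree), connectivity


# A and B are joined by two parallel edges
edges = [("A", "B"), ("A", "B"), ("A", "C")]
degree, connectivity = degree_and_connectivity(edges)
```

Node A ends up with a degree of 3 (three edges) but a connectivity of 2 (two distinct neighbors).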
It is recommended that you explicitly list the quantities you want to be calculated rather than specifying all, because calculating all quantities may take a long time for large networks.
With the nodetableout and edgetableout keywords, given either as command line options or under the /outputfiles section in the configuration file, the modified node and edge tables can be written into new files. These will contain the parameters calculated by the network analysis and can be used in other programs or analyses.
Node groups and edge groups can be defined by providing Boolean expressions of node table or edge table columns, or by explicitly listing the nodes/edges you want in the group. Visual styles can later be applied on these groups. Under the /nodegroups and /edgegroups keywords, provide groupname: columnexpression key-value pairs, with groupname being the group name of your choice, and columnexpression being an expression using node table or edge table column names. The group name must not match an existing column name except if it is a Boolean column. The expression can contain simple numerical or string operations. The program uses the query method of a Pandas dataframe internally. Python string methods can be used by appending .str to the column name. Examples for expressions:
a+b < 5
c == 'YES'
x < 5 and y > 2
d.str.upper() < 'M'
You can also define node/edge groups by providing an explicit node/edge list instead of a Boolean expression, although this is not recommended; if you want an explicit group, it's better to add a Boolean table column defining the group than to list it in the tabnetviz configuration file. A node list is a comma-separated list of node names (space after the comma is necessary!) in square brackets: [node1, node23, node57]. An edge list is specified by providing a list of [source, target] pairs, e.g. [[node1, node3], [node5, node7], [node8, node13]].
Groups are added to the node/edge table as new Boolean columns. As group definitions are processed in the order they appear in the configuration file, definitions can refer to groups defined earlier. For example, if you defined a group group1, you can define its complement as not group1.
Clusters are node groups with a box drawn around them. Currently, only the dot and fdp layout algorithms support drawing clusters. Clusters can be nested (in that case, the corresponding node groups must be subsets of each other); overlapping clusters are not allowed. There are a number of visual attributes defining how to draw clusters; see the attributes indicated by the letter C in the Graphviz documentation. Commonly used cluster attributes include color, fillcolor, label, labelloc, fontname, fontsize, fontcolor, style (= solid, dashed, dotted, rounded, filled, etc.).
Clusters are specified in Tabnetviz using the /clusters keyword. This must be followed by a list of the node groups corresponding to each cluster. If you don't want to specify visual attributes for the clusters, you can provide a simple list such as:
clusters: [nodegroup1, nodegroup2, nodegroup4]
If you want to specify visual attributes, you can provide the cluster definitions in a dictionary, providing the attributes as key-value pairs, e.g.:
clusters:
  nodegroup1:
    label: First cluster
    fontsize: 10
  nodegroup2:
    style: filled
    fillcolor: grey
  nodegroup4:
Here, we did not provide attributes for nodegroup4, but note that the : is still needed. Alternatively, a list can also be used, using the "dash" notation:
clusters:
  - nodegroup1:
      label: First cluster
      fontsize: 10
  - nodegroup2:
      style: filled
      fillcolor: grey
  - nodegroup4
See the Configuration File Reference for the full specification.
Optionally, new node/edge table columns containing rankings of existing node/edge table columns can be added. This is done via the /addrankings keyword. See the Configuration File Reference for details.
Color maps are used to map numerical values to colors, either discrete colors or a smooth color transition. Tabnetviz can use the standard color maps defined in Matplotlib, but custom color maps can also be defined via the /colormaps keyword. For example, a smoothly varying color map from blue through white to yellow (we name it "bwy") can be defined like this:
colormaps:
  bwy:
    type: continuous
    map:
      0.0: '#0066CCC0'
      0.5: '#FFFFFFC0'
      1.0: '#FFFF00C0'
A discrete color map is specified by listing the colors you want:
colormaps:
  mycolors:
    type: discrete
    map:
      - green
      - yellow
      - orange
      - '#fabcee'
In a continuous color map, the colors will smoothly transition into each other, while there will be no transition in a discrete color map. Typically, you would use a continuous color map to map floating point values and a discrete color map to map integer values.
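To show what a smooth transition between the listed colors means, here is a minimal sketch that interpolates between the three stops of the "bwy" map defined earlier. It assumes plain linear interpolation of each RGBA channel, which may differ in detail from what Tabnetviz and Matplotlib do internally; the function names are hypothetical.

```python
def parse_rgba(color):
    """Split '#RRGGBBAA' into four integer channels."""
    return [int(color[i:i + 2], 16) for i in (1, 3, 5, 7)]


def interpolate(stops, x):
    """stops: sorted list of (position, '#RRGGBBAA'); x must lie within the stops."""
    for (x0, c0), (x1, c1) in zip(stops, stops[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            a, b = parse_rgba(c0), parse_rgba(c1)
            # Assumption: per-channel linear blend, rounded to the nearest integer
            channels = [int(p + t * (q - p) + 0.5) for p, q in zip(a, b)]
            return "#" + "".join(f"{v:02x}" for v in channels)
    raise ValueError("x lies outside the color map range")


bwy = [(0.0, "#0066CCC0"), (0.5, "#FFFFFFC0"), (1.0, "#FFFF00C0")]
```

Under this assumption, interpolate(bwy, 0.75) yields #ffff80c0, a pale yellow halfway between white and yellow. A discrete map would instead return one of its listed colors unchanged.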
Colors can be specified by color names, RGB, RGBA or HSV formats; see the Graphviz documentation for a description of available formats. See the Configuration File Reference for the full description of how to define custom color maps. The color maps can then be used for the mappings of node/edge properties to visual attributes.
When using color maps in your visualization, you will probably want to add a legend to your figure displaying a color bar with the minimum and maximum values shown. The /outputfiles/colorbars keyword can be used for this: provide an SVG file name, and Tabnetviz will write a separate SVG file with this name, containing colorbars for the colormaps used in your mappings.
Any node/edge table column or an expression using columns can be used to define a mapping to visual attributes. The /nodestyles keyword is used to define node attribute mappings, and the /edgestyles keyword to define edge attribute mappings. A mapping can be applied to all nodes or edges (using the /nodestyles/default and /edgestyles/default keywords, respectively), or a node group or edge group that has previously been defined via the /nodegroups or /edgegroups keyword. Tabnetviz applies the mappings in the order they appear in the configuration file. Thus, if you define two overlapping node groups and specify a mapping for each one, the second mapping will overwrite the first mapping for the nodes in the intersection of the two node groups. However, the default mapping will always be applied first, and will be overwritten by any group mappings.
The general structure of node style mappings is like this:
nodestyles:
  default:
    attributename1:
      ...
    attributename2:
      ...
    ...
  nodegroup1:
    attributenameN:
      ...
    attributenameM:
      ...
    ...
  nodegroup2:
    ...
Thus, one first defines the styles for all nodes (default group) and then styles for any node group defined earlier (nodegroup1, nodegroup2, etc.). For each group, the attributes to map are named, and the mapping is defined as described below. The edge styles are defined similarly.
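The order in which styles take effect (default first, then groups in file order, with later groups overwriting earlier ones) can be modelled with a short sketch. It only handles constant attribute values, and the function and variable names are hypothetical rather than taken from Tabnetviz.

```python
def resolve_styles(node_groups, styles):
    """node_groups: {node: set of group names}; styles: ordered {group: {attr: value}}."""
    # The default styles are applied first, wherever they appear in the file
    resolved = {node: dict(styles.get("default", {})) for node in node_groups}
    for group, attrs in styles.items():  # dicts preserve insertion (file) order
        if group == "default":
            continue
        for node, groups in node_groups.items():
            if group in groups:
                resolved[node].update(attrs)  # later groups overwrite earlier ones
    return resolved


styles = {
    "group1": {"fillcolor": "red"},
    "default": {"fillcolor": "green", "shape": "circle"},
    "group2": {"fillcolor": "blue"},
}
node_groups = {"n1": {"group1"}, "n2": {"group1", "group2"}, "n3": set()}
resolved = resolve_styles(node_groups, styles)
```

Node n2 belongs to both groups and ends up blue because group2 is listed after group1, while n3 keeps the default green fill.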
Any node/edge attribute known to Graphviz can be used; please refer to the Graphviz website for a detailed description of the available attributes. Here, we summarize the most commonly used attributes.
Attribute description | Attribute name | Attribute values, remark |
---|---|---|
Border color | color | color specification |
Border line width | penwidth | width in points |
Border line type | style | solid,dashed,dotted,bold |
Fill color | fillcolor | color; style must be filled |
Label | label | string (default is node name) |
Tooltip | tooltip | string |
Shape | shape | box,circle,ellipse ... (many shapes available) |
Node style | style | solid,filled,striped,wedged ... |
Label font name | fontname | a font name |
Label font size | fontsize | in points |
Label font color | fontcolor | color specification |
Node image | image | path to image file |
Node height | height | in inches (1 inch = 72 points) |
Node width | width | in inches |
Attribute description | Attribute name | Attribute values, remark |
---|---|---|
Line color | color | color specification |
Line type | style | solid,dashed,dotted,bold |
Line width | penwidth | in points |
Directedness | dir | forward,none,both |
Arrow shape | arrowhead | normal,empty,tee,vee ... (many shapes available) |
Label | label | string |
Label font name | fontname | font name |
Label font size | fontsize | in points |
Label font color | fontcolor | color specification |
Head label | headlabel | string |
Tail label | taillabel | string |
Head/tail label font name | labelfontname | font name |
Head/tail label font size | labelfontsize | in points |
Head/tail label font color | labelfontcolor | color specification |
Custom non-Graphviz attributes can also be used; the names of these must start with the ng prefix to indicate to Tabnetviz that they are non-Graphviz attributes. Tabnetviz will report an error if the user tries to define a non-Graphviz attribute whose name does not start with ng. Non-Graphviz attributes will be added as new columns to the node table or edge table, and can then be used to define mappings to Graphviz attributes. See the combine mapping type for examples of how to use this feature.
Tabnetviz knows the following mapping types:
- Constant value: this is in fact not a mapping; every node/edge in the given group will get the same constant value. See an example later in this document.
The actual mappings map node/edge table column data to visual node/edge attributes. These mapping types are:
- direct: Attribute value is directly taken from the table. The value can be used as it is, or transformed through a specified expression (this allows you to specify an arbitrary mapping).
- discrete: Discrete values in the table will be mapped to the specified attribute values.
- linear: A linear transformation of table values to the specified range.
- cont2disc: Continuous-to-discrete mapping. Ranges of a continuous parameter taken from the table are mapped to the specified discrete attribute values.
- colormap: Table values (floating point or integer) are mapped to colors, either discrete or continuous.
- combine: Combine multiple non-Graphviz attributes into a multi-valued attribute such as position or color list. The result is formatted using a specified format string.
The mapping type is specified using the type keyword under the given attribute name, i.e. for node styles: /nodestyles/groupname/attributename/type.
The direct, discrete, linear, cont2disc, and colormap mapping types use a colexpr parameter. This defines which table values should be used for the mapping. It can either be a single table column name, or an expression on table column names, which can be a simple numerical expression or a string expression. Examples for column expressions:
a
abs(a)+b/c
a.str.upper()+b
Internally, this uses the Pandas eval dataframe method with the Python engine.
This also means that you could define a linear mapping by yourself using an expression, and use the direct mapping type instead of the linear mapping type. However, the linear mapping type is easier to use as it is enough to specify the minimum and maximum values of the range to map to.
We provide examples for each mapping type below. Please see the Configuration File Reference document for a full description of the available parameters for each mapping type. We will use the default group for each mapping, but these could work with any node/edge group as well. Also, we only show node style mappings; edge style mappings work the same way.
This is not a mapping as we do not use data from the node table, just set a constant value. To set the fill color of all nodes to green:
nodestyles:
  default:
    fillcolor: green
Set node height from sum of columns A and B:
nodestyles:
  default:
    height:
      type: direct
      colexpr: A+B
Set node fillcolor to green, red, or yellow based on the value of the "Fruit" column in the node table:
nodestyles:
  default:
    fillcolor:
      type: discrete
      colexpr: Fruit
      map:
        avocado: green
        tomato: red
        lemon: yellow
Set node height from percentage, mapping the 0-100 range to 4-8:
nodestyles:
  default:
    height:
      type: linear
      colexpr: Percentage
      colmin: 0
      colmax: 100
      mapmin: 4
      mapmax: 8
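The arithmetic behind a linear mapping is a straight-line rescaling, sketched below with the parameter names of the example above. The function itself is hypothetical, and the sketch does not clamp values outside the colmin-colmax range; how Tabnetviz treats such values is described in the Configuration File Reference, not here.

```python
def linear_map(value, colmin, colmax, mapmin, mapmax):
    """Map value from the [colmin, colmax] range onto [mapmin, mapmax] linearly."""
    fraction = (value - colmin) / (colmax - colmin)
    return mapmin + fraction * (mapmax - mapmin)
```

With the settings above, a Percentage of 50 gives linear_map(50, 0, 100, 4, 8), i.e. a node height of 6.0.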
Set node shape to circle, triangle, or box depending on "Age" value:
nodestyles:
  default:
    shape:
      type: cont2disc
      colexpr: Age
      map:
        12.0: circle
        20.0: triangle
        higher: box
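A continuous-to-discrete mapping is essentially a threshold lookup. The sketch below reads the numeric keys of the map as upper bounds of consecutive ranges and treats each bound as inclusive; that boundary handling is an assumption of this example, so check the Configuration File Reference for the exact rule. The function name is hypothetical.

```python
from bisect import bisect_left


def cont2disc(value, thresholds, values, higher):
    """thresholds: ascending upper bounds; values: one attribute value per range."""
    index = bisect_left(thresholds, value)  # first threshold that is >= value
    return values[index] if index < len(values) else higher
```

For instance, cont2disc(15, [12.0, 20.0], ["circle", "triangle"], "box") returns triangle, and any Age above 20 returns box.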
Set node fillcolor on a continuous red-white-blue color scale based on a column named "Percentage". This is a reversed "bwr" scale from Matplotlib (we could also use bwr_r).
nodestyles:
  default:
    fillcolor:
      type: colormap
      colexpr: Percentage
      colmin: 0.0
      colmax: 100.0
      colormap: bwr
      reverse: yes
Set node coordinates from "x" and "y" columns in the table. We combine them to the pos attribute by joining them with a comma:
graphattrs:
  overlap: True
nodestyles:
  default:
    pos:
      type: combine
      attrlist: [x, y]
      formatstring: '%f,%f!'
The exclamation mark at the end of the format string forces the layout algorithm to keep the position fixed. For this to work, the overlap graph attribute must be set to True as shown.
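The format strings used by the combine mapping look like Python's printf-style templates. Assuming they are applied with the % operator (an assumption of this sketch, as is the helper name), the result for a single node can be previewed like this:

```python
def combine(formatstring, values):
    """Apply a printf-style format string to a sequence of attribute values."""
    return formatstring % tuple(values)


# Position string for a node with x = 1.5 and y = 2.0
pos = combine("%f,%f!", [1.5, 2.0])
# Two colors joined with a colon, the pattern used for color lists
colors = combine("%s:%s", ["#ff0000", "#0000ff"])
```

Here pos becomes 1.500000,2.000000! and colors becomes #ff0000:#0000ff.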
Often, the attributes to combine are not in the table but need to be created first by using a mapping. In this case, non-Graphviz attributes starting with ng should be used, e.g. if you want to multiply table values by 10 before combining:
nodestyles:
  default:
    ngx:
      type: direct
      colexpr: 10*x
    ngy:
      type: direct
      colexpr: 10*y
    pos:
      type: combine
      attrlist: [ngx, ngy]
      formatstring: '%f,%f!'
The combine mapping can also be used to define color lists. For example, we can use the wedged node style to draw nodes divided into two parts with different fill colors. The two colors may be mapped from two node table columns value1 and value2. We define two non-Graphviz colors (ngcolor1, ngcolor2) and combine them into a colorList with two elements (separated with a colon):
nodestyles:
  default:
    shape: circle
    style: wedged
    ngcolor1:
      type: colormap
      colexpr: value1
      colormap: bwr
    ngcolor2:
      type: colormap
      colexpr: value2
      colormap: bwr
    fillcolor:
      type: combine
      attrlist: [ngcolor1, ngcolor2]
      formatstring: '%s:%s'
A more complex example can be found in the demo file.