@@ -987,10 +987,8 @@ details of how to use the quintuply linked structure in the C API.
.. _sec_data_model_tree_roots:
- Accessing roots
- ===============
-
- .. todo:: Update this with a discussion of the virtual root
+ Roots
+ =====
The roots of a tree are defined as the unique endpoints of upward paths
starting from sample nodes (if no path leads upward from a sample node,
@@ -1003,6 +1001,10 @@ example, we get a tree with two roots:
:width: 200px
:alt: An example tree with multiple roots
+ We keep track of roots in tskit by using a special additional node
+ called the **virtual root**, whose children are the roots. In the
+ quintuply linked tree encoding this is an extra element at the end
+ of each of the tree arrays, as shown here:
=========== =========== =========== =========== =========== ===========
node        parent      left_child  right_child left_sib    right_sib
@@ -1013,17 +1015,37 @@ node parent left_child right_child left_sib right_sib
3           6           -1          -1          -1          4
4           6           -1          -1          3           -1
5           7           0           2           -1          -1
- 6           -1          3           4           7           -1
- 7           -1          5           5           -1          6
+ 6           -1          3           4           -1          7
+ 7           -1          5           5           6           -1
+ **8**       **-1**      **6**       **7**       **-1**      **-1**
=========== =========== =========== =========== =========== ===========
- To gain efficient access to the roots in the quintuply linked encoding we keep
- one extra piece of information: the ``left_root``. In this example
- the leftmost root is ``7``. Roots are considered siblings, and so
- once we have one root we can find all the other roots efficiently using
- the ``left_sib`` and ``right_sib`` arrays. For example, we can see here
- that the right sibling of ``7`` is ``6``, and the left sibling of ``6``
- is ``7``.
+ In this example, node 8 is the virtual root; its left child is 6
+ and its right child is 7.
+ Importantly, though, this is an asymmetric
+ relationship, since the parent of the "real" roots 6 and 7 is null
+ (-1) and *not* the virtual root. To emphasise that this is not a "real"
+ node, we've shown the values for the virtual root here in bold.
+
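As a minimal sketch (plain Python, not tskit's own implementation), the
roots can be recovered from the table above by starting at the virtual
root's left child and following right-sibling links. Only the rows visible
in the table (nodes 3 to 8) are included, stored in a dict keyed by node ID;
the names ``rows`` and ``find_roots`` are hypothetical and used only for
illustration.

```python
NULL = -1  # null node ID, shown as -1 in the table

# Rows 3 to 8 of the table, keyed by node ID. Each value is
# (parent, left_child, right_child, left_sib, right_sib).
rows = {
    3: (6, NULL, NULL, NULL, 4),
    4: (6, NULL, NULL, 3, NULL),
    5: (7, 0, 2, NULL, NULL),
    6: (NULL, 3, 4, NULL, 7),
    7: (NULL, 5, 5, 6, NULL),
    8: (NULL, 6, 7, NULL, NULL),  # the virtual root
}
PARENT, LEFT_CHILD, RIGHT_CHILD, LEFT_SIB, RIGHT_SIB = range(5)
VIRTUAL_ROOT = 8


def find_roots(rows, virtual_root):
    """Return the roots by walking the children of the virtual root."""
    roots = []
    u = rows[virtual_root][LEFT_CHILD]
    while u != NULL:
        roots.append(u)
        u = rows[u][RIGHT_SIB]
    return roots


roots = find_roots(rows, VIRTUAL_ROOT)
print(roots)  # [6, 7]

# The relationship is one-way: the roots' parent entries are null,
# not the virtual root.
print([rows[r][PARENT] for r in roots])  # [-1, -1]
```

The walk only ever reads ``left_child`` and ``right_sib``, which is why the
roots need sibling links to each other but no parent pointer to node 8.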
+ The main function of the virtual root is to efficiently keep track of
+ tree roots in the internal library algorithms, and it is usually not
+ something we need to think about unless working directly with
+ the quintuply linked tree structure. However, the virtual root can be
+ useful in some algorithms and so it can optionally be returned in traversal
+ orders (see :meth:`.Tree.nodes`). The virtual root has the following
+ properties:
+
+ - Its ID is always equal to the number of nodes in the tree sequence (i.e.,
+   the length of the node table). However, there is **no corresponding row**
+   in the node table, and any attempt to access information about the
+   virtual root via either the tree sequence or tables APIs will fail with
+   an out-of-bounds error.
+ - The parent and siblings of the virtual root are null.
+ - The time of the virtual root is defined as positive infinity (if
+   accessed via :meth:`.Tree.time`). This is useful in defining the
+   time-based node traversal orderings.
+ - The virtual root is the parent of no other node---roots do **not**
+   have parent pointers to the virtual root.
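To illustrate why an infinite time is convenient, here is a small sketch in
plain Python (it does not call tskit). The node times for nodes 5, 6 and 7
are hypothetical values chosen for the example; only the virtual root's ID
rule and its infinite time come from the properties listed above.

```python
import math

num_nodes = 8             # nodes 0 to 7 in the example tree
virtual_root = num_nodes  # the ID is one past the last real node

# Hypothetical times for the internal nodes of the example tree;
# the virtual root's time is defined as positive infinity.
time = {5: 1.0, 6: 1.0, 7: 2.0, virtual_root: math.inf}

# Ordering nodes from youngest to oldest (ties broken by node ID)
# places the virtual root last without any special-case code.
by_time = sorted(time, key=lambda u: (time[u], u))
print(by_time)  # [5, 6, 7, 8]
```

Because infinity compares greater than every finite time, the virtual root
sorts after all real nodes whatever times those nodes have.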
.. _sec_data_model_missing_data: