
Commit a2faa9a

Document orderings
1 parent f7988b0 commit a2faa9a


9 files changed: 237 additions & 185 deletions


c/CHANGELOG.rst

Lines changed: 3 additions & 0 deletions
@@ -14,6 +14,7 @@
 
 - The previously deprecated option ``TSK_SAMPLE_COUNTS`` has been removed. (:user:`benjeffery`, :issue:`1744`, :pr:`1761`).
 
+- FIXME breaking changes for tree API and virtual root
 
 **Features**
 
@@ -37,6 +38,8 @@
 tree sequence. This is then used to generate an error if ``time_units`` is ``uncalibrated`` when
 using the branch lengths in statistics. (:user:`benjeffery`, :issue:`1644`, :pr:`1760`)
 
+- FIXME add features for virtual root, num_edges, stack allocation size etc
+
 **Fixes**
 
 ----------------------

docs/_static/different_time_samples.svg

Lines changed: 27 additions & 33 deletions

docs/_static/tree_structure1.svg

Lines changed: 42 additions & 43 deletions

docs/_static/tree_structure2.svg

Lines changed: 40 additions & 40 deletions

docs/data-model.rst

Lines changed: 35 additions & 13 deletions
@@ -987,10 +987,8 @@ details of how to use the quintuply linked structure in the C API.
 
 .. _sec_data_model_tree_roots:
 
-Accessing roots
-===============
-
-.. todo:: Update this with a discussion of the virtual root
+Roots
+=====
 
 The roots of a tree are defined as the unique endpoints of upward paths
 starting from sample nodes (if no path leads upward from a sample node,
@@ -1003,6 +1001,10 @@ example, we get a tree with two roots:
 :width: 200px
 :alt: An example tree with multiple roots
 
+We keep track of roots in tskit by using a special additional node
+called the **virtual root**, whose children are the roots. In the
+quintuply linked tree encoding this is an extra element at the end
+of each of the tree arrays, as shown here:
 
 =========== =========== =========== =========== =========== ===========
 node        parent      left_child  right_child left_sib    right_sib
@@ -1013,17 +1015,37 @@ node parent left_child right_child left_sib right_sib
 3           6           -1          -1          -1          4
 4           6           -1          -1          3           -1
 5           7           0           2           -1          -1
-6           -1          3           4           7           -1
-7           -1          5           5           -1          6
+6           -1          3           4           -1          7
+7           -1          5           5           6           -1
+**8**       **-1**      **6**       **7**       **-1**      **-1**
 =========== =========== =========== =========== =========== ===========
 
-To gain efficient access to the roots in the quintuply linked encoding we keep
-one extra piece of information: the ``left_root``. In this example
-the leftmost root is ``7``. Roots are considered siblings, and so
-once we have one root we can find all the other roots efficiently using
-the ``left_sib`` and ``right_sib`` arrays. For example, we can see here
-that the right sibling of ``7`` is ``6``, and the left sibling of ``6``
-is ``7``.
+In this example, node 8 is the virtual root; its left child is 6
+and its right child is 7.
+Importantly, though, this is an asymmetric
+relationship, since the parent of the "real" roots 6 and 7 is null
+(-1) and *not* the virtual root. To emphasise that this is not a "real"
+node, we've shown the values for the virtual root here in bold.
+
+The main function of the virtual root is to efficiently keep track of
+tree roots in the internal library algorithms, and is usually not
+something we need to think about unless working directly with
+the quintuply linked tree structure. However, the virtual root can be
+useful in some algorithms and so it can optionally be returned in traversal
+orders (see :meth:`.Tree.nodes`). The virtual root has the following
+properties:
+
+- Its ID is always equal to the number of nodes in the tree sequence (i.e.,
+the length of the node table). However, there is **no corresponding row**
+in the node table, and any attempts to access information about the
+virtual root via either the tree sequence or tables APIs will fail with
+an out-of-bounds error.
+- The parent and siblings of the virtual root are null.
+- The time of the virtual root is defined as positive infinity (if
+accessed via :meth:`.Tree.time`). This is useful in defining the
+time-based node traversal orderings.
+- The virtual root is the parent of no other node---roots do **not**
+have parent pointers to the virtual root.
 
 
 .. _sec_data_model_missing_data:
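The virtual-root encoding in the `data-model.rst` changes can be illustrated without tskit itself. What follows is a minimal, stdlib-only Python sketch, not tskit's implementation: the helpers `build_quintuply_linked` and `roots` are hypothetical names, and the parents of nodes 0 to 2 (whose table rows fall outside the diff hunk) are assumed to be node 5, which is consistent with node 5 having `left_child` 0 and `right_child` 2. The sketch also simplifies root detection by treating every node with a null parent as a root, whereas the documented definition only counts endpoints of paths from sample nodes.

```python
NULL = -1  # null node ID, as used in the tables above


def build_quintuply_linked(parent):
    """Build quintuply linked arrays from a parent array (hypothetical helper).

    Each returned list has len(parent) + 1 entries; the extra final slot is
    the virtual root, whose ID equals the number of nodes. Children are
    attached in increasing node-ID order. Simplification: every node with a
    null parent is treated as a root.
    """
    num_nodes = len(parent)
    virtual_root = num_nodes
    size = num_nodes + 1
    left_child = [NULL] * size
    right_child = [NULL] * size
    left_sib = [NULL] * size
    right_sib = [NULL] * size
    for u in range(num_nodes):
        # Roots hang off the virtual root in the child/sibling arrays only.
        p = virtual_root if parent[u] == NULL else parent[u]
        last = right_child[p]
        if last == NULL:
            left_child[p] = u
        else:
            right_sib[last] = u
            left_sib[u] = last
        right_child[p] = u
    # Roots keep a NULL parent, and so does the virtual root itself.
    full_parent = list(parent) + [NULL]
    return full_parent, left_child, right_child, left_sib, right_sib


def roots(left_child, right_sib, virtual_root):
    """Yield the roots left to right by walking the virtual root's children."""
    u = left_child[virtual_root]
    while u != NULL:
        yield u
        u = right_sib[u]


# Parents of nodes 3-7 follow the table above; parents of nodes 0-2 are assumed.
parent = [5, 5, 5, 6, 6, 7, NULL, NULL]
arrays = build_quintuply_linked(parent)
full_parent, left_child, right_child, left_sib, right_sib = arrays
virtual_root = len(parent)

for u in range(3, virtual_root + 1):
    print(u, full_parent[u], left_child[u], right_child[u], left_sib[u], right_sib[u])
print("roots:", list(roots(left_child, right_sib, virtual_root)))  # roots: [6, 7]
```

The printed rows for nodes 3 to 8 match the table, including the bolded virtual-root row. The real roots 6 and 7 keep a null parent even though they are the virtual root's children, which is the asymmetry the new text emphasises.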

docs/examples.py

Lines changed: 4 additions & 4 deletions
@@ -257,7 +257,7 @@ def stats():
 
 
 def tree_structure():
-def write_table(tree):
+def write_table(tree, show_virtual_root=False):
 fmt = "{:<12}"
 heading = [
 "node",
@@ -273,7 +273,7 @@ def write_table(tree):
 print(line)
 print(col_def)
 
-for u in range(ts.num_nodes):
+for u in range(ts.num_nodes + int(show_virtual_root)):
 line = "".join(
 fmt.format(v)
 for v in [
@@ -325,7 +325,7 @@ def write_table(tree):
 )
 tree = ts.first()
 
-write_table(tree)
+write_table(tree, show_virtual_root=True)
 print(tree.draw_text())
 tree.draw_svg("_static/tree_structure2.svg", time_scale="rank")
 
@@ -404,6 +404,6 @@ def finding_nearest_neighbors():
 # allele_frequency_spectra()
 # missing_data()
 # stats()
-# tree_structure()
+tree_structure()
 tree_traversal()
 finding_nearest_neighbors()
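The `examples.py` change contains the whole mechanism for displaying the virtual root: because its ID equals `ts.num_nodes`, widening the loop to `range(ts.num_nodes + int(show_virtual_root))` emits exactly one extra row. Below is a standalone sketch of that idea, not the real `write_table` (which reads from a `tskit.Tree`). Plain lists stand in for the tree arrays, the `ARRAYS` dict layout and the `num_nodes` parameter are assumptions, the RST table borders are omitted, and the rows for nodes 0 to 2 are assumed (children of node 5) because the diff does not show them.

```python
NULL = -1
HEADING = ["node", "parent", "left_child", "right_child", "left_sib", "right_sib"]

# Plain-list stand-ins for the tree arrays; each has num_nodes + 1 entries,
# the last being the virtual root (node 8). Rows for nodes 0-2 are assumed.
ARRAYS = {
    "parent": [5, 5, 5, 6, 6, 7, NULL, NULL, NULL],
    "left_child": [NULL, NULL, NULL, NULL, NULL, 0, 3, 5, 6],
    "right_child": [NULL, NULL, NULL, NULL, NULL, 2, 4, 5, 7],
    "left_sib": [NULL, 0, 1, NULL, 3, NULL, NULL, 6, NULL],
    "right_sib": [1, 2, NULL, 4, NULL, NULL, 7, NULL, NULL],
}


def write_table(arrays, num_nodes, show_virtual_root=False):
    """Return the arrays as fixed-width text, one row per node."""
    fmt = "{:<12}"
    lines = ["".join(fmt.format(h) for h in HEADING).rstrip()]
    # int(True) == 1, so the virtual root adds exactly one row when requested.
    for u in range(num_nodes + int(show_virtual_root)):
        row = [u] + [arrays[name][u] for name in HEADING[1:]]
        lines.append("".join(fmt.format(v) for v in row).rstrip())
    return "\n".join(lines)


print(write_table(ARRAYS, num_nodes=8, show_virtual_root=True))
```

Converting the boolean flag with `int()` keeps the default output unchanged, so existing callers see no difference unless they opt in.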

docs/python-api.rst

Lines changed: 6 additions & 0 deletions
@@ -6,6 +6,12 @@
 compare arrays representing different trees along the sequence, you must
 take **copies** of the arrays.
 
+.. |virtual_root_array_note| replace:: The length of these arrays is
+equal to the number of nodes in the tree sequence plus 1, with the
+final element corresponding to the tree's :meth:`~.Tree.virtual_root`.
+Please see the :ref:`tree roots <sec_data_model_tree_roots>` section
+for more details.
+
 .. currentmodule:: tskit
 .. _sec_python_api:
 
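The new `|virtual_root_array_note|` substitution says the tree arrays hold one element per node plus a final element for the virtual root. One practical consequence, sketched below in plain Python rather than tskit code, is that a traversal can start at the virtual root and cover every root's subtree in a single loop, with no special case for trees that have several roots. The function name `preorder` and its `include_virtual_root` flag are hypothetical: they echo the data-model text's remark that the virtual root can optionally be returned in traversal orders, but they are not tskit's signature. The arrays encode the two-root example tree from the roots section, with the rows for nodes 0 to 2 assumed.

```python
NULL = -1

# Arrays of length num_nodes + 1 = 9; index 8 is the virtual root.
# Rows for nodes 0-2 are assumed (children of node 5, left to right).
left_child = [NULL, NULL, NULL, NULL, NULL, 0, 3, 5, 6]
right_sib = [1, 2, NULL, 4, NULL, NULL, 7, NULL, NULL]


def preorder(left_child, right_sib, include_virtual_root=False):
    """Return nodes in preorder, starting from the virtual root."""
    virtual_root = len(left_child) - 1  # last element, ID == num_nodes
    order = []
    stack = [virtual_root]
    while stack:
        u = stack.pop()
        if u != virtual_root or include_virtual_root:
            order.append(u)
        # Collect children left to right, then push them reversed so the
        # leftmost child is popped (visited) first.
        children = []
        c = left_child[u]
        while c != NULL:
            children.append(c)
            c = right_sib[c]
        stack.extend(reversed(children))
    return order


print(preorder(left_child, right_sib))  # [6, 3, 4, 7, 5, 0, 1, 2]
print(preorder(left_child, right_sib, include_virtual_root=True))
# [8, 6, 3, 4, 7, 5, 0, 1, 2]
```

Because the virtual root is always the last array element, its ID can be recovered from the array length alone, which is what the sketch relies on.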