Commit bd5e6a1

Add a short data-model section pointing to the file format
1 parent 0427239 commit bd5e6a1

2 files changed: +24 -5 lines changed

docs/data-model.md

+21 -4
@@ -387,10 +387,10 @@ The tree sequence itself also has metadata stored as a byte array.
 ### Valid tree sequence requirements

 Arbitrary data can be stored in tables using the classes in the
-{ref}`sec_tables_api`. However, only a {class}`TableCollection`
-that fulfils a set of requirements represents
-a valid {class}`TreeSequence` object which can be obtained
-using the {meth}`TableCollection.tree_sequence` method. In this
+{ref}`sec_tables_api`. The {meth}`TableCollection.tree_sequence` method
+can be used to turn such a {class}`TableCollection` into an immutable
+{class}`TreeSequence` object, but this requires the tables to
+fulfil a specific set of requirements. In this
 section we list these requirements, and explain their rationale.
 Violations of most of these requirements are detected when the
 user attempts to load a tree sequence via {func}`tskit.load` or
@@ -598,6 +598,23 @@ can be used to create an index on a table collection if necessary.
 Add more details on what the indexes actually are.
 :::

+
+(sec_data_model_saving)=
+
+### Saving to file
+
+When serializing (e.g. storing a {class}`TreeSequence` to disk using
+{meth}`dump<TreeSequence.dump>`), the underlying tables are stored along with the
+indexes, top-level metadata, attributes such as the sequence length and time units, and
+the {ref}`sec_data_model_reference_sequence` if it exists. {func}`Loading <load>` such a
+file returns an immutable tree sequence object, with pre-calculated indexes immediately
+available. See the {ref}`sec_tree_sequence_file_format` section for more details.
+
+Although data in a raw {class}`TableCollection` need not conform to the
+{ref}`sec_valid_tree_sequence_requirements`, it too can be
+{meth}`dumped <TableCollection.dump>` to a file (with indexes stored if they exist).
+
+
 (sec_data_model_data_encoding)=

 ## Data encoding
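
Review note: the behaviour described in the new "Saving to file" section can be checked with a minimal sketch like the one below. It assumes the third-party `tskit` package is installed. The helper name `roundtrip_demo`, the file names and the toy tables are hypothetical, chosen only for illustration. The second table collection deliberately breaks the requirements (its "parent" node is not older than its child) to show that raw tables can still be dumped and loaded.

```python
import os
import tempfile


def roundtrip_demo():
    """Dump a tree sequence and a raw table collection, then load both back."""
    import tskit  # third-party; assumed to be installed

    # A tiny valid table collection: two samples joined by one ancestor.
    tables = tskit.TableCollection(sequence_length=10)
    a = tables.nodes.add_row(flags=tskit.NODE_IS_SAMPLE, time=0)
    b = tables.nodes.add_row(flags=tskit.NODE_IS_SAMPLE, time=0)
    root = tables.nodes.add_row(time=1)
    tables.edges.add_row(left=0, right=10, parent=root, child=a)
    tables.edges.add_row(left=0, right=10, parent=root, child=b)
    tables.sort()
    ts = tables.tree_sequence()

    # A raw table collection that does NOT meet the requirements:
    # the parent node has the same time as its child, and no index is built.
    raw = tskit.TableCollection(sequence_length=10)
    child = raw.nodes.add_row(time=0)
    parent = raw.nodes.add_row(time=0)
    raw.edges.add_row(left=0, right=10, parent=parent, child=child)

    with tempfile.TemporaryDirectory() as tmpdir:
        # Tree sequence: dump, then load back as an immutable TreeSequence.
        ts_path = os.path.join(tmpdir, "example.trees")
        ts.dump(ts_path)
        reloaded_ts = tskit.load(ts_path)

        # Raw tables: dump, then load back as a TableCollection.
        raw_path = os.path.join(tmpdir, "raw.tables")
        raw.dump(raw_path)
        reloaded_raw = tskit.TableCollection.load(raw_path)

    return ts, reloaded_ts, raw, reloaded_raw
```

Calling `roundtrip_demo()` returns the original and reloaded objects so they can be compared, for example by checking `reloaded_ts.num_nodes` or `reloaded_raw.has_index()`.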

docs/file-formats.md

+3 -1
@@ -35,7 +35,9 @@ files. We also refer to them as "tree sequence files".

 :::{todo}
 Link to the documentation for kastore, and describe the arrays that are
-stored as well as the top-level metadata.
+stored as well as the top-level metadata. Note that a structured listing of
+all the data stored in a tree sequence file can be shown using
+e.g. ``python -m kastore ls file.trees``.
 :::

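
Review note: as a rough Python-side analogue of the `python -m kastore ls file.trees` command mentioned in the todo, the stored array names can also be read with `kastore.load`. This is a minimal sketch, assuming the third-party `kastore` and `tskit` packages are both installed. The helper name `list_stored_arrays` and the one-node toy tree sequence are hypothetical.

```python
import os
import tempfile


def list_stored_arrays():
    """Return the sorted array names stored in a freshly dumped tree sequence file."""
    import kastore  # third-party; assumed to be installed
    import tskit  # third-party; assumed to be installed

    # The smallest useful input: a single sample node and no edges.
    tables = tskit.TableCollection(sequence_length=1)
    tables.nodes.add_row(flags=tskit.NODE_IS_SAMPLE, time=0)
    ts = tables.tree_sequence()

    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "file.trees")
        ts.dump(path)
        # kastore exposes the file as a mapping from array name to array.
        store = kastore.load(path)
        keys = sorted(store.keys())
    return keys
```

Printing the result of `list_stored_arrays()` shows one name per stored array, such as the node time column and the top-level sequence length.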
