Releases: tskit-dev/tsinfer

0.4.1

19 Mar 13:27

[0.4.1] - 2024-04-19

Changes

  • Optional arrays for VariantData (e.g. individuals_time) are now configured via keyword arguments to VariantData.__init__, rather than by requiring a specifically named array in the vcf-zarr store. (#1011, #1006, @benjeffery)

0.4.0

06 Mar 15:01

[0.4.0] - 2024-04-06

Changelog is relative to the last full release, 0.3.3.

Breaking Changes

  • tsinfer 0.4.0 infers data from on-disk or in-memory vcf-zarr datasets, allowing users to leverage optimized
    and parallel VCF parsing via the bio2zarr package. The SampleData file format and class are now deprecated.
  • If a mismatch ratio is provided to the infer command, it only applies during the
    match_samples phase (#980, #981, @hyanwong)

Features

  • Add batch ancestor and sample matching APIs for splitting work across many independent jobs.
    (#954, #917, @benjeffery)

Performance improvements

  • Reduce memory usage when running match_samples against large cohorts
    containing sequences with substantial amounts of error.
    (#761, @jeromekelleher)
  • truncate_ancestors no longer requires loading all the ancestors into RAM.
    (#811, @benjeffery)
  • Increase parallelisation of match_ancestors by generating parallel groups from
    their implied dependency graph. (#828, #147, @benjeffery)
  • Reduce memory requirements of the generate_ancestors function by providing
    the genotype_encoding (#809) and mmap_temp_dir (#808) options
    (@jeromekelleher).
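
The idea of deriving parallel groups from a dependency graph can be sketched independently of tsinfer. The function below is a minimal illustration, not tsinfer's implementation: the name parallel_groups and the input format (a dict mapping each item to the set of items it depends on, with every dependency also present as a key) are assumptions made for this example. Each returned group contains items whose dependencies all sit in earlier groups, so the members of one group could be processed concurrently.

```python
from collections import defaultdict


def parallel_groups(dependencies):
    """Split items into ordered groups that can each be processed in parallel.

    `dependencies` maps every item to the set of items it depends on.
    Hypothetical helper for illustration; not part of the tsinfer API.
    """
    # Number of not-yet-satisfied dependencies for each item.
    remaining = {item: len(deps) for item, deps in dependencies.items()}
    # Reverse edges: which items are waiting on a given item.
    dependents = defaultdict(list)
    for item, deps in dependencies.items():
        for dep in deps:
            dependents[dep].append(item)

    # Items with no dependencies form the first group.
    ready = sorted(item for item, count in remaining.items() if count == 0)
    groups = []
    while ready:
        groups.append(ready)
        next_ready = []
        for item in ready:
            for child in dependents[item]:
                remaining[child] -= 1
                if remaining[child] == 0:
                    next_ready.append(child)
        ready = sorted(next_ready)

    # Anything left unplaced means the graph was not acyclic.
    if sum(len(group) for group in groups) != len(dependencies):
        raise ValueError("dependency graph contains a cycle")
    return groups
```

For example, with items 0 and 4 having no dependencies, items 1 and 2 depending on 0, and item 3 depending on 1 and 2, the groups are [0, 4], then [1, 2], then [3].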

Other Breaking Changes

  • Removed the uuid field from SampleData; equality is now purely based on data
  • A permissive JSON schema is now set on node table metadata

Fixes

  • Properly account for "N" as an unknown ancestral state, and ban the empty string ("") from being
    set as an ancestral state (#963, @hyanwong)

0.4.0a2

06 Sep 23:07
1d04fb8
Pre-release

[0.4.0a2] - 2024-09-06

2nd Alpha release of tsinfer 0.4.0

Features

  • Add batch ancestor and sample matching APIs for splitting work across many independent jobs.
    (#954, #917, @benjeffery)

0.4.0a1

27 Jul 00:53
Pre-release

Alpha release of tsinfer 0.4.0

Features

  • tsinfer now supports inferring data from a vcf-zarr dataset. This allows users
    to infer from VCFs via the optimised and parallel VCF parsing in bio2zarr.
  • The VariantData class can be used to load the vcf-zarr data for use in inference.
  • vcf-zarr sample_ids are inserted into individual metadata as variant_data_sample_id
    if this key does not already exist.

Breaking Changes

  • Remove the uuid field from SampleData. SampleData equality is now purely based
    on data. (#748, @benjeffery)

Performance improvements

  • Reduce memory usage when running match_samples against large cohorts
    containing sequences with substantial amounts of error.
    (#761, @jeromekelleher)

  • truncate_ancestors no longer requires loading all the ancestors into RAM.
    (#811, @benjeffery)

  • Reduce memory requirements of the generate_ancestors function by providing
    the genotype_encoding (#809) and mmap_temp_dir (#808) options
    (@jeromekelleher).

  • Increase parallelisation of match_ancestors by generating parallel groups from
    their implied dependency graph. (#828, #147, @benjeffery)

0.3.3

17 Jul 10:48

Fixes

  • Bug fix release for numpy 2 (#937).

0.3.2

16 Jul 09:40

tsinfer now supports NumPy 2 (as well as 1.x) and Python 3.12.
Python 3.8 support is removed.

0.3.1 - Packaging bugfix release

19 Apr 14:29

Fixes bad dependency specification.

0.3.0 - Bugfix and maintenance release

25 Oct 22:14
1290344

Read https://tskit.dev/news/20221025-tsinfer-0.3.0.html for a more detailed explanation of this update.

Features

  • When calling sample_data.add_site(), the ancestral state no longer needs to be the first allele (index 0): an ancestral allele index can be given instead (and if it is MISSING_DATA, the ancestral state will be imputed). (#718, #686, @hyanwong)

  • The CLI now allows recombination rates (or rate maps) and mismatch ratios to be specified (#731, #435, @hyanwong)

  • The calls to match-ancestors and match-samples via the CLI are now logged in the provenance entries of the output tree sequence (#732, #741, #730, @hyanwong)

  • The CLI interface allows --no-post-process to be specified (for details of post-processing, see “Breaking changes” below) (#727, #721, @hyanwong)

  • Matching routines now warn if there are no inference sites (#685, #683, @hyanwong)

Fixes

  • sample_data.subset() now accepts a sequence_length (#681, @hyanwong)

  • verify no longer raises an error when comparing a genotype to missingness. (#716, #625, @benjeffery)

Breaking changes:

  • The simplify parameter is now deprecated in favour of post_process, which, prior to simplification, removes the “virtual-root-like” ancestor (inserted by tsinfer to aid the matching process) and then splits the ultimate ancestor into separate pieces. If splitting is not required, the post_process step can also be called as a separate function with the parameter split_ultimate=False (#687, #750, #673, @hyanwong)

  • Post-processing by default erases tree topology that exists before the first site and one unit after the last site, to avoid extrapolating into regions with no data. This can be disabled by calling the post_process step as a separate function with the parameter erase_flanks=False (#720, #483, @hyanwong)

  • Inference now sets time_units on both ancestor and final tree sequences to tskit.TIME_UNITS_UNCALIBRATED, stopping accidental use of branch length calculations on the ts. (#680, @hyanwong)
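
The flank-erasing rule described above can be made concrete with a small calculation. The sketch below is illustrative only and does not call tsinfer: the function name flank_intervals and its arguments are hypothetical, and it assumes half-open genomic intervals in which topology is retained from the first site position up to one unit past the last site position.

```python
def flank_intervals(site_positions, sequence_length):
    """Return the retained interval and the erased flanks as half-open intervals.

    Hypothetical helper illustrating the erase_flanks description;
    not part of the tsinfer API.
    """
    first = min(site_positions)
    last = max(site_positions)
    # Keep from the first site up to one unit after the last site,
    # without running past the end of the sequence.
    keep = (first, min(last + 1, sequence_length))

    erased = []
    if first > 0:
        erased.append((0, first))  # left flank, before the first site
    if keep[1] < sequence_length:
        erased.append((keep[1], sequence_length))  # right flank
    return keep, erased
```

With sites at positions 10, 50 and 90 on a sequence of length 100, this keeps (10, 91) and erases (0, 10) and (91, 100).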

0.2.3

08 Apr 11:19
69056f6

Features

  • Added ancestor(id) to AncestorData class.
    (#570, #569, @hyanwong)

Fixes

  • Mark zarr 2.11.0, 2.11.1 and 2.11.2 as incompatible due to zarr-python
    bugs #965 and #967.
    (#643, #657, @benjeffery)

0.2.2

23 Feb 16:31
99cad13

Bugfixes:

  • Mutations at non-inference sites are now guaranteed to be fully parsimonious.
    Previous versions required a mutation above the root when the input ancestral state
    disagreed with the ancestral state produced by the parsimony algorithm. Now fixed by
    using the new map_mutations code from tskit 0.3.7 (#557, @hyanwong)

Breaking changes:

  • Oldest nodes in a standard inferred tree sequence are no longer set to frequencies ~2
    and ~3 (i.e. 2 or 3 times as old as all the other nodes), but are spaced above the
    others by the mean time between unique ancestor ages (#485, @hyanwong)

  • The tsinfer.SampleData.from_tree_sequence() function now defaults to setting
    use_sites_time and use_individuals_time to False rather than True
    (#599, @hyanwong)
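
The spacing rule in the first breaking change above (placing the oldest nodes above the others by the mean time between unique ancestor ages) can be sketched as follows. This is a reading of the changelog text, not tsinfer's code: the function name time_above_oldest is hypothetical, and the fallback gap of 1.0 when there is only a single unique age is an assumption made so the example is total.

```python
def time_above_oldest(ancestor_times):
    """Return a time one mean gap above the oldest ancestor time.

    Hypothetical helper for illustration; not part of the tsinfer API.
    """
    unique_times = sorted(set(ancestor_times))
    if len(unique_times) < 2:
        # Assumed fallback: with a single unique age there is no gap to average.
        gap = 1.0
    else:
        # The mean of consecutive differences equals the range divided by
        # the number of gaps.
        gap = (unique_times[-1] - unique_times[0]) / (len(unique_times) - 1)
    return unique_times[-1] + gap
```

For ancestor times 1, 1, 2 and 4, the unique ages are 1, 2 and 4, the mean gap is 1.5, and the new node would be placed at time 5.5.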