Don't load all ancestors when truncating #811

benjeffery · 2023-03-21T12:58:21Z

No description provided.

hyanwong

Nice, simple. LGTM. I don't know if there are other params to tsinfer.AncestorData(...) other than the sequence_length and chunk_size... arguments, but I presume not. E.g. at some point I wanted to associate a time_units value with the ancestors file, but that never got into the code base.

codecov · 2023-03-21T13:18:15Z

Codecov Report

Merging #811 (62cfa3d) into main (96623e0) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #811      +/-   ##
==========================================
- Coverage   93.34%   93.34%   -0.01%     
==========================================
  Files          17       17              
  Lines        5652     5662      +10     
  Branches     1014     1016       +2     
==========================================
+ Hits         5276     5285       +9     
  Misses        247      247              
- Partials      129      130       +1

Flag	Coverage Δ
C	`93.34% <100.00%> (-0.01%)`	⬇️
python	`96.29% <100.00%> (-0.02%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
tsinfer/formats.py	`97.52% <100.00%> (+0.01%)`	⬆️

... and 1 file with indirect coverage changes

jeromekelleher

Looks like an improvement to me 👍

benjeffery · 2023-03-22T01:41:13Z

This was way too slow, at around 8 hours for the smallest chromosome in my dataset. I have just pushed a change, which I'm now testing, where only changed ancestors are touched.

benjeffery · 2023-03-22T17:37:01Z

@Mergifyio rebase

mergify · 2023-03-22T17:37:10Z

rebase

✅ Branch has been successfully rebased

benjeffery · 2023-03-27T15:52:25Z

Fixed up, should be good to merge.

jeromekelleher

LGTM, a few suggestions. Happy to merge when addressed, so pre-approving.

jeromekelleher · 2023-03-28T09:12:29Z

tests/test_inference.py

+            tsinfer.generate_ancestors(sample_data, path=d + "ancestors.tsi")
+            ancestors = tsinfer.AncestorData.load(d + "ancestors.tsi")
+            time = np.sort(ancestors.ancestors_time[:])
+            if (


Maybe put the comment on a different line? Seems like unnecessary line breakage here

jeromekelleher · 2023-03-28T09:14:35Z

tests/test_inference.py

+            else:
+                params = [(0.4, 0.6, 1), (0, 1, 10)]
+            for param in params:
+                truncated_ancestors = ancestors.truncate_ancestors(


Will ancestors.truncate_ancestors(*param, buffer_length=2) work here?
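
To illustrate the question above, here is a minimal sketch of unpacking each parameter tuple positionally while passing buffer_length by keyword. The function truncate_stub and its argument names are hypothetical stand-ins for illustration only; they are not the tsinfer API, and the real signature of truncate_ancestors should be checked in tsinfer/formats.py.

```python
# Hypothetical stand-in for illustration only; NOT the tsinfer method.
# It just records the arguments it receives so the call shape can be checked.
def truncate_stub(lower_bound, upper_bound, multiplier, buffer_length=1000):
    return {
        "lower_bound": lower_bound,
        "upper_bound": upper_bound,
        "multiplier": multiplier,
        "buffer_length": buffer_length,
    }


# Same parameter tuples as in the test hunk above.
params = [(0.4, 0.6, 1), (0, 1, 10)]

# Each tuple is spread over the positional arguments with `*param`;
# the keyword argument is supplied separately.
results = [truncate_stub(*param, buffer_length=2) for param in params]
```

Star-unpacking works as long as the tuple order matches the positional order of the callee and none of the tuple entries collides with the keyword being passed.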

jeromekelleher · 2023-03-28T09:15:44Z

tsinfer/formats.py

-        for anc in self.ancestors():
+        truncated = self.copy(**kwargs)
+
+        # Create a buffer of 1000 ancestors with their indexes


Suggested change

-        # Create a buffer of 1000 ancestors with their indexes
+        # Create a buffer of buffer_length ancestors with their indexes

jeromekelleher · 2023-03-28T09:20:32Z

tsinfer/formats.py

-        truncated.ancestors_full_haplotype[:] = haplotypes
-        truncated.ancestors_full_haplotype_mask[:] = haplotypes == tskit.MISSING_DATA
+                    buffer_pos += 1
+                    if buffer_pos == buffer_length:


A local function would help here:

    def flush_buffer(length):
        truncated.ancestors_start.set_orthogonal_selection(
            index_buffer[:length], start_buffer[:length]
        )
        # etc

I don't follow how this is working, so a few comments on how flush_buffer works would be helpful here.
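
For readers following the thread, the buffering pattern under discussion can be sketched roughly as below. This is not the code from the PR: write_buffered, target, and updates are hypothetical names, and a plain Python list stands in for the on-disk array that the real code updates with set_orthogonal_selection. The point is only to show how buffer_pos, buffer_length, and a local flush_buffer fit together.

```python
def write_buffered(target, updates, buffer_length):
    """Apply (index, value) updates to `target` in batches.

    Hypothetical sketch: `target` is a plain list standing in for an
    on-disk array. Returns the number of flushes performed.
    """
    index_buffer = [0] * buffer_length
    value_buffer = [None] * buffer_length
    flush_count = 0

    def flush_buffer(length):
        # Only the first `length` slots hold fresh data; anything after
        # that is left over from an earlier batch and must be ignored.
        nonlocal flush_count
        for i, v in zip(index_buffer[:length], value_buffer[:length]):
            target[i] = v
        flush_count += 1

    buffer_pos = 0
    for index, value in updates:
        index_buffer[buffer_pos] = index
        value_buffer[buffer_pos] = value
        buffer_pos += 1
        if buffer_pos == buffer_length:
            # Buffer is full: write it out and start refilling from slot 0.
            flush_buffer(buffer_pos)
            buffer_pos = 0

    if buffer_pos > 0:
        # Write whatever is left in a partially filled buffer.
        flush_buffer(buffer_pos)
    return flush_count


target = [0] * 6
flushes = write_buffered(target, [(1, 10), (4, 40), (5, 50)], buffer_length=2)
```

With three updates and a buffer of two, the first flush happens when the buffer fills and the second handles the single leftover entry. Entries that never appear in updates are not written at all, which is the property the "only changed ancestors are touched" change relies on.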

benjeffery · 2023-03-30T12:50:16Z

Fixed up, merging.

hyanwong approved these changes Mar 21, 2023

jeromekelleher approved these changes Mar 21, 2023

benjeffery force-pushed the iterate-truncate branch 2 times, most recently from dba0964 to 4a87fa2 Compare March 22, 2023 17:27

benjeffery marked this pull request as ready for review March 22, 2023 17:27

benjeffery force-pushed the iterate-truncate branch 2 times, most recently from 68e9b49 to 38fb8c9 Compare March 27, 2023 15:50

benjeffery force-pushed the iterate-truncate branch from 38fb8c9 to c52bbec Compare March 27, 2023 15:53

jeromekelleher approved these changes Mar 28, 2023

benjeffery force-pushed the iterate-truncate branch from c52bbec to 77e52ce Compare March 30, 2023 12:49

benjeffery added the AUTOMERGE-REQUESTED label Mar 30, 2023

Ben Jeffery and others added 2 commits March 30, 2023 13:05

Don't load all ancestors when truncating

30e6181

Only update changed ancestors

62cfa3d

benjeffery force-pushed the iterate-truncate branch from 77e52ce to 62cfa3d Compare March 30, 2023 13:05

mergify bot merged commit 18b51ae into tskit-dev:main Mar 30, 2023

mergify bot removed the AUTOMERGE-REQUESTED label Mar 30, 2023
