-
Notifications
You must be signed in to change notification settings - Fork 15
Don't load all ancestors when truncating #811
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, simple. LGTM. I don't know if there are other params to tsinfer.AncestorData(...)
other than the sequence_length
and chunk_size...
arguments, but I presume not. E.g. at some point I wanted to associated a time_units
value with the ancestors file, but that never got into the code base.
Codecov Report
@@ Coverage Diff @@
## main #811 +/- ##
==========================================
- Coverage 93.34% 93.34% -0.01%
==========================================
Files 17 17
Lines 5652 5662 +10
Branches 1014 1016 +2
==========================================
+ Hits 5276 5285 +9
Misses 247 247
- Partials 129 130 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 1 file with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like an improvement to me 👍
This was way too slow at around 8hours for the smallest chrom in my dataset. Have just pushed a change which I'm now testing where only changed ancestors are touched. |
dba0964
to
4a87fa2
Compare
@Mergifyio rebase |
✅ Branch has been successfully rebased |
68e9b49
to
38fb8c9
Compare
Fixed up, should be good to merge. |
38fb8c9
to
c52bbec
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, a few suggestions. Happy to merge when addressed so pre-approving
tests/test_inference.py
Outdated
tsinfer.generate_ancestors(sample_data, path=d + "ancestors.tsi") | ||
ancestors = tsinfer.AncestorData.load(d + "ancestors.tsi") | ||
time = np.sort(ancestors.ancestors_time[:]) | ||
if ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe put the comment on a different line? Seems like unnecessary line breakage here
else: | ||
params = [(0.4, 0.6, 1), (0, 1, 10)] | ||
for param in params: | ||
truncated_ancestors = ancestors.truncate_ancestors( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will ancestors.truncate_ancestors(*param, buffer_length=2)
work here ?
tsinfer/formats.py
Outdated
for anc in self.ancestors(): | ||
truncated = self.copy(**kwargs) | ||
|
||
# Create a buffer of 1000 ancestors with their indexes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Create a buffer of 1000 ancestors with their indexes | |
# Create a buffer of buffer_length ancestors with their indexes |
truncated.ancestors_full_haplotype[:] = haplotypes | ||
truncated.ancestors_full_haplotype_mask[:] = haplotypes == tskit.MISSING_DATA | ||
buffer_pos += 1 | ||
if buffer_pos == buffer_length: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A local function would help here,
def flush_buffer(length):
truncated.ancestors_start.set_orthogonal_selection(
index_buffer[:length], start_buffer[:length]
)
# etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't follow how this is working, so a few comments on how flush_buffer
works and would be helpful here.
c52bbec
to
77e52ce
Compare
Fixed up, merging. |
77e52ce
to
62cfa3d
Compare
No description provided.