-
Notifications
You must be signed in to change notification settings - Fork 77
Description
Hi,
This bug is similar to what was reported in #2986, where we discussed a similar issue that occurs with pairwise coalescent rate calculations. I have to say, that I am using the 0.6.0-dev version of tskit.
Here is my problem:
When trying to calculate the overall pairwise coalescent rates using the time window [0, float('inf')] and genomic windows, I get a similar bug, which we had before already in #2986:
Bug detected in lib/tskit/trees.c at line 9801. If you are using tskit directly please open an issue on GitHub, ideally with a reproducible example.
Here is the non-working code that produces the bug using this time window [0, float('inf')]
:
import tskit
import numpy as np
import itertools
def calculate_overall_coalescent_rates(tree_seq, populations):
sequence_length = tree_seq.sequence_length
window_breaks = np.append(np.arange(0, sequence_length, 80000), sequence_length)
overall_time_window = np.array([0, float('inf')])
indexes = list(itertools.combinations_with_replacement(range(len(populations)), 2))
# Calculations of coalescent rates for the entire time window
try:
print(f"Calculating overall coalescent rates for window: {overall_time_window}")
overall_coal_rates = tree_seq.pair_coalescence_rates(
time_windows=overall_time_window,
sample_sets=populations,
indexes=indexes,
windows=window_breaks
)
print(f"Overall coalescent rates shape: {overall_coal_rates.shape}")
except Exception as e:
print(f"Error calculating overall coalescent rates: {e}")
if __name__ == "__main__":
tree_file_path = "test_tree.trees"
tree_seq = tskit.load(tree_file_path)
# I have 4 populations
populations = [
list(range(0, 30)), # Population 1
list(range(30, 50)), # Population 2
list(range(50, 62)), # Population 3
list(range(62, 74)) # Population 4
]
calculate_overall_coalescent_rates(tree_seq, populations)
And here is the example of code that works when dividing the time windows into smaller intervals [0, 20000, 100000, float('inf')]
:
import tskit
import numpy as np
import itertools
def calculate_pairwise_coalescent_rates(tree_seq, populations):
sequence_length = tree_seq.sequence_length
window_breaks = np.append(np.arange(0, sequence_length, 80000), sequence_length)
time_windows = np.array([0, 20000, 100000, float('inf')])
indexes = list(itertools.combinations_with_replacement(range(len(populations)), 2))
# Calculation of pairwise coalescent rates for the defined time windows
try:
print(f"Calculating pairwise coalescent rates for windows: {window_breaks}")
pairwise_coal_rates = tree_seq.pair_coalescence_rates(
time_windows=time_windows,
sample_sets=populations,
indexes=indexes,
windows=window_breaks
)
print(f"Pairwise coalescent rates shape: {pairwise_coal_rates.shape}")
except Exception as e:
print(f"Error calculating pairwise coalescent rates: {e}")
if __name__ == "__main__":
tree_file_path = "test_tree.trees"
tree_seq = tskit.load(tree_file_path)
# I have 4 populations
populations = [
list(range(0, 30)), # Population 1
list(range(30, 50)), # Population 2
list(range(50, 62)), # Population 3
list(range(62, 74)) # Population 4
]
calculate_pairwise_coalescent_rates(tree_seq, populations)
The error seems to be triggered when using the time window [0, float('inf')]
only. I attached my test_tree_SINGER.trees
file as a zip file. This is actually a .trees
file created by SINGER (ARG-inference by Yun Deng: https://github.com/popgenmethods/SINGER). But when using both scripts on a simulated .trees
file created by SLiM, both scripts work perfectly fine. So there must be something in the .trees
file created by SINGER triggering this bug.
I attached both files (test_tree_SINGER.trees
and test_tree_slim.trees
).
test_tree_files.zip