Skip to content

pair_coalescence_rates fails with overall time window [0, float('inf')] and sometimes also with [0, 20000, 100000, float('inf')] #3035

@Luker121

Description

@Luker121

Hi,
This bug is similar to what was reported in #2986, where we discussed a similar issue that occurs with pairwise coalescent rate calculations. I have to say, that I am using the 0.6.0-dev version of tskit.

Here is my problem:

When trying to calculate the overall pairwise coalescent rates using the time window [0, float('inf')] and genomic windows, I get a similar bug, which we had before already in #2986:

Bug detected in lib/tskit/trees.c at line 9801. If you are using tskit directly please open an issue on GitHub, ideally with a reproducible example.

Here is the non-working code that produces the bug using this time window [0, float('inf')]:

import tskit
import numpy as np
import itertools

def calculate_overall_coalescent_rates(tree_seq, populations):
    sequence_length = tree_seq.sequence_length
    window_breaks = np.append(np.arange(0, sequence_length, 80000), sequence_length)
    
    
    overall_time_window = np.array([0, float('inf')])

    indexes = list(itertools.combinations_with_replacement(range(len(populations)), 2))
    
    # Calculations of coalescent rates for the entire time window
    try:
        print(f"Calculating overall coalescent rates for window: {overall_time_window}")
        overall_coal_rates = tree_seq.pair_coalescence_rates(
            time_windows=overall_time_window,
            sample_sets=populations,
            indexes=indexes,
            windows=window_breaks 
        )
        print(f"Overall coalescent rates shape: {overall_coal_rates.shape}")
    except Exception as e:
        print(f"Error calculating overall coalescent rates: {e}")


if __name__ == "__main__":
    tree_file_path = "test_tree.trees"
    tree_seq = tskit.load(tree_file_path)

    # I have 4 populations
    populations = [
        list(range(0, 30)),  # Population 1
        list(range(30, 50)),  # Population 2
        list(range(50, 62)),  # Population 3
        list(range(62, 74))   # Population 4
    ]

    calculate_overall_coalescent_rates(tree_seq, populations)

And here is the example of code that works when dividing the time windows into smaller intervals [0, 20000, 100000, float('inf')]:

import tskit
import numpy as np
import itertools

def calculate_pairwise_coalescent_rates(tree_seq, populations):
    sequence_length = tree_seq.sequence_length
    window_breaks = np.append(np.arange(0, sequence_length, 80000), sequence_length)
    
    time_windows = np.array([0, 20000, 100000, float('inf')])

    indexes = list(itertools.combinations_with_replacement(range(len(populations)), 2))
    
    # Calculation of pairwise coalescent rates for the defined time windows
    try:
        print(f"Calculating pairwise coalescent rates for windows: {window_breaks}")
        pairwise_coal_rates = tree_seq.pair_coalescence_rates(
            time_windows=time_windows,
            sample_sets=populations,
            indexes=indexes,
            windows=window_breaks 
        )
        print(f"Pairwise coalescent rates shape: {pairwise_coal_rates.shape}")
    except Exception as e:
        print(f"Error calculating pairwise coalescent rates: {e}")

if __name__ == "__main__":
    tree_file_path = "test_tree.trees"
    tree_seq = tskit.load(tree_file_path)

    # I have 4 populations
    populations = [
        list(range(0, 30)),  # Population 1
        list(range(30, 50)),  # Population 2
        list(range(50, 62)),  # Population 3
        list(range(62, 74))   # Population 4
    ]

    calculate_pairwise_coalescent_rates(tree_seq, populations)

The error seems to be triggered when using the time window [0, float('inf')] only. I attached my test_tree_SINGER.trees file as a zip file. This is actually a .trees file created by SINGER (ARG-inference by Yun Deng: https://github.com/popgenmethods/SINGER). But when using both scripts on a simulated .trees file created by SLiM, both scripts work perfectly fine. So there must be something in the .trees file created by SINGER triggering this bug.
I attached both files (test_tree_SINGER.trees and test_tree_slim.trees).
test_tree_files.zip

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions