Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Making find_IBD more efficient by pruning ancestral segments #1636

Open
gtsambos opened this issue Aug 18, 2021 · 1 comment
Open

Making find_IBD more efficient by pruning ancestral segments #1636

gtsambos opened this issue Aug 18, 2021 · 1 comment
Assignees
Labels
Performance This issue addresses performance, either runtime or memory

Comments

@gtsambos
Copy link
Member

find_ibd relies on creating 'master' lists of segments underneath ancestral node. However, these lists can take up a lot of memory, especially if they are high up in the tree.

However, it isn't necessary to keep all of these lists. Once we have processed all of the edges where some node u is a child, we will not need to refer to the list of segments underneath u any more, and can prune these segments from memory.

This could be done by keeping a list that counts the number of relevant edges for each ancestral node, and calls a 'prune' function when that number is decremented to 0 for any node.

@gtsambos gtsambos self-assigned this Aug 18, 2021
@gtsambos gtsambos added the Performance This issue addresses performance, either runtime or memory label Aug 18, 2021
@jeromekelleher
Copy link
Member

This isn't as straightforward as it might seem because we're using the tsk_blkalloc allocator to allocate segments, and it doesn't support freeing. We would either have to adapt the "heap" allocator we use in msprime (which supports freeing) or do the malloc/freeing manually segment-by-segment. Both options have downsides, so we should probably get some idea of how much memory this would save in typical cases first before embarking on this. Probably best to get the IbdResult class fully nailed down before doing this (#1639)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance This issue addresses performance, either runtime or memory
Projects
None yet
Development

No branches or pull requests

2 participants