Garbage collection in ibd_segments, link_ancestors and simplify #2461
Labels: enhancement, Performance
All three of these methods rely on an object called `ancestor_map` (C codebase) or `A` (Python mockups) to store segments that descend from each ancestral node in the tree sequence. In the case of `ibd_segments` and `link_ancestors`, these can be a big memory hog (much bigger than the size of the tree sequences themselves). However, the segments corresponding to a given ancestral node are no longer needed once we've processed all of the edges with that node in the 'child' position of the edge table. By pruning away this garbage periodically, we could improve the memory usage of these methods (at the expense of some runtime).

The difficulty is in the freeing of memory: since the `ancestor_map` objects have their memory allocated all at once, they also have to be freed all at once. It's difficult to imagine changing this without substantially complicating the existing code.

One alternative is to periodically create a pruned copy of the `ancestor_map`, and to then replace the original version with the pruned copy. I'm about to push up a PR that demonstrates this.
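
To make the pruned-copy idea concrete, here is a minimal Python sketch. It is not the tskit implementation or the forthcoming PR. It assumes a toy model in which `A` is a dict mapping node IDs to lists of `(left, right, sample)` segments, and edges are `(left, right, parent, child)` tuples ordered so that every edge with a node as child comes after that node's own segments are complete. The names `count_child_edges`, `prune_ancestor_map`, `process_edges` and `prune_every` are hypothetical.

```python
from collections import Counter


def count_child_edges(edges):
    """Count how many edges have each node in the 'child' position."""
    return Counter(child for _, _, _, child in edges)


def prune_ancestor_map(A, remaining, keep=()):
    """Return a pruned copy of A (hypothetical helper).

    A node is kept only if it still has unprocessed edges as a child,
    or if it is listed in `keep` (e.g. the parent currently being built).
    """
    return {u: segs for u, segs in A.items() if remaining[u] > 0 or u in keep}


def process_edges(edges, samples, sequence_length, prune_every=2):
    """Toy edge pass that propagates sample segments up to parents.

    Every `prune_every` edges, A is replaced by a pruned copy. Returns the
    final map and the peak number of nodes held in it at any one time.
    """
    remaining = count_child_edges(edges)
    A = {u: [(0.0, sequence_length, u)] for u in samples}
    peak = len(A)
    for i, (left, right, parent, child) in enumerate(edges, start=1):
        # Copy the overlapping part of each child segment up to the parent.
        for seg_left, seg_right, sample in A.get(child, []):
            lo, hi = max(left, seg_left), min(right, seg_right)
            if lo < hi:
                A.setdefault(parent, []).append((lo, hi, sample))
        remaining[child] -= 1
        peak = max(peak, len(A))
        if i % prune_every == 0:
            # Swap in the pruned copy; the old dict becomes garbage.
            A = prune_ancestor_map(A, remaining, keep={parent})
    return A, peak
```

The trade-off matches the one described above: each pruning pass costs a copy of the surviving entries, so a larger `prune_every` means less runtime overhead but a higher peak size for the map.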