Description
At the moment we are finding that the slow stage of inference is match_ancestors, since it is only parallelized within an epoch (for ancestors at exactly the same age). This is because we need to make sure that ancestors can copy off all older ancestors if necessary.
As mentioned to @awohns, I think we could, however, parallelize ancestor matching as long as the ancestors we are processing in parallel do not cover the same region of the genome. In other words, if we keep a list, L, of the start and end values of the ancestors we are currently matching, I think we can process the next-ancestor-in-time as long as its start and end values don't intersect with the segments in L.
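To make the idea concrete, here is a minimal sketch in Python. It is not tsinfer code: the `Ancestor` record, `overlaps` and `parallel_batches` are hypothetical names, and intervals are assumed to be half-open, `[start, end)`. It also takes the premise of this issue for granted, namely that non-overlap with the ancestors currently being matched is enough to make parallel matching safe. Ancestors of the same age are allowed to overlap, since they are already matched in parallel within an epoch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ancestor:
    # Hypothetical record for illustration, not tsinfer's data structure.
    time: float  # age; older ancestors have larger values
    start: int   # left coordinate (inclusive)
    end: int     # right coordinate (exclusive)


def overlaps(a, b):
    """Return True if the half-open intervals [start, end) intersect."""
    return a.start < b.end and b.start < a.end


def parallel_batches(ancestors):
    """Group ancestors, oldest first, into batches that could be matched together.

    Within a batch, any two ancestors either have the same age or cover
    disjoint regions. A new batch starts as soon as the next ancestor in
    time overlaps an ancestor of a different age in the current batch.
    """
    batches = []
    current = []  # plays the role of the list L described above
    for anc in sorted(ancestors, key=lambda a: -a.time):
        conflict = any(
            other.time != anc.time and overlaps(other, anc) for other in current
        )
        if conflict:
            # Everything in the current batch must finish before anc can start.
            batches.append(current)
            current = []
        current.append(anc)
    if current:
        batches.append(current)
    return batches


ancestors = [
    Ancestor(time=3, start=0, end=100),
    Ancestor(time=2, start=200, end=300),
    Ancestor(time=2, start=250, end=400),  # same age as the previous one
    Ancestor(time=1, start=50, end=150),   # overlaps the oldest ancestor
]
batches = parallel_batches(ancestors)
print([len(batch) for batch in batches])
```

In this toy input the first three ancestors land in one batch and the last one has to wait. The sketch is more conservative than the scheme described above: it waits for a whole batch to finish, whereas a live list L would free each segment as soon as its match completes.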
Is my thinking correct? If so, I don't know how much speedup we would get, but I suspect that a long chromosome (when ancestors are short) might be substantially parallelizable.