In `Simplex_tree::expansion`, the main operation is (roughly) computing the intersection of two sorted lists, and we currently do that using the simplest algorithm, with complexity len(L1)+len(L2). However, as https://arxiv.org/abs/2301.07191 notes (in particular for Erdős–Rényi graphs), these two lists are typically quite unbalanced: one is at depth k, while the other one is always at depth 1. Assuming this is indeed what takes most of the time in `expansion` (and not allocations, for instance), some options could be:
- replace the second list with a deeper (smaller) one. Currently, looking at the children of 1234 (say 5, 6 and 7), we first pick 5, so L1={6, 7} and L2 is the list of children of the vertex 5. This works because if 12345, 12346 and 56 are cliques, then 123456 is as well. However, we could also pick for L2 the children of the simplex 125 (this is just one example; there are other possible choices): the intersection remains the same and the list should be smaller. This requires ensuring that the children of 125 are computed before those of 12345, which might require reversing the iteration order.
- use a different intersection algorithm. A fancy one could use a galloping (exponential search) strategy. Simpler, and well suited to the unbalanced case, is checking, for every element of L1, whether it belongs to L2 (this assumes that L1 is the shorter list; otherwise it would be more efficient in the reverse direction). Using standard binary search, this gives a complexity of n₁·log(n₂). To speed this up even further, we could, in a preprocessing step, store the (oriented?) graph in a data structure with fast lookup, either a dense array (which would waste a huge amount of memory for very sparse graphs) or a hash table, and get the complexity down to n₁. We can also use the min/max values of L2 to limit the part of L1 over which we need to iterate.
With a bit of profiling, on some examples, I notice that `intersection` accounts for less than half of the time of `expansion`. Some other costly operations:
- allocating `Siblings` with new/delete → we could use an allocator (a pool?)
- copying the vector produced by `intersection` into a `flat_map` → not much we can do about that; it has to be allocated at some point
(See gudhi-devel/src/Simplex_tree/include/gudhi/Simplex_tree.h, lines 1686 to 1688 at commit f6d9d3d.)