
scc algorithm #235

Merged: 47 commits merged into snowleopard:master on Jan 21, 2020
Conversation

@jitwit (Contributor) commented Sep 18, 2019

Implementation of scc algorithm for adjacency maps and adjacency int maps based on https://en.wikipedia.org/wiki/Path-based_strong_component_algorithm.

Avoids using Data.Graph, and benchmarks indicate that this pays off: https://jitwit.github.io/criterion/scc-bench.html. In the report, old-alga corresponds to alga's KL implementation, new-alga to the intmap implementation, and ord-alga to the new implementation for AMs. The graphs used are the real-world ones from haskell-perf/graphs. The AM implementation seems to be around 10% faster and the AIM version around 65% faster.

There is no NonEmpty.AdjacencyIntMap module, so for now AdjacencyIntMap.Algorithm returns the type AM.AdjacencyMap AdjacencyIntMap.
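
For context, a minimal usage sketch of the intended API shape (illustrative only; it assumes the AM version returns components as NonEmpty.AdjacencyMaps, as described above):

import qualified Algebra.Graph.AdjacencyMap           as AM
import qualified Algebra.Graph.AdjacencyMap.Algorithm as Alg
import qualified Algebra.Graph.NonEmpty.AdjacencyMap  as NonEmpty

-- Condensation of 1 <-> 3 -> 2 -> 2: every vertex of the result is a
-- non-empty strongly connected component of the input, here {1,3} and {2}.
condensation :: AM.AdjacencyMap (NonEmpty.AdjacencyMap Int)
condensation = Alg.scc (AM.edges [(1,3), (3,1), (3,2), (2,2)])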

@snowleopard (Owner)

@jitwit The results are great! You are unstoppable :-)

scc is quite an important algorithm with many users, so perhaps we should have it in our regression suite to keep an eye on its performance. @nobrakal Do you think you could add it?

@snowleopard (Owner)

P.S.: I'll need some time to review the implementation due to upcoming travel.

@jitwit (Contributor, Author) commented Sep 18, 2019

I was actually going to ask how to add things to the regression suite, since it would be useful for bfs too (using a different queue structure might be an opportunity for slightly better performance).

@snowleopard (Owner) commented Sep 18, 2019

Here is where the performance testing script comes from:

curl https://raw.githubusercontent.com/nobrakal/benchAlgaPr/master/compare.sh -o compare.sh;

I guess we could move it to this repository to make changes more convenient.

@nobrakal What do you think?

@nobrakal (Contributor)

@snowleopard

I am actually ashamed of this script; currently I think writing a very simple benchmarking suite and getting rid of https://github.com/haskell-perf/graphs would be the best option.

I would love to give haskell-perf/graphs the rewriting it needs, but I don't have any time to do it now.

Anyway, adding scc seems very feasible and I can give it a try if you think it is the best idea :)

@snowleopard (Owner) commented Sep 19, 2019

@nobrakal It's always possible to improve things but it's not always necessary :-) The current implementation may not be pretty, but the script does work and is very useful!

If you could add scc (and perhaps bfs too) that would be great. And, of course, I'd be happy to have a prettier script when you find time for this :)

@snowleopard (Owner)

@jitwit There are some merge conflicts, please resolve.

@jitwit (Contributor, Author) commented Oct 7, 2019

I re-ran the benchmarks and the results were less favorable in some cases for AdjacencyMaps. The graphs where the current version of the new implementation does poorly are those with a high number of small SCCs; the graphs where it does well have a small number of large SCCs. The AdjacencyIntMap version is much faster regardless.

The link to the criterion benchmarks from before has the current results.

@snowleopard (Owner)

@jitwit I'm looking at this link, which I think corresponds to AdjacencyIntMap:

https://jitwit.github.io/criterion/scc-bench.html

This is better across all benchmarks, but can you show the results for AdjacencyMap? I'd like to have more performance info about the problematic cases you mention.

@snowleopard (Owner)

By the way, the performance regression suite reports scc: 1.10 (OK).

@jitwit (Contributor, Author) commented Oct 8, 2019

Yeah, those names are unclear; I changed them to KL-alga, AM-alga, and AIM-alga.

I've looked closer, and the scc algorithm itself does perform well; the issue seems to be converting to the AM (NonEmpty.AdjacencyMap a) representation afterwards. When there are many SCCs, using induce is slow. When there are few SCCs, classifying edges (another approach I'm experimenting with) is slow.

The result of the scc search contains some additional information, so I was wondering whether a hybrid approach might be feasible. For example, depending on the ratio of the number of SCCs to the number of vertices, different functions could be used to convert to the final representation.
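
To sketch what I mean (hypothetical names and an arbitrary threshold, not the actual code):

-- Hypothetical sketch of the hybrid dispatch: the two conversion strategies
-- are passed in as arguments and the ratio of SCCs to vertices picks between
-- them. The factor of 2 is an arbitrary example threshold.
hybridCondense
  :: Int              -- number of SCCs found by the search
  -> Int              -- number of vertices in the input graph
  -> (r -> condensed) -- conversion by classifying edges (good for few, large SCCs)
  -> (r -> condensed) -- conversion via induce (good for many, small SCCs)
  -> r                -- intermediate result of the scc search
  -> condensed
hybridCondense numSccs numVertices byEdges byInduce result
  | 2 * numSccs < numVertices = byEdges  result
  | otherwise                 = byInduce result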

@jitwit (Contributor, Author) commented Oct 8, 2019

The benchmarks are updated with the "hybrid" approach.

P.S. Apologies for the messy code; I was aiming to see if the approach would even pay off!
P.P.S. According to the regression suite it apparently does not, but the benchmarks I ran on larger graphs look promising.

@jitwit (Contributor, Author) commented Oct 8, 2019

I tried the searches on a Twitter graph from SNAP: https://snap.stanford.edu/data/ego-Twitter.html (~81k vertices, ~1.7m edges) and the results are good:

https://jitwit.github.io/criterion/twitter-scc.html

@snowleopard (Owner)

@jitwit Very impressive, especially the Twitter benchmark! I think it's worth adding it to the benchmark suite at https://github.com/haskell-perf/graphs.

I'm concerned with the complexity of the resulting code though :( How much improvement is your hybrid approach bringing percentage-wise? If it's not too much then I think I'd prefer to remove the special tricks and instead focus on making a faster induce, if that's possible.

@jitwit (Contributor, Author) commented Oct 13, 2019

The number of passes over the graph with induce increases with the number of SCCs. I tried the algorithm with induce on the Twitter graph and it was something like 5x slower. I think the right solution may be to ditch the hybrid approach as well as induce. Something like

partition :: Eq k => (a -> k) -> Graph a -> [Graph a]

is sort of what is needed.

There are still a few cases where the current approach is slower than the old implementation, but for the most part it's faster.
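
A one-pass sketch of that partition idea for adjacency maps (using Ord k instead of Eq k so components can be bucketed in a Map; just an illustration, not the code in this PR):

import qualified Algebra.Graph.AdjacencyMap as AM
import qualified Data.Map.Strict            as Map

-- Group vertices by their key and keep only the edges whose endpoints share a
-- key, visiting every vertex and edge once instead of once per component.
partition :: (Ord a, Ord k) => (a -> k) -> AM.AdjacencyMap a -> [AM.AdjacencyMap a]
partition key g = Map.elems (Map.unionWith AM.overlay vertexParts edgeParts)
  where
    vertexParts = Map.fromListWith AM.overlay
        [ (key v, AM.vertex v) | v <- AM.vertexList g ]
    edgeParts   = Map.fromListWith AM.overlay
        [ (key u, AM.edge u w) | (u, w) <- AM.edgeList g, key u == key w ]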

@jitwit (Contributor, Author) commented Jan 14, 2020

I also added a CPP statement to get rid of the redundant Data.Monoid import warning for post-8.2 GHC.

@snowleopard (Owner)

I also added a CPP statement to get rid of the redundant Data.Monoid import warning for post 8.2 ghc

I think you can just import Data.Semigroup ((<>)) instead, getting rid of CPP. I dropped support for GHC < 8 recently hoping to eliminate most occurrences of CPP.

As long as there are no warnings in GHC 8.6.5, I'm happy!
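
For reference, a self-contained sketch of the two options (assuming the relevant change is (<>) entering Prelude in base 4.11 / GHC 8.4):

{-# LANGUAGE CPP #-}
module Example where

-- The CPP route: only import (<>) on compilers where Prelude does not yet
-- export it.
#if !MIN_VERSION_base(4,11,0)
import Data.Monoid ((<>))
#endif

-- The suggested CPP-free alternative would simply be:
--   import Data.Semigroup ((<>))

greeting :: String
greeting = "scc" <> " algorithm"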

@snowleopard (Owner)

@jitwit I think I've noticed a few more places where we can save some time. Could you please have a look?

@jitwit (Contributor, Author) commented Jan 19, 2020

Much better results!

https://jitwit.github.io/benchmarks/criterion/scc-bench.html

Just building the inner graphs (no difference lists) makes a big difference.

@snowleopard (Owner)

@jitwit Excellent! And the code got simpler too. Looks like we're mostly beating KL now, with only one scenario where we're slightly behind. I think we can tolerate this.

I think we can merge the PR now unless you'd like to do any further improvements.

@jitwit (Contributor, Author) commented Jan 19, 2020

On larger graphs the performance gain is even bigger. An acyclic Facebook network graph takes 200ms compared to 600ms (4,000 vertices, 90,000 edges), and the Twitter one takes 6.4s compared to 60s (80,000 vertices, 1,700,000 edges).

The part I'd maybe want to improve is the incomplete pattern on (curr,_:pth') = span (/=v) pathStack. Any preference for how: an extra helper, using tail, or something else?

@snowleopard (Owner)

200ms to 600ms [...] and 6.4s to 60s

@jitwit This is pretty cool! I guess when you're benchmarking KL you are including the overhead of converting an adjacency map to an array representation. What happens if you start with a graph stored in KL-style array? Is your implementation still significantly faster?

I don't really mind (curr,_:pth') = span (/=v) pth. I think going via an intermediate function would just obscure the invariant. One slight improvement is perhaps (curr,_v:pth') = span (/=v) pth, i.e. giving the unused binding a name.
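
For illustration, a standalone version of that pattern (popUntil is a made-up name; the actual code keeps it inline):

-- Split the path stack at the first occurrence of v. The pattern is
-- intentionally incomplete: the algorithm's invariant guarantees that v is on
-- the stack, and naming the dropped element _v documents that it is exactly v.
popUntil :: Eq a => a -> [a] -> ([a], [a])
popUntil v pth = (curr, pth')
  where
    (curr, _v:pth') = span (/= v) pth

-- ghci> popUntil 3 [1,2,3,4,5]
-- ([1,2],[4,5])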

@jitwit (Contributor, Author) commented Jan 20, 2020

I haven't done a proper benchmark yet, but from a quick profile quite a lot of the time is spent in induce in the old implementation on the twitter graph.

Since the condensation construction is interleaved with the pre-order traversal, I'm not totally sure how the implementations should be compared. There's definitely an algorithmic win from avoiding the quadratic induce, besides the benefits of not converting representation to arrays.

@snowleopard (Owner) commented Jan 20, 2020

@jitwit Note that your implementation uses gmap, which might also be quadratic when mapping into vertices with an expensive Ord instance (as in our case). The complexity of gmap given in the docs does not take into account the cost of Ord (it probably should!). So, I'm not entirely sure that the worst-case complexity really became subquadratic. But even if it is quadratic in the worst case, it looks like the typical case got much better!

@jitwit (Contributor, Author) commented Jan 21, 2020

It looks like the Ord instance for Map uses compare `on` toAscList. Because the vertices of the components are disjoint, this will always be decided from just the head of the lists, right? The docs say toAscList is subject to list fusion. Is that enough to get the time complexity for Ord comparisons (in this case) down to the height of the left spine of the maps?

There is also an inverse relation between the size of the components and how many components there are, which also helps limit the cost of Ord comparisons, considering both their size and number.

@jitwit (Contributor, Author) commented Jan 21, 2020

Given n and m (the number of vertices and edges) and n_o and m_o (the number of components and the number of edges between them), I think the complexity is something like:

  • O((n+m)*log n) for the traversal (constructing the inner graphs takes SUM (n_c+m_c)*log(n_c) over components, which is subsumed by (n+m)*log n).
  • O((n_o+m_o)*log n_o*log(n-n_o)) for building the condensation. With the new change to arrays, lookup is O(1). The largest possible component size is (n-n_o), so multiplying by log(n-n_o) accounts for the comparisons (based on my previous comment)?
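
Putting the two parts together, the estimate above amounts to (just restating the two bounds in one formula):

$$ O\big((n + m)\log n\big) \;+\; O\big((n_o + m_o)\,\log n_o\,\log(n - n_o)\big) $$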

@snowleopard (Owner)

It looks like the Ord instance for Maps uses compare ``on`` toAscList. Because the vertices of the components are disjoint, this will always classify from just the head of the lists, right?

@jitwit Adjacency maps do not use Map's Ord instance. Instead, their Ord is crafted so that <= refines the subgraph relation isSubgraphOf:

instance Ord a => Ord (AdjacencyMap a) where
    compare x y = mconcat
        [ compare (vertexCount x) (vertexCount y)
        , compare (vertexSet   x) (vertexSet   y)
        , compare (edgeCount   x) (edgeCount   y)
        , compare (edgeSet     x) (edgeSet     y) ]

The first check compares vertexCount, which is O(1). Then we use vertexSet, which, as you say, is likely to benefit from the fact that the graphs are disjoint. If we optimistically assume that it takes constant time too, then the overall time complexity seems to be O((n + m) log n). I suggest we use it. Even if we are off by an O(log n) factor, it's probably dominated by various other noise.

Pretty cool!

@jitwit (Contributor, Author) commented Jan 21, 2020

Ah! Sorry, I should have checked the Ord instance. Neat! It seems like this is a case where it pays to use Data.Map over Data.IntMap, since vertexCount would be O(n) otherwise. I'll just go ahead and put O((n+m) * log n) for the time complexity in this case, as you suggest.

@jitwit (Contributor, Author) commented Jan 21, 2020

Should I delete the old scc implementation in Data.Graph.Typed as well as the test which compares them?

@snowleopard (Owner)

Neat! It seems like this is a case where it pays to use Data.Map over Data.IntMap since vertexCount would be O(n) otherwise.

Indeed, I forgot that IntMap doesn't store the size! I guess we could cache it in AdjacencyIntMap ourselves. We could also cache a few other fields like vertexSet, vertexList, etc. Thanks to laziness the overhead would probably be acceptable. (Of course, this is better done in a separate PR.)
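
A rough sketch of that caching idea (a hypothetical wrapper, not the actual AdjacencyIntMap), relying on lazy record fields so each cached value is computed at most once and only if demanded:

import           Data.IntMap.Strict (IntMap)
import qualified Data.IntMap.Strict as IntMap
import           Data.IntSet        (IntSet)

-- Hypothetical cached wrapper: the fields are non-strict, so each cached value
-- is a thunk that costs O(n) at most once, and nothing if never demanded.
data CachedAIM = CachedAIM
    { aimMap         :: IntMap IntSet  -- underlying adjacency structure
    , aimVertexCount :: Int            -- cached lazily; IntMap.size is O(n)
    , aimVertexSet   :: IntSet         -- cached lazily
    }

fromIntMap :: IntMap IntSet -> CachedAIM
fromIntMap m = CachedAIM
    { aimMap         = m
    , aimVertexCount = IntMap.size m
    , aimVertexSet   = IntMap.keysSet m
    }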

Should I delete the old scc implementation in Data.Graph.Typed as well as the test which compares them?

No, let's keep it. It's always good to have a good test suite. I'm pretty sure we'll be doing further tweaks to the implementation, and it's good to know that (at least some) bugs will be caught.

@jitwit (Contributor, Author) commented Jan 21, 2020

Makes sense! And yeah, it would be interesting to see if caching for AIMs could be beneficial.

snowleopard merged commit 026ba0e into snowleopard:master on Jan 21, 2020
@snowleopard (Owner)

@jitwit OK, merged! Many thanks for your hard work and for tolerating such long reviews :-)
