
scc algorithm #235

Merged: 47 commits merged into snowleopard:master on Jan 21, 2020
Conversation

@jitwit (Contributor) commented Sep 18, 2019

Implementation of scc algorithm for adjacency maps and adjacency int maps based on https://en.wikipedia.org/wiki/Path-based_strong_component_algorithm.

Avoids using Data.Graph, and benchmarks indicate that this pays off: https://jitwit.github.io/criterion/scc-bench.html. In the report, old-alga corresponds to alga's KL implementation, new-alga to the intmap implementation, and ord-alga to the new implementation for AMs. The graphs used are the real-world ones from haskell-perf/graphs. The AM implementation seems to be around 10% faster and the AIM version around 65% faster.

There is no NonEmpty.AdjacencyIntMap module, so for now AdjacencyIntMap.Algorithm returns the type AM.AdjacencyMap AdjacencyIntMap.
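
For context, a minimal usage sketch of the intended API shape (illustrative only; it assumes the AM version returns components as NonEmpty.AdjacencyMaps, as described above):

import qualified Algebra.Graph.AdjacencyMap           as AM
import qualified Algebra.Graph.AdjacencyMap.Algorithm as Alg
import qualified Algebra.Graph.NonEmpty.AdjacencyMap  as NonEmpty

-- Condensation of 1 <-> 3 -> 2 -> 2: every vertex of the result is a
-- non-empty strongly connected component of the input, here {1,3} and {2}.
condensation :: AM.AdjacencyMap (NonEmpty.AdjacencyMap Int)
condensation = Alg.scc (AM.edges [(1,3), (3,1), (3,2), (2,2)])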

@snowleopard (Owner)

@jitwit The results are great! You are unstoppable :-)

scc is quite an important algorithm with many users, so perhaps we should have it in our regression suite to keep an eye on its performance. @nobrakal Do you think you could add it?

@snowleopard (Owner)

P.S.: I'll need some time to review the implementation due to upcoming travel.

@jitwit (Contributor, Author) commented Sep 18, 2019

I was actually going to ask how to add things to the regression suite, since it would be useful for bfs too (using a different queue structure might be an opportunity for slightly better performance).

@snowleopard (Owner) commented Sep 18, 2019

Here is where the performance testing script comes from:

curl https://raw.githubusercontent.com/nobrakal/benchAlgaPr/master/compare.sh -o compare.sh;

I guess we could move it to this repository to make changes more convenient.

@nobrakal What do you think?

@nobrakal (Contributor)

@snowleopard

I am actually ashamed of this script; currently I think writing a very simple benchmarking suite and getting rid of https://github.com/haskell-perf/graphs would be the best option.

I would love to give haskell-perf/graphs the rewriting it needs, but I don't have any time to do it now.

Anyway, adding scc seems very feasible and I can give it a try if you think it is the best idea :)

@snowleopard (Owner) commented Sep 19, 2019

@nobrakal It's always possible to improve things but it's not always necessary :-) The current implementation may not be pretty, but the script does work and is very useful!

If you could add scc (and perhaps bfs too) that would be great. And, of course, I'd be happy to have a prettier script when you find time for this :)

@snowleopard (Owner)

@jitwit There are some merge conflicts, please resolve.

@jitwit (Contributor, Author) commented Oct 7, 2019

I re-ran the benchmarks and the results were less favorable in some cases for AdjacencyMaps. The graphs where the current version of the new implementation does poorly are those with a high number of small SCCs; the graphs where it does well have a small number of large SCCs. The AdjacencyIntMap version is much faster regardless.

The link to the criterion benchmarks from before has the current results.

@snowleopard (Owner)

@jitwit I'm looking at this link, which I think corresponds to AdjacencyIntMap:

https://jitwit.github.io/criterion/scc-bench.html

This is better across all benchmarks, but can you show the results for AdjacencyMap? I'd like to have more performance info about the problematic cases you mention.

@snowleopard (Owner)

By the way, the performance regression suite reports scc: 1.10 (OK).

@jitwit (Contributor, Author) commented Oct 8, 2019

Yeah, those names are unclear; I changed them to KL-alga, AM-alga, and AIM-alga.

I've looked closer, and the scc algorithm itself does perform well; the issue seems to be converting to the AM (NonEmpty.AdjacencyMap a) representation afterwards. When there are many SCCs, using induce is slow. When there are few SCCs, classifying edges (another approach I'm experimenting with) is slow.

The result of the scc search contains some additional information, so I was wondering whether a hybrid approach might be feasible. For example, depending on the ratio of the number of SCCs to the number of vertices, different functions could be used to convert to the final representation.
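
To sketch what I mean (hypothetical names and an arbitrary threshold, not the actual code):

-- Hypothetical sketch of the hybrid dispatch: the two conversion strategies
-- are passed in as arguments and the ratio of SCCs to vertices picks between
-- them. The factor of 2 is an arbitrary example threshold.
hybridCondense
  :: Int              -- number of SCCs found by the search
  -> Int              -- number of vertices in the input graph
  -> (r -> condensed) -- conversion by classifying edges (good for few, large SCCs)
  -> (r -> condensed) -- conversion via induce (good for many, small SCCs)
  -> r                -- intermediate result of the scc search
  -> condensed
hybridCondense numSccs numVertices byEdges byInduce result
  | 2 * numSccs < numVertices = byEdges  result
  | otherwise                 = byInduce result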

@jitwit (Contributor, Author) commented Oct 8, 2019

The benchmarks are updated with the "hybrid" approach.

P.S. Apologies for the messy code; I was aiming to see if the approach would even pay off!
P.P.S. According to the regression suite it apparently does not, but the benchmarks I ran on larger graphs look promising.

@jitwit (Contributor, Author) commented Oct 8, 2019

I tried the searches on a Twitter graph from SNAP: https://snap.stanford.edu/data/ego-Twitter.html (~81k vertices, ~1.7m edges) and the results are good:

https://jitwit.github.io/criterion/twitter-scc.html

@snowleopard (Owner)

@jitwit Very impressive, especially the Twitter benchmark! I think it's worth adding it to the benchmark suite at https://github.com/haskell-perf/graphs.

I'm concerned with the complexity of the resulting code though :( How much improvement is your hybrid approach bringing percentage-wise? If it's not too much then I think I'd prefer to remove the special tricks and instead focus on making a faster induce, if that's possible.

@jitwit (Contributor, Author) commented Oct 13, 2019

The number of passes over the graph with induce increases with the number of SCCs. I tried the algorithm with induce on the Twitter graph and it was something like 5x slower. I think the right solution may be to ditch the hybrid approach as well as induce. Something like

partition :: Eq k => (a -> k) -> Graph a -> [Graph a]

is sort of what is needed.

There are still a few cases where the current approach is slower than the old implementation, but for the most part it's faster.
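
A one-pass sketch of that partition idea for adjacency maps (using Ord k instead of Eq k so components can be bucketed in a Map; just an illustration, not the code in this PR):

import qualified Algebra.Graph.AdjacencyMap as AM
import qualified Data.Map.Strict            as Map

-- Group vertices by their key and keep only the edges whose endpoints share a
-- key, visiting every vertex and edge once instead of once per component.
partition :: (Ord a, Ord k) => (a -> k) -> AM.AdjacencyMap a -> [AM.AdjacencyMap a]
partition key g = Map.elems (Map.unionWith AM.overlay vertexParts edgeParts)
  where
    vertexParts = Map.fromListWith AM.overlay
        [ (key v, AM.vertex v) | v <- AM.vertexList g ]
    edgeParts   = Map.fromListWith AM.overlay
        [ (key u, AM.edge u w) | (u, w) <- AM.edgeList g, key u == key w ]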

@jitwit (Contributor, Author) commented Jan 14, 2020

I also added a CPP statement to get rid of the redundant Data.Monoid import warning for post-8.2 GHC.

@snowleopard (Owner)

I also added a CPP statement to get rid of the redundant Data.Monoid import warning for post 8.2 ghc

I think you can just import Data.Semigroup ((<>)) instead, getting rid of CPP. I dropped support for GHC < 8 recently hoping to eliminate most occurrences of CPP.

As long as there are no warnings in GHC 8.6.5, I'm happy!
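
For reference, a self-contained sketch of the two options (assuming the relevant change is (<>) entering Prelude in base 4.11 / GHC 8.4):

{-# LANGUAGE CPP #-}
module Example where

-- The CPP route: only import (<>) on compilers where Prelude does not yet
-- export it.
#if !MIN_VERSION_base(4,11,0)
import Data.Monoid ((<>))
#endif

-- The suggested CPP-free alternative would simply be:
--   import Data.Semigroup ((<>))

greeting :: String
greeting = "scc" <> " algorithm"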

@snowleopard (Owner)

@jitwit I think I've noticed a few more places where we can save some time. Could you please have a look?

@jitwit (Contributor, Author) commented Jan 19, 2020

Much better results!

https://jitwit.github.io/benchmarks/criterion/scc-bench.html

Just building the inner graphs (no difference lists) makes a big difference.

@snowleopard (Owner)

@jitwit Excellent! And the code got simpler too. Looks like we're mostly beating KL now, with only one scenario where we're slightly behind. I think we can tolerate this.

I think we can merge the PR now unless you'd like to do any further improvements.

@jitwit (Contributor, Author) commented Jan 19, 2020

On larger graphs the performance gain is even bigger. An acyclic Facebook network graph takes 200ms compared to 600ms (4,000 vertices, 90,000 edges), and the Twitter one takes 6.4s compared to 60s (80,000 vertices, 1,700,000 edges).

The part I'd maybe want to improve is the incomplete pattern on (curr,_:pth') = span (/=v) pathStack. Any preference for how: an extra helper, using tail, or something else?

@snowleopard (Owner)

200ms to 600ms [...] and 6.4s to 60s

@jitwit This is pretty cool! I guess when you're benchmarking KL you are including the overhead of converting an adjacency map to an array representation. What happens if you start with a graph stored in KL-style array? Is your implementation still significantly faster?

I don't really mind (curr,_:pth') = span (/=v) pth. I think going via an intermediate function would just obscure the invariant. One slight improvement is perhaps (curr,_v:pth') = span (/=v) pth, i.e. giving the unused binding a name.
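
For illustration, a standalone version of that pattern (popUntil is a made-up name; the actual code keeps it inline):

-- Split the path stack at the first occurrence of v. The pattern is
-- intentionally incomplete: the algorithm's invariant guarantees that v is on
-- the stack, and naming the dropped element _v documents that it is exactly v.
popUntil :: Eq a => a -> [a] -> ([a], [a])
popUntil v pth = (curr, pth')
  where
    (curr, _v:pth') = span (/= v) pth

-- ghci> popUntil 3 [1,2,3,4,5]
-- ([1,2],[4,5])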

@jitwit (Contributor, Author) commented Jan 20, 2020

I haven't done a proper benchmark yet, but from a quick profile quite a lot of the time is spent in induce in the old implementation on the twitter graph.

Since the condensation construction is interleaved with the pre-order traversal, I'm not totally sure how the implementations should be compared. There's definitely an algorithmic win from avoiding the quadratic induce, besides the benefits of not converting representation to arrays.

@snowleopard (Owner) commented Jan 20, 2020

@jitwit Note that your implementation uses gmap, which might also be quadratic when mapping into vertices with an expensive Ord instance (as in our case). The complexity of gmap given in the docs does not take into account the cost of Ord (it probably should!). So, I'm not entirely sure that the worst-case complexity really became subquadratic. But even if it is quadratic in the worst case, it looks like the typical case got much better!

@jitwit (Contributor, Author) commented Jan 21, 2020

It looks like the Ord instance for Map uses compare `on` toAscList. Because the vertices of the components are disjoint, this will always be decided from just the head of the lists, right? The docs say toAscList is subject to list fusion. Is that enough to get the time complexity for Ord comparisons (in this case) down to the height of the left spine of the maps?

There is also an inverse relation between the size of the components and how many components there are, which also helps limit the cost of Ord comparisons, considering both their size and number.

@jitwit (Contributor, Author) commented Jan 21, 2020

Given n and m (the number of vertices and edges) and n_o and m_o (the number of components and the number of edges between them), I think the complexity is something like:

  • O((n+m)*log n) for the traversal (constructing the inner graphs takes SUM (n_c+m_c)*log(n_c) over components, which is subsumed by (n+m)*log n).
  • O((n_o+m_o)*log n_o*log(n-n_o)) for building the condensation. With the new change to arrays, lookup is O(1). The largest possible component size is (n-n_o), so multiplying by log(n-n_o) accounts for the comparisons (based on my previous comment)?
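
Putting the two parts together, the estimate above amounts to (just restating the two bounds in one formula):

$$ O\big((n + m)\log n\big) \;+\; O\big((n_o + m_o)\,\log n_o\,\log(n - n_o)\big) $$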

@snowleopard (Owner)

It looks like the Ord instance for Maps uses compare ``on`` toAscList. Because the vertices of the components are disjoint, this will always classify from just the head of the lists, right?

@jitwit Adjacency maps do not use Map's Ord instance. Instead, their Ord is crafted so that <= refines the subgraph relation isSubgraphOf:

instance Ord a => Ord (AdjacencyMap a) where
    compare x y = mconcat
        [ compare (vertexCount x) (vertexCount y)
        , compare (vertexSet   x) (vertexSet   y)
        , compare (edgeCount   x) (edgeCount   y)
        , compare (edgeSet     x) (edgeSet     y) ]

The first check compares vertexCount, which is O(1). Then we use vertexSet, which, as you say, is likely to benefit from the fact that the graphs are disjoint. If we optimistically assume that it takes constant time too, then the overall time complexity seems to be O((n + m) log n). I suggest we use it. Even if we are off by an O(log n) factor, it's probably dominated by various other noise.

Pretty cool!

@jitwit (Contributor, Author) commented Jan 21, 2020

Ah! Sorry, I should have checked the Ord instance. Neat! It seems like this is a case where it pays to use Data.Map over Data.IntMap, since vertexCount would be O(n) otherwise. I'll just go ahead and put O((n+m) * log n) for the time complexity in this case, as you suggest.

@jitwit (Contributor, Author) commented Jan 21, 2020

Should I delete the old scc implementation in Data.Graph.Typed as well as the test which compares them?

@snowleopard (Owner)

Neat! It seems like this is a case where it pays to use Data.Map over Data.IntMap since vertexCount would be O(n) otherwise.

Indeed, I forgot that IntMap doesn't store the size! I guess we could cache it in AdjacencyIntMap ourselves. We could also cache a few other fields like vertexSet, vertexList, etc. Thanks to laziness the overhead would probably be acceptable. (Of course, this is better done in a separate PR.)
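
A rough sketch of that caching idea (a hypothetical wrapper, not the actual AdjacencyIntMap), relying on lazy record fields so each cached value is computed at most once and only if demanded:

import           Data.IntMap.Strict (IntMap)
import qualified Data.IntMap.Strict as IntMap
import           Data.IntSet        (IntSet)

-- Hypothetical cached wrapper: the fields are non-strict, so each cached value
-- is a thunk that costs O(n) at most once, and nothing if never demanded.
data CachedAIM = CachedAIM
    { aimMap         :: IntMap IntSet  -- underlying adjacency structure
    , aimVertexCount :: Int            -- cached lazily; IntMap.size is O(n)
    , aimVertexSet   :: IntSet         -- cached lazily
    }

fromIntMap :: IntMap IntSet -> CachedAIM
fromIntMap m = CachedAIM
    { aimMap         = m
    , aimVertexCount = IntMap.size m
    , aimVertexSet   = IntMap.keysSet m
    }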

Should I delete the old scc implementation in Data.Graph.Typed as well as the test which compares them?

No, let's keep it. It's always good to have a good test suite. I'm pretty sure we'll be doing further tweaks to the implementation, and it's good to know that (at least some) bugs will be caught.

@jitwit (Contributor, Author) commented Jan 21, 2020

Makes sense! And yeah, it would be interesting to see if caching for AIMs could be beneficial.

snowleopard merged commit 026ba0e into snowleopard:master on Jan 21, 2020
@snowleopard (Owner)

@jitwit OK, merged! Many thanks for your hard work and for tolerating such long reviews :-)
