Time and memory efficient Tarjan's algorithm for strongly connected components in a directed graph #1182

sinhatushar · 2019-03-24T11:53:14Z

A few small changes make the current implementation of strongly_connected_components() roughly 3.5-4 times faster (by median time) and cut its memory allocation by roughly 400-500 times on the two graphs benchmarked below.

Acknowledgement: @simonschoelly helped me figure out what was wrong with the current implementation.

Some benchmarks:

julia> eg = SimpleDiGraph{Int32}(loadsnap(:ego_twitter_d));

##Current implementation
julia> @benchmark strongly_connected_components(eg)
BenchmarkTools.Trial: 
  memory estimate:  2.30 GiB
  allocs estimate:  36776
  --------------
  minimum time:     102.059 ms (59.44% GC)
  median time:      115.630 ms (60.22% GC)
  mean time:        123.090 ms (62.91% GC)
  maximum time:     214.746 ms (78.27% GC)
  --------------
  samples:          41
  evals/sample:     1

##Proposed implementation
julia> @benchmark strongly_connected_components(eg)
BenchmarkTools.Trial: 
  memory estimate:  4.82 MiB
  allocs estimate:  24598
  --------------
  minimum time:     26.544 ms (0.00% GC)
  median time:      27.205 ms (0.00% GC)
  mean time:        27.964 ms (1.41% GC)
  maximum time:     36.095 ms (6.22% GC)
  --------------
  samples:          179
  evals/sample:     1

julia> ss = SimpleDiGraph{Int32}(loadsnap(:soc_slashdot0902_d));

##Current implementation
julia> @benchmark strongly_connected_components(ss)
BenchmarkTools.Trial: 
  memory estimate:  1.70 GiB
  allocs estimate:  31705
  --------------
  minimum time:     116.306 ms (54.33% GC)
  median time:      122.452 ms (54.30% GC)
  mean time:        130.374 ms (56.26% GC)
  maximum time:     214.087 ms (72.84% GC)
  --------------
  samples:          39
  evals/sample:     1


##Proposed implementation
julia> @benchmark strongly_connected_components(ss)
BenchmarkTools.Trial: 
  memory estimate:  4.52 MiB
  allocs estimate:  21190
  --------------
  minimum time:     34.608 ms (0.00% GC)
  median time:      35.258 ms (0.00% GC)
  mean time:        36.028 ms (0.86% GC)
  maximum time:     40.149 ms (4.55% GC)
  --------------
  samples:          139
  evals/sample:     1

codecov · 2019-03-24T12:01:24Z

Codecov Report

Merging #1182 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1182      +/-   ##
==========================================
- Coverage   99.62%   99.62%   -0.01%     
==========================================
  Files          93       93              
  Lines        4218     4217       -1     
==========================================
- Hits         4202     4201       -1     
  Misses         16       16

abhinavmehndiratta · 2019-03-24T13:18:30Z

@sinhatushar, is this implementation now faster than #1173?

sinhatushar · 2019-03-24T13:36:44Z

> @sinhatushar, is this implementation now faster than #1173?

Yes, this implementation is slightly faster than #1173. In theory, Tarjan's algorithm is faster than Kosaraju's algorithm, since it needs a single depth-first search instead of two. However, the current implementation of Tarjan's algorithm was not running as fast as expected; this PR fixes that.
Kosaraju's algorithm (#1173) can still be a useful addition to the growing pool of algorithms in LightGraphs.jl.
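A minimal plain-Julia sketch of Kosaraju's algorithm shows where its second pass comes from: one DFS records vertices by finishing time, and a second search over the reversed graph, in decreasing finishing time, peels off one SCC per search tree. This is not the code from #1173; the function name `kosaraju_scc` and the adjacency-list input `adj::Vector{Vector{Int}}` (out-neighbours of each vertex) are hypothetical stand-ins for a LightGraphs `SimpleDiGraph`.

```julia
# Kosaraju's SCC algorithm (sketch): two depth-first passes.
# Hypothetical input format: adj[v] lists the out-neighbours of vertex v.
function kosaraju_scc(adj::Vector{Vector{Int}})
    n = length(adj)
    radj = [Vector{Int}() for _ in 1:n]   # reversed graph
    for u in 1:n, v in adj[u]
        push!(radj[v], u)
    end

    # Pass 1: iterative DFS on the original graph, recording finish order.
    visited = falses(n)
    order = Vector{Int}()
    stack = Vector{Tuple{Int,Int}}()      # (vertex, position of next neighbour to try)
    for s in 1:n
        visited[s] && continue
        visited[s] = true
        push!(stack, (s, 1))
        while !isempty(stack)
            v, i = stack[end]
            if i <= length(adj[v])
                stack[end] = (v, i + 1)
                w = adj[v][i]
                if !visited[w]
                    visited[w] = true
                    push!(stack, (w, 1))
                end
            else
                pop!(stack)
                push!(order, v)           # all descendants of v are finished
            end
        end
    end

    # Pass 2: search the reversed graph in decreasing finish order;
    # every search tree found this way is exactly one SCC.
    fill!(visited, false)
    components = Vector{Vector{Int}}()
    for s in Iterators.reverse(order)
        visited[s] && continue
        visited[s] = true
        component = [s]
        todo = [s]
        while !isempty(todo)
            v = pop!(todo)
            for w in radj[v]
                if !visited[w]
                    visited[w] = true
                    push!(component, w)
                    push!(todo, w)
                end
            end
        end
        push!(components, component)
    end
    return components
end

adj = [[2], [1, 3], Int[]]   # edges 1 -> 2, 2 -> 1, 2 -> 3
sccs = kosaraju_scc(adj)     # [[1, 2], [3]]
```

Both passes are O(V + E), so the difference from Tarjan's algorithm is a constant factor (a second traversal plus building the reversed graph), not an asymptotic one.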

simonschoelly · 2019-03-24T13:57:36Z

julia> eg = squash(loadsnap(:ego_twitter_d))
{81306, 1768149} directed simple UInt32 graph

julia> @btime strongly_connected_components_kosaraju($eg);
  27.307 ms (24577 allocations: 3.20 MiB)

julia> @btime strongly_connected_components($eg);
  22.376 ms (36847 allocations: 6.21 MiB)

Slightly faster, although it allocates about twice as much memory as the Kosaraju version.

simonschoelly · 2019-03-24T13:59:15Z

src/connectivity.jl

    index = zeros(T, nvg)         # first time in which vertex is discovered
-    stack = Vector{T}()           # stores vertices which have been discovered and not yet assigned to any component
+    stack = T[]                   # stores vertices which have been discovered and not yet assigned to any component


We have a convention in LightGraphs of creating empty arrays of a given element type with

Vector{T}()

rather than T[]. The outcome is the same.
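A quick plain-Julia check that both spellings produce the same kind of object, an empty one-dimensional array with element type T. Here T = Int32 is chosen only for illustration.

```julia
T = Int32
a = Vector{T}()   # LightGraphs convention
b = T[]           # typed empty-array literal

println(typeof(a) === typeof(b))   # true: both are a Vector with element type Int32
println(isempty(a) && isempty(b))  # true
```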

simonschoelly · 2019-03-24T13:59:29Z

src/connectivity.jl


-    for s in vertices(g)
+
+    dfs_stack = T[]


simonschoelly

We still allocate a vector twice for each component: once when creating component, and again when we call reverse.

sinhatushar · 2019-03-24T15:06:43Z

> We still allocate a vector twice for each component. Once when creating component and then another time, when we use reverse.

I wanted to remove reverse entirely, but when I ran the tests, some tests for functions that use strongly_connected_components() failed.
I am changing reverse() to reverse!() so that each component no longer allocates a second vector.
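The difference between the two calls can be seen in plain Julia: reverse returns a new vector, while reverse! reorders the existing one in place. The sketch below uses an arbitrary three-element vector, and the helper names `rev_copy`, `rev_inplace!` and `alloc_bytes` are hypothetical, introduced only to measure allocations with Base's `@allocated` inside a function rather than at global scope.

```julia
# `reverse` builds a new vector; `reverse!` reorders the existing one.
component = [3, 1, 2]

r = reverse(component)
println(r === component)   # false: a second vector was allocated
println(component)         # [3, 1, 2]: the original is unchanged

reverse!(component)
println(component)         # [2, 1, 3]: same vector, reordered in place

# Rough allocation check, measured inside a function so that
# global-scope effects do not distort the numbers.
@noinline rev_copy(v) = reverse(v)
@noinline rev_inplace!(v) = reverse!(v)

function alloc_bytes(v)
    rev_copy(v); rev_inplace!(v)        # warm up (compile) both calls
    a = @allocated rev_copy(v)
    b = @allocated rev_inplace!(v)
    return a, b
end

println(alloc_bytes(collect(1:1000)))   # reverse allocates; reverse! should not
```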

src/connectivity.jl

simonschoelly · 2019-03-24T21:51:46Z

src/connectivity.jl

+julia> g=SimpleDiGraph(11)
+{11, 0} directed simple Int64 graph
+
+julia> edge_list=[(1,2),(2,3),(3,4),(4,1),(3,5),(5,6),(6,7),(7,5),(5,8),(8,9),(9,8),(10,11),(11,10)]


If you end this line with a semicolon instead, i.e. julia> edge_list=[(1,2),(2,3),(3,4),(4,1),(3,5),(5,6),(6,7),(7,5),(5,8),(8,9),(9,8),(10,11),(11,10)]; then the REPL prints no output, so we can shorten the unnecessarily long output in this example.
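To see which components the example graph has, here is a minimal plain-Julia sketch of an iterative Tarjan's algorithm run on that edge list. It is not the PR's implementation: the function name `tarjan_scc`, the adjacency-list input `adj::Vector{Vector{Int}}` and the `(vertex, next neighbour position)` layout of the DFS stack are assumptions for illustration. It does follow the two points raised in this review: one DFS stack is allocated once and reused for every root, and each component is copied off the vertex stack once and then reversed in place (the reversal only reproduces the ordering discussed in this thread).

```julia
# Iterative Tarjan's SCC algorithm (sketch).
# Hypothetical input format: adj[v] lists the out-neighbours of vertex v.
function tarjan_scc(adj::Vector{Vector{Int}})
    n = length(adj)
    index = zeros(Int, n)      # discovery time of each vertex (0 = not yet seen)
    lowlink = zeros(Int, n)    # smallest discovery time reachable from v's subtree via vertices on `stack`
    onstack = falses(n)        # true while a vertex sits on `stack`
    stack = Vector{Int}()      # discovered vertices not yet assigned to a component
    dfs_stack = Vector{Tuple{Int,Int}}()  # (vertex, next neighbour position); reused for every root
    components = Vector{Vector{Int}}()
    counter = 0

    for s in 1:n
        index[s] != 0 && continue
        counter += 1
        index[s] = lowlink[s] = counter
        push!(stack, s); onstack[s] = true
        push!(dfs_stack, (s, 1))
        while !isempty(dfs_stack)
            v, i = dfs_stack[end]
            if i <= length(adj[v])
                dfs_stack[end] = (v, i + 1)
                w = adj[v][i]
                if index[w] == 0                 # tree edge: descend into w
                    counter += 1
                    index[w] = lowlink[w] = counter
                    push!(stack, w); onstack[w] = true
                    push!(dfs_stack, (w, 1))
                elseif onstack[w]                # edge back into the current SCC candidate
                    lowlink[v] = min(lowlink[v], index[w])
                end
            else                                 # every neighbour of v has been handled
                pop!(dfs_stack)
                if !isempty(dfs_stack)
                    u = dfs_stack[end][1]        # DFS parent of v
                    lowlink[u] = min(lowlink[u], lowlink[v])
                end
                if lowlink[v] == index[v]        # v is the root of an SCC
                    k = findlast(x -> x == v, stack)
                    component = stack[k:end]     # copy the component off the stack once
                    resize!(stack, k - 1)
                    for w in component
                        onstack[w] = false
                    end
                    reverse!(component)          # in place: no second vector
                    push!(components, component)
                end
            end
        end
    end
    return components
end

edge_list = [(1,2),(2,3),(3,4),(4,1),(3,5),(5,6),(6,7),(7,5),(5,8),(8,9),(9,8),(10,11),(11,10)];
adj = [Vector{Int}() for _ in 1:11]
for (u, v) in edge_list
    push!(adj[u], v)
end
comps = tarjan_scc(adj)   # [[9, 8], [7, 6, 5], [4, 3, 2, 1], [11, 10]]
```

Tarjan's algorithm emits components in reverse topological order of the condensation, which is why the sink component {8, 9} comes out first.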

simonschoelly · 2019-03-24T21:52:52Z

This looks good to me. There are of course still ways to squeeze out some minor performance improvements, but the major bottleneck has been removed.

sinhatushar · 2019-03-25T18:10:42Z

@sbromberger, this is ready for you to merge if you find it suitable; @simonschoelly has approved the changes.

sbromberger · 2019-03-26T00:17:55Z

src/connectivity.jl

@@ -178,6 +178,9 @@ true
 """
 is_weakly_connected(g) = is_connected(g)

+
+
+


I don't think we need all this whitespace here.

sbromberger · 2019-03-26T00:18:35Z

src/connectivity.jl

    index = zeros(T, nvg)         # first time in which vertex is discovered
-    stack = Vector{T}()           # stores vertices which have been discovered and not yet assigned to any component
+    stack = Vector{T}()                   # stores vertices which have been discovered and not yet assigned to any component


The extra indentation is not needed.

sbromberger · 2019-03-26T00:20:17Z

src/connectivity.jl

@@ -263,7 +285,7 @@ function strongly_connected_components end
                                break
                            end
                        end
-                        push!(components, reverse(component))
+                        push!(components, reverse!(component))


This is a bit of a hack. It relies on the fact that reverse! returns the modified vector, but strictly speaking, it doesn't HAVE to, since it's a mutating function. Better to call reverse!(component) on the line before and push component here.
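In plain Julia the suggested pattern looks like the sketch below; `components` and `component` mirror the names in the diff, the component contents are made up, and the surrounding loop is omitted. (Base's docstring does list the signature as `reverse!(v ...) -> v`, so both forms behave the same; the separate statement simply makes the mutation explicit.)

```julia
components = Vector{Vector{Int}}()
component = [5, 6, 7]            # hypothetical component, in discovery order

# Mutate first, then push the (now reversed) vector itself;
# this does not depend on what reverse! returns.
reverse!(component)
push!(components, component)

println(components)              # [[7, 6, 5]]
```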

sbromberger

Nice!!

…omponents in a directed graph (#1182)

* Make changes to make the algorithm more efficient.
* Add one more example.
* Remove unnecessary comments.
* Use Vector{T}() instead of T[] .
* Use reverse!() instead of reverse().
* Fix typo and remove redundant empty line
* Shorten the example.
* Make minor changes.
* Final changes.

sinhatushar and others added 3 commits March 24, 2019 17:11

Make changes to make the algorithm more efficient.

3d292a8

Add one more example.

593f3c1

Remove unnecessary comments.

7d9ef47

sinhatushar changed the title ~~Time and memory efficient Tarjan's algorithm for strongly connected components in a connected graph~~ Time and memory efficient Tarjan's algorithm for strongly connected components in a directed graph Mar 24, 2019

simonschoelly reviewed Mar 24, 2019

sinhatushar added 2 commits March 24, 2019 20:41

Use Vector{T}() instead of T[] .

7f41a15

Use reverse!() instead of reverse().

92382f4

simonschoelly reviewed Mar 24, 2019

src/connectivity.jl (outdated)

simonschoelly reviewed Mar 24, 2019

src/connectivity.jl

Fix typo and remove redundant empty line

9b533f1

simonschoelly reviewed Mar 24, 2019

simonschoelly approved these changes Mar 24, 2019

Shorten the example.

97a10c3

Merge branch 'master' into tarjan

0ba54b5

sbromberger reviewed Mar 26, 2019

sinhatushar added 2 commits March 26, 2019 11:20

Make minor changes.

b1f5bb4

Final changes.

bb620b6

sbromberger approved these changes Mar 27, 2019

sbromberger merged commit 1e47426 into sbromberger:master Mar 27, 2019

sinhatushar deleted the tarjan branch March 27, 2019 19:26
