Skip to content
This repository has been archived by the owner on Oct 8, 2021. It is now read-only.

Time and memory efficient Tarjan's algorithm for strongly connected components in a directed graph #1182

Merged
merged 10 commits into from
Mar 27, 2019

Conversation

sinhatushar
Copy link
Contributor

@sinhatushar sinhatushar commented Mar 24, 2019

Making some small changes makes the current implementation of strongly_connected_components() 4-5 times faster and requires 500 times lesser memory.

Acknowledgement : @simonschoelly helped me in figuring out what is wrong with the current implementation.

Some benchmarks :

julia> eg = SimpleDiGraph{Int32}(loadsnap(:ego_twitter_d));

##Current implementation
julia> @benchmark strongly_connected_components(eg)
BenchmarkTools.Trial: 
  memory estimate:  2.30 GiB
  allocs estimate:  36776
  --------------
  minimum time:     102.059 ms (59.44% GC)
  median time:      115.630 ms (60.22% GC)
  mean time:        123.090 ms (62.91% GC)
  maximum time:     214.746 ms (78.27% GC)
  --------------
  samples:          41
  evals/sample:     1

##Proposed implementation
julia> @benchmark strongly_connected_components(eg)
BenchmarkTools.Trial: 
  memory estimate:  4.82 MiB
  allocs estimate:  24598
  --------------
  minimum time:     26.544 ms (0.00% GC)
  median time:      27.205 ms (0.00% GC)
  mean time:        27.964 ms (1.41% GC)
  maximum time:     36.095 ms (6.22% GC)
  --------------
  samples:          179
  evals/sample:     1

julia> ss = SimpleDiGraph{Int32}(loadsnap(:soc_slashdot0902_d));

##Current implementation
julia> @benchmark strongly_connected_components(ss)
BenchmarkTools.Trial: 
  memory estimate:  1.70 GiB
  allocs estimate:  31705
  --------------
  minimum time:     116.306 ms (54.33% GC)
  median time:      122.452 ms (54.30% GC)
  mean time:        130.374 ms (56.26% GC)
  maximum time:     214.087 ms (72.84% GC)
  --------------
  samples:          39
  evals/sample:     1


##Proposed implementation
julia> @benchmark strongly_connected_components(ss)
BenchmarkTools.Trial: 
  memory estimate:  4.52 MiB
  allocs estimate:  21190
  --------------
  minimum time:     34.608 ms (0.00% GC)
  median time:      35.258 ms (0.00% GC)
  mean time:        36.028 ms (0.86% GC)
  maximum time:     40.149 ms (4.55% GC)
  --------------
  samples:          139
  evals/sample:     1

@codecov
Copy link

codecov bot commented Mar 24, 2019

Codecov Report

Merging #1182 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #1182      +/-   ##
==========================================
- Coverage   99.62%   99.62%   -0.01%     
==========================================
  Files          93       93              
  Lines        4218     4217       -1     
==========================================
- Hits         4202     4201       -1     
  Misses         16       16

@sinhatushar sinhatushar changed the title Time and memory efficient Tarjan's algorithm for strongly connected components in a connected graph Time and memory efficient Tarjan's algorithm for strongly connected components in a directed graph Mar 24, 2019
@abhinavmehndiratta
Copy link
Contributor

abhinavmehndiratta commented Mar 24, 2019

@sinhatushar is this implementation now faster than #1173 ?

@sinhatushar
Copy link
Contributor Author

@sinhatushar is this implementation now faster than #1173 ?

Yes, this implementation is slightly faster than #1173. Tarjan's algorithm in theory is faster than Kosaraju's algorithm. However the current implementation(Tarjan's algorithm) wasn't working as fast as expected which has been fixed in this PR.
But Kosaraju algorithm (#1173) can be an addition to growing pool of algortihms in LightGraphs.jl

@simonschoelly
Copy link
Contributor

julia> eg = squash(loadsnap(:ego_twitter_d))
{81306, 1768149} directed simple UInt32 graph

julia> @btime strongly_connected_components_kosaraju($eg);
  27.307 ms (24577 allocations: 3.20 MiB)

julia> @btime strongly_connected_components($eg);
  22.376 ms (36847 allocations: 6.21 MiB)

Slightly faster

index = zeros(T, nvg) # first time in which vertex is discovered
stack = Vector{T}() # stores vertices which have been discovered and not yet assigned to any component
stack = T[] # stores vertices which have been discovered and not yet assigned to any component
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have this convention in LightGraphs, that for creating empty arrays of a certain type we use

Vector{T}()    

The outcome is the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.


for s in vertices(g)

dfs_stack = T[]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

Copy link
Contributor

@simonschoelly simonschoelly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still allocate a vector twice for each component. Once when creating component and then another time, when we use reverse.

@sinhatushar
Copy link
Contributor Author

sinhatushar commented Mar 24, 2019

We still allocate a vector twice for each component. Once when creating component and then another time, when we use reverse.

I was willing to remove reverse but when I ran the tests, I saw that some tests for functions which use
strongly_connected_components() failed .
I am changing reverse() to reverse!() to remove double memory allocation for each component.

src/connectivity.jl Outdated Show resolved Hide resolved
julia> g=SimpleDiGraph(11)
{11, 0} directed simple Int64 graph

julia> edge_list=[(1,2),(2,3),(3,4),(4,1),(3,5),(5,6),(6,7),(7,5),(5,8),(8,9),(9,8),(10,11),(11,10)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you write julia> edge_list=[(1,2),(2,3),(3,4),(4,1),(3,5),(5,6),(6,7),(7,5),(5,8),(8,9),(9,8),(10,11),(11,10)]; (with a semicolon instead), then you won't get any output, so maybe we can shorten the unnecessary long output in this example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

@simonschoelly
Copy link
Contributor

This looks good to me, of course there would still be some ways to get some minor performance improvements, but the mayor bottleneck has been removed.

@sinhatushar
Copy link
Contributor Author

@sbromberger , waiting for you to merge this if you find this PR suitable. @simonschoelly has approved these changes.

@@ -178,6 +178,9 @@ true
"""
is_weakly_connected(g) = is_connected(g)




Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need all this whitespace here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

index = zeros(T, nvg) # first time in which vertex is discovered
stack = Vector{T}() # stores vertices which have been discovered and not yet assigned to any component
stack = Vector{T}() # stores vertices which have been discovered and not yet assigned to any component
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent not needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

@@ -263,7 +285,7 @@ function strongly_connected_components end
break
end
end
push!(components, reverse(component))
push!(components, reverse!(component))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit of a hack. It relies on the fact that reverse! returns the modified vector, but strictly speaking, it doesn't HAVE to since it's a mutating function. Better to call reverse! component on the line before and use component here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

Copy link
Owner

@sbromberger sbromberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!!

@sbromberger sbromberger merged commit 1e47426 into sbromberger:master Mar 27, 2019
sbromberger pushed a commit that referenced this pull request Mar 27, 2019
…omponents in a directed graph (#1182)

* Make changes to make the algorithm more efficient.

* Add one more example.

* Remove unnecessary comments.

* Use Vector{T}() instead of T[] .

* Use reverse!() instead of reverse().

* Fix typo and remove redundant empty line

* Shorten the example.

* Make minor changes.

* Final changes.
@sinhatushar sinhatushar deleted the tarjan branch March 27, 2019 19:26
sbromberger added a commit that referenced this pull request Apr 12, 2019
* first examples

* Improve code coverage (#1183)

* ✅ Improve test coverage of euclidean_graph

* ✅ Improve test coverage for cycle_basis

* ✅ Improve test coverage of graphmatrices.jl

* 🐛 Fix some bugs in tests

* 📝 Add some examples for random graph generators (#1180)

#1142

* Time and memory efficient Tarjan's algorithm for strongly connected components in a directed graph  (#1182)

* Make changes to make the algorithm more efficient.

* Add one more example.

* Remove unnecessary comments.

* Use Vector{T}() instead of T[] .

* Use reverse!() instead of reverse().

* Fix typo and remove redundant empty line

* Shorten the example.

* Make minor changes.

* Final changes.

* randgraphs

* smallgraphs

* simpleedge
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants