GitHub - puzzlef/louvain-communities-openmp at 3ed9cebabebdd23ed2ac9fab69436e63f2f35f14

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 40 Commits
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
main.cxx		main.cxx
main.sh		main.sh
process.js		process.js
system.txt		system.txt

Repository files navigation

Multi-threaded OpenMP-based Louvain algorithm for community detection.

The Louvain method is a greedy modularity optimization based agglomerative algorithm by Blondel et al. that efficiently identifies high-quality communities in a graph. With O(n \log n) average time complexity (n being the number of vertices in the graph), it consists of two phases: local-moving and aggregation. In the local-moving phase, each vertex greedily selects the neighboring community with the highest modularity increase. The aggregation phase merges vertices within a community into a super-vertex. This iterative process produces a dendrogram hierarchy of community memberships, with the top-level representing the final output.

Optimizations

OpenMP schedule

It appears schedule(dynamic, 2048) is the best choice.

Limiting number of iterations

It appears allowing a maximum of 20 iterations is ok.

Adjusting tolerance drop rate (threshold scaling)

Instead of using a fixed tolerance across all passes of the Louvain algorithm, we can start with a high tolerance and then gradually reduce it. This is called threshold scaling, and it helps minimize runtime as the first pass is usually the most expensive. It appears a drop rate of 10 is ok.

Adjusting (initial) tolerance

As mentioned above, we can start with an initial high tolerance. It appears a tolerance of 0.01 is suitable.

Adjusting aggregation tolerance

This controls when to consider communities to have converged based on the number of community merges. That is if too few communties merged this pass, we should stop here. Or, |Vsuper-vertices|/|V| >= aggegation tolerance => Converged.

It appears an aggregation tolerance of 0.8 might be the best choice.

Vertex pruning

When a vertex changes its community, its marks its neighbors to be processed. Once a vertex has been processed, it is marked as not to be processed. It helps minimize unnecessary computation.

It appears with vertex pruning we get better timings.

Finding community vertices for aggregation phase

I originally used a simple vector2d for storing vertices belonging to each community. But this requires memory allocation during the algorithm, which is expensive. Using a prefix sum along with a preallocated CSR helps to to avoid repeated mallocs and frees.

It appears using prefix sum is quite faster than using vector2d.

Storing aggregated communities (super-vertex graph)

I also originally used a vector2d based dynamic graph data structure for storing the super-vertex graph. Again, this requires memory allocation during the algorithm. Using two preallocated CSRs along with parallel prefix sum can help here.

It appears using preallocated CSRs is faster.

Hashtable design for local-moving/aggregation phases

One can use std::maps (C++ inbuilt map) as the hastable for Louvain. But this has poor performance. So i use a key-list and a full-size values vector (collision-free) we can dramatically improve performance. However if the memory addresses of the hastables are nearby, even if each thread uses its own hash table performance is not so high possibly due to false cache-sharing (Close-KV). However if i ensure memory address are farther away, the perf. improves (Far-KV).

It seems Far-KV has the best performance.

Main results

We combine the above optimizations and observe the performance of OpenMP-based Louvain on a number of graphs. Observe that Static Louvain completes in 6.2 s on sk-2005 (a graph with 3.8 billion edges).

Larger number of iterations/passes are required for graphs with lower average degree (road networks such as asia_osm and europe_osm, and protein k-mer graphs such as kmer_A2a and kmer_V1r).

Also note how graphs with poor community structure (such as com-LiveJournal, com-Orkut) have a larger time / (n log n) factor.

Finally note how most of the runtime of the algorithm is spent in the local-moving phase. Further, most of the runtime is spent in the first pass of the algorithm, which is the most expensive pass due to the size of the original graph (later passes work on super-vertex graphs).

The input data used for this experiment is available from the SuiteSparse Matrix Collection. This experiment was done with guidance from Prof. Kishore Kothapalli and Prof. Dip Sankar Banerjee.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Optimizations

OpenMP schedule

Limiting number of iterations

Adjusting tolerance drop rate (threshold scaling)

Adjusting (initial) tolerance

Adjusting aggregation tolerance

Vertex pruning

Finding community vertices for aggregation phase

Storing aggregated communities (super-vertex graph)

Hashtable design for local-moving/aggregation phases

Main results

References

About

Languages

License

puzzlef/louvain-communities-openmp

Folders and files

Latest commit

History

Repository files navigation

Optimizations

OpenMP schedule

Limiting number of iterations

Adjusting tolerance drop rate (threshold scaling)

Adjusting (initial) tolerance

Adjusting aggregation tolerance

Vertex pruning

Finding community vertices for aggregation phase

Storing aggregated communities (super-vertex graph)

Hashtable design for local-moving/aggregation phases

Main results

References

About

Topics

Resources

License

Stars

Watchers

Forks

Languages