Skip to content

Commit 15a5645

Browse files
ankurdaverxin
authored andcommitted
[SPARK-3427] [GraphX] Avoid active vertex tracking in static PageRank
GraphX's current implementation of static (fixed iteration count) PageRank uses the Pregel API. This unnecessarily tracks active vertices, even though in static PageRank all vertices are always active. Active vertex tracking incurs the following costs: 1. A shuffle per iteration to ship the active sets to the edge partitions. 2. A hash table creation per iteration at each partition to index the active sets for lookup. 3. A hash lookup per edge to check whether the source vertex is active. I reimplemented static PageRank using the lower-level GraphX API instead of the Pregel API. In benchmarks on a 16-node m2.4xlarge cluster, this provided a 23% speedup (from 514 s to 397 s, mean over 3 trials) for 10 iterations of PageRank on a synthetic graph with 10M vertices and 1.27B edges. Author: Ankur Dave <ankurdave@gmail.com> Closes #2308 from ankurdave/SPARK-3427 and squashes the following commits: 449996a [Ankur Dave] Avoid unnecessary active vertex tracking in static PageRank
1 parent eae81b0 commit 15a5645

File tree

1 file changed

+29
-16
lines changed

1 file changed

+29
-16
lines changed

graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala

Lines changed: 29 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -79,30 +79,43 @@ object PageRank extends Logging {
7979
def run[VD: ClassTag, ED: ClassTag](
8080
graph: Graph[VD, ED], numIter: Int, resetProb: Double = 0.15): Graph[Double, Double] =
8181
{
82-
// Initialize the pagerankGraph with each edge attribute having
82+
// Initialize the PageRank graph with each edge attribute having
8383
// weight 1/outDegree and each vertex with attribute 1.0.
84-
val pagerankGraph: Graph[Double, Double] = graph
84+
var rankGraph: Graph[Double, Double] = graph
8585
// Associate the degree with each vertex
8686
.outerJoinVertices(graph.outDegrees) { (vid, vdata, deg) => deg.getOrElse(0) }
8787
// Set the weight on the edges based on the degree
8888
.mapTriplets( e => 1.0 / e.srcAttr )
8989
// Set the vertex attributes to the initial pagerank values
90-
.mapVertices( (id, attr) => 1.0 )
91-
.cache()
90+
.mapVertices( (id, attr) => resetProb )
9291

93-
// Define the three functions needed to implement PageRank in the GraphX
94-
// version of Pregel
95-
def vertexProgram(id: VertexId, attr: Double, msgSum: Double): Double =
96-
resetProb + (1.0 - resetProb) * msgSum
97-
def sendMessage(edge: EdgeTriplet[Double, Double]) =
98-
Iterator((edge.dstId, edge.srcAttr * edge.attr))
99-
def messageCombiner(a: Double, b: Double): Double = a + b
100-
// The initial message received by all vertices in PageRank
101-
val initialMessage = 0.0
92+
var iteration = 0
93+
var prevRankGraph: Graph[Double, Double] = null
94+
while (iteration < numIter) {
95+
rankGraph.cache()
10296

103-
// Execute pregel for a fixed number of iterations.
104-
Pregel(pagerankGraph, initialMessage, numIter, activeDirection = EdgeDirection.Out)(
105-
vertexProgram, sendMessage, messageCombiner)
97+
// Compute the outgoing rank contributions of each vertex, perform local preaggregation, and
98+
// do the final aggregation at the receiving vertices. Requires a shuffle for aggregation.
99+
val rankUpdates = rankGraph.mapReduceTriplets[Double](
100+
e => Iterator((e.dstId, e.srcAttr * e.attr)), _ + _)
101+
102+
// Apply the final rank updates to get the new ranks, using join to preserve ranks of vertices
103+
// that didn't receive a message. Requires a shuffle for broadcasting updated ranks to the
104+
// edge partitions.
105+
prevRankGraph = rankGraph
106+
rankGraph = rankGraph.joinVertices(rankUpdates) {
107+
(id, oldRank, msgSum) => resetProb + (1.0 - resetProb) * msgSum
108+
}.cache()
109+
110+
rankGraph.edges.foreachPartition(x => {}) // also materializes rankGraph.vertices
111+
logInfo(s"PageRank finished iteration $iteration.")
112+
prevRankGraph.vertices.unpersist(false)
113+
prevRankGraph.edges.unpersist(false)
114+
115+
iteration += 1
116+
}
117+
118+
rankGraph
106119
}
107120

108121
/**

0 commit comments

Comments
 (0)