Commit

combineByKey examples and notes added
pyspark-in-action committed Apr 21, 2016
1 parent 4e3c148 commit 10f2f3e
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions tutorial/combine-by-key/spark-combineByKey.md
@@ -199,3 +199,9 @@ So, you should get back an array something like this:
Array( [A, (15., 3)], [B, (30., 2)], [Z, (28., 4)])
````


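sumCount = rdd.combineByKey(
    lambda value: (value, value*value, 1),                          # createCombiner: (sum, sum of squares, count)
    lambda x, value: (x[0] + value, x[1] + value*value, x[2] + 1),  # mergeValue: fold one value into a combiner
    lambda x, y: (x[0] + y[0], x[1] + y[1], x[2] + y[2])            # mergeCombiners: merge across partitions
)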
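
The three lambdas above accumulate a `(sum, sum of squares, count)` triple per key, which is presumably meant as input for a per-key mean and variance. The following is a minimal sketch of that idea in plain Python, so it runs without a Spark cluster. `combine_by_key_local` is a hypothetical stand-in that only mimics how `combineByKey` applies the three functions within and across partitions; it is not a Spark API. The sample data and the `mean_and_variance` helper are also assumptions made for illustration.

```python
def combine_by_key_local(partitions, create_combiner, merge_value, merge_combiners):
    """Hypothetical plain-Python stand-in for combineByKey semantics (not a Spark API)."""
    merged = {}
    for part in partitions:
        # Within one partition: first value of a key creates a combiner,
        # later values of the same key are folded in with merge_value.
        local = {}
        for key, value in part:
            if key in local:
                local[key] = merge_value(local[key], value)
            else:
                local[key] = create_combiner(value)
        # Across partitions: combiners for the same key are merged.
        for key, comb in local.items():
            if key in merged:
                merged[key] = merge_combiners(merged[key], comb)
            else:
                merged[key] = comb
    return merged


# Made-up sample data, split into two "partitions" for illustration.
partitions = [
    [("A", 1.0), ("A", 2.0), ("B", 4.0)],
    [("A", 3.0), ("B", 6.0)],
]

# Same three functions as in the commit above.
sum_count = combine_by_key_local(
    partitions,
    lambda value: (value, value * value, 1),
    lambda x, value: (x[0] + value, x[1] + value * value, x[2] + 1),
    lambda x, y: (x[0] + y[0], x[1] + y[1], x[2] + y[2]),
)


def mean_and_variance(triple):
    """Turn (sum, sum of squares, count) into (mean, population variance)."""
    total, total_sq, n = triple
    mean = total / n
    variance = total_sq / n - mean * mean
    return mean, variance


stats = {key: mean_and_variance(triple) for key, triple in sum_count.items()}
```

On a real RDD, the same final step could likely be applied to `sumCount` with `mapValues`. Note that the `E[x^2] - E[x]^2` form used here gives the population variance and can lose precision when values are large relative to their spread.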
