Commit 3bce43f
zsxwing authored and JoshRosen committed
[SPARK-4818][Core] Add 'iterator' to reduce memory consumed by join
In Scala, `map` and `flatMap` of `Iterable` will copy the contents of `Iterable` to a new `Seq`. For example,

```Scala
val iterable = Seq(1, 2, 3).map(v => {
  println(v)
  v
})
println("Iterable map done")

val iterator = Seq(1, 2, 3).iterator.map(v => {
  println(v)
  v
})
println("Iterator map done")
```

outputs

```
1
2
3
Iterable map done
Iterator map done
```

So we should use 'iterator' to reduce memory consumed by join.

Found by Johannes Simon in http://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3C5BE70814-9D03-4F61-AE2C-0D63F2DE4446%40mail.de%3E

Author: zsxwing <zsxwing@gmail.com>

Closes #3671 from zsxwing/SPARK-4824 and squashes the following commits:

48ee7b9 [zsxwing] Remove the explicit types
95d59d6 [zsxwing] Add 'iterator' to reduce memory consumed by join

(cherry picked from commit c233ab3)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>

Conflicts:
	core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala
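The same effect applies to the `for` comprehensions used by the join methods, since `for (v <- a; w <- b) yield ...` desugars to `flatMap` and `map`. The sketch below is plain Scala with no Spark dependency; `LazyForDemo` and `counts` are hypothetical names used only for illustration. It counts how many pairs are built before anything consumes the result.

```scala
object LazyForDemo {
  // Returns (pairs built by the strict version,
  //          pairs built by the iterator version before consumption,
  //          pairs built by the iterator version after one next()).
  def counts(): (Int, Int, Int) = {
    val left: Iterable[Int] = Seq(1, 2, 3)
    val right: Iterable[String] = Seq("a", "b")

    // Strict: the whole cross product is materialized immediately.
    var strictCalls = 0
    val strictPairs = for (v <- left; w <- right) yield { strictCalls += 1; (v, w) }

    // Lazy: nothing is built until the iterator is consumed.
    var lazyCalls = 0
    val lazyPairs = for (v <- left.iterator; w <- right.iterator) yield { lazyCalls += 1; (v, w) }
    val beforeConsuming = lazyCalls
    lazyPairs.next() // pulls exactly one pair

    (strictCalls, beforeConsuming, lazyCalls)
  }

  def main(args: Array[String]): Unit =
    println(counts()) // prints (6,0,1)
}
```

With the iterator version, only the pairs a downstream consumer actually pulls exist at any one time, instead of the full per-key cross product.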
1 parent e5f2752 commit 3bce43f

1 file changed, +5 -5 lines changed

core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala

Lines changed: 5 additions & 5 deletions
@@ -472,7 +472,7 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
    */
   def join[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (V, W))] = {
     this.cogroup(other, partitioner).flatMapValues( pair =>
-      for (v <- pair._1; w <- pair._2) yield (v, w)
+      for (v <- pair._1.iterator; w <- pair._2.iterator) yield (v, w)
     )
   }
 
@@ -485,9 +485,9 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
   def leftOuterJoin[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (V, Option[W]))] = {
     this.cogroup(other, partitioner).flatMapValues { pair =>
       if (pair._2.isEmpty) {
-        pair._1.map(v => (v, None))
+        pair._1.iterator.map(v => (v, None))
       } else {
-        for (v <- pair._1; w <- pair._2) yield (v, Some(w))
+        for (v <- pair._1.iterator; w <- pair._2.iterator) yield (v, Some(w))
       }
     }
   }
@@ -502,9 +502,9 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
       : RDD[(K, (Option[V], W))] = {
     this.cogroup(other, partitioner).flatMapValues { pair =>
       if (pair._1.isEmpty) {
-        pair._2.map(w => (None, w))
+        pair._2.iterator.map(w => (None, w))
       } else {
-        for (v <- pair._1; w <- pair._2) yield (Some(v), w)
+        for (v <- pair._1.iterator; w <- pair._2.iterator) yield (Some(v), w)
       }
     }
   }
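
For readers who want to try the changed expressions outside Spark, here is a minimal sketch of the per-key logic in plain Scala. `JoinSketch`, `innerPairs`, and `leftOuterPairs` are hypothetical helpers, not part of `PairRDDFunctions`; they take the two value groups that `cogroup` would produce for a single key and return an `Iterator`, mirroring the `+` lines above.

```scala
object JoinSketch {
  // Inner join for one key: lazily pairs every left value with every right value.
  def innerPairs[V, W](vs: Iterable[V], ws: Iterable[W]): Iterator[(V, W)] =
    for (v <- vs.iterator; w <- ws.iterator) yield (v, w)

  // Left outer join for one key: if the right group is empty, keep each left
  // value with None; otherwise pair lazily as in the inner join.
  def leftOuterPairs[V, W](vs: Iterable[V], ws: Iterable[W]): Iterator[(V, Option[W])] =
    if (ws.isEmpty) vs.iterator.map(v => (v, None))
    else for (v <- vs.iterator; w <- ws.iterator) yield (v, Some(w))

  def main(args: Array[String]): Unit = {
    println(innerPairs(Seq(1, 2), Seq("a", "b")).toList)
    println(leftOuterPairs(Seq(1, 2), Seq.empty[String]).toList)
  }
}
```

The inner iterator is recreated from `ws` for each left value, so `ws` must be re-traversable (an `Iterable`, as here), while the result itself can only be consumed once.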
