Commit 8c2bf64

amanomer authored and srowen committed
[SPARK-29823][MLLIB] Improper persist strategy in mllib.clustering.KMeans.run()
### What changes were proposed in this pull request?
Adjust RDD to persist.

### Why are the changes needed?
To handle the improper persist strategy.

### Does this PR introduce any user-facing change?
No

### How was this patch tested?
Manually

Closes #26483 from amanomer/SPARK-29823.

Authored-by: Aman Omer <amanomer1996@gmail.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
1 parent 4dcbdcd commit 8c2bf64

File tree

1 file changed: +2 -2 lines changed
  • mllib/src/main/scala/org/apache/spark/mllib/clustering

mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala

Lines changed: 2 additions & 2 deletions
@@ -223,12 +223,12 @@ class KMeans private (
 
     // Compute squared norms and cache them.
     val norms = data.map(Vectors.norm(_, 2.0))
-    norms.persist()
     val zippedData = data.zip(norms).map { case (v, norm) =>
       new VectorWithNorm(v, norm)
     }
+    zippedData.persist()
     val model = runAlgorithm(zippedData, instr)
-    norms.unpersist()
+    zippedData.unpersist()
 
     // Warn at the end of the run as well, for increased visibility.
     if (data.getStorageLevel == StorageLevel.NONE) {
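
The diff moves the `persist()` call from the intermediate `norms` RDD to the derived `zippedData` RDD, which is the one passed to `runAlgorithm`. A likely reading is that caching only `norms` leaves the zip-and-wrap step to be recomputed each time the derived RDD is evaluated, while caching the derived RDD avoids that. The following is a minimal sketch of the same pattern outside of `KMeans`, assuming Spark core and `spark-mllib` are on the classpath. `PersistSketch`, `cachedStats`, and the sample vectors are hypothetical names used only for illustration, and a plain tuple stands in for `VectorWithNorm`, which is not assumed to be accessible outside its package.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical helper object; not part of Spark or of this commit.
object PersistSketch {

  // Pairs each vector with its L2 norm, caches the paired RDD,
  // runs two actions against it, then releases the cache.
  def cachedStats(data: RDD[Vector]): (Long, Double) = {
    val norms: RDD[Double] = data.map(Vectors.norm(_, 2.0))

    // zip is safe here because norms is a one-to-one map of data,
    // so both RDDs share the same partitioning and element counts.
    val zipped: RDD[(Vector, Double)] = data.zip(norms)

    // Persist the RDD that is actually reused, not the intermediate one.
    zipped.persist()
    assert(zipped.getStorageLevel == StorageLevel.MEMORY_ONLY)
    assert(norms.getStorageLevel == StorageLevel.NONE)

    try {
      // The first action materializes the cache; the second can read from it.
      val count = zipped.count()
      val maxNorm = zipped.map(_._2).reduce((a, b) => math.max(a, b))
      (count, maxNorm)
    } finally {
      zipped.unpersist()
    }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("persist-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val data = sc.parallelize(Seq(
        Vectors.dense(1.0, 2.0),
        Vectors.dense(3.0, 4.0),
        Vectors.dense(5.0, 12.0)
      ))
      println(cachedStats(data))
    } finally {
      sc.stop()
    }
  }
}
```

The `try`/`finally` around the actions is a choice made for this sketch so the cache is released even if an action fails; the committed code calls `unpersist()` directly after `runAlgorithm` returns.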
