Commit 4381e21

[SPARK-16440][MLLIB] Undeleted broadcast variables in Word2Vec causing OoM for long runs
## What changes were proposed in this pull request?

Unpersist broadcasted vars in Word2Vec.fit for more timely / reliable resource cleanup

## How was this patch tested?

Jenkins tests

Author: Sean Owen <sowen@cloudera.com>

Closes #14153 from srowen/SPARK-16440.

(cherry picked from commit 51ade51)
Signed-off-by: Sean Owen <sowen@cloudera.com>
1 parent fb09336 commit 4381e21

1 file changed: +3 -0 lines changed

mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala

Lines changed: 3 additions & 0 deletions
@@ -416,6 +416,9 @@ class Word2Vec extends Serializable with Logging {
       }
     }
     newSentences.unpersist()
+    expTable.unpersist()
+    bcVocab.unpersist()
+    bcVocabHash.unpersist()
 
     val wordArray = vocab.map(_.word)
     new Word2VecModel(wordArray.zipWithIndex.toMap, syn0Global)
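The cleanup pattern this patch applies can be illustrated outside of Word2Vec. The following is a minimal sketch, not the code from `Word2Vec.fit`: the object name `BroadcastCleanupSketch`, the helper `lookupIndices`, the sample vocabulary map, and the local master setting are all assumptions made for illustration. It broadcasts a lookup table, uses it in a job, and then calls `unpersist()` once no further job needs it, rather than waiting for the broadcast to be garbage-collected on the driver.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastCleanupSketch {

  // Hypothetical helper: maps each word to its index through a broadcast table.
  def lookupIndices(sc: SparkContext, words: Seq[String]): Array[Int] = {
    // Illustrative stand-in for a vocabulary hash such as bcVocabHash.
    val vocabHash = Map("spark" -> 0, "word" -> 1, "vector" -> 2)
    val bcVocabHash = sc.broadcast(vocabHash)
    try {
      // Tasks read the broadcast value on the executors; unknown words map to -1.
      sc.parallelize(words)
        .map(w => bcVocabHash.value.getOrElse(w, -1))
        .collect()
    } finally {
      // Release the executor-side copies as soon as the job is done.
      bcVocabHash.unpersist()
    }
  }

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run.
    val conf = new SparkConf().setMaster("local[2]").setAppName("broadcast-cleanup-sketch")
    val sc = new SparkContext(conf)
    try {
      println(lookupIndices(sc, Seq("spark", "vector", "unknown")).mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

Note that `unpersist()` only drops the cached copies on the executors, so the broadcast can still be re-sent if a later job references it. `destroy()` would release the driver-side state as well, at the cost of making the variable unusable afterwards.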
