
Commit 51ade51

[SPARK-16440][MLLIB] Undeleted broadcast variables in Word2Vec causing OoM for long runs
## What changes were proposed in this pull request?

Unpersist broadcasted vars in Word2Vec.fit for more timely / reliable resource cleanup

## How was this patch tested?

Jenkins tests

Author: Sean Owen <sowen@cloudera.com>

Closes apache#14153 from srowen/SPARK-16440.
1 parent 3d6f679 commit 51ade51


1 file changed, +3 -0 lines changed

mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala

Lines changed: 3 additions & 0 deletions
@@ -434,6 +434,9 @@ class Word2Vec extends Serializable with Logging {
       bcSyn1Global.unpersist(false)
     }
     newSentences.unpersist()
+    expTable.unpersist()
+    bcVocab.unpersist()
+    bcVocabHash.unpersist()
 
     val wordArray = vocab.map(_.word)
     new Word2VecModel(wordArray.zipWithIndex.toMap, syn0Global)
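
The pattern this patch applies, releasing a broadcast variable once the jobs that read it have finished, can be sketched in isolation as follows. This is a minimal illustration, not the Word2Vec code: `BroadcastCleanupSketch` and `countKnownWords` are hypothetical names, a local master is assumed, and the `try`/`finally` wrapper is a design choice of this sketch that the patch itself does not use.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastCleanupSketch {

  // Hypothetical helper (not part of Word2Vec): counts the words that
  // appear in a vocabulary shipped to executors as a broadcast variable.
  def countKnownWords(sc: SparkContext, words: Seq[String], vocab: Map[String, Int]): Long = {
    val bcVocab = sc.broadcast(vocab)
    try {
      // count() is an action, so the job has finished before finally runs.
      sc.parallelize(words).filter(w => bcVocab.value.contains(w)).count()
    } finally {
      // Drop the executor-side copies now instead of waiting for the
      // driver-side reference to be garbage collected.
      bcVocab.unpersist()
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master with two threads, for illustration only.
    val conf = new SparkConf().setMaster("local[2]").setAppName("broadcast-cleanup-sketch")
    val sc = new SparkContext(conf)
    try {
      println(countKnownWords(sc, Seq("a", "b", "z"), Map("a" -> 0, "b" -> 1)))
    } finally {
      sc.stop()
    }
  }
}
```

Calling `unpersist()` removes only the cached copies on executors; the value would be re-sent if a later job used the broadcast again. `destroy()` is the stronger call for when the variable will never be read again.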
