
Commit dd8514f

yinxusen authored and yanboliang committed
[SPARK-16558][EXAMPLES][MLLIB] examples/mllib/LDAExample should use MLVector instead of MLlib Vector
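## What changes were proposed in this pull request?

`mllib.LDAExample` uses an ML pipeline for preprocessing and the MLlib LDA algorithm for training. The pipeline produces feature vectors of the new `org.apache.spark.ml.linalg.Vector` type (MLVector), while MLlib LDA expects the old `org.apache.spark.mllib.linalg.Vector` type (MLlib Vector). The example now matches on `MLVector` and converts each vector with `Vectors.fromML` before handing it to LDA.

## How was this patch tested?

Tested manually.

Author: Xusen Yin <yinxusen@gmail.com>

Closes #14212 from yinxusen/SPARK-16558.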
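The type mismatch described above can be reproduced without a cluster. The following is a minimal sketch, assuming Spark 2.0 or later with `spark-mllib` on the classpath; `VectorConversionSketch` and `describe` are hypothetical names used only for illustration. Because `ml.linalg.Vector` and `mllib.linalg.Vector` are unrelated types, a type pattern for one never matches a value of the other, and `Vectors.fromML` (with `asML` for the reverse direction) bridges them.

```scala
import org.apache.spark.ml.linalg.{Vector => MLVector, Vectors => MLVectors}
import org.apache.spark.mllib.linalg.{Vector => MLlibVector, Vectors => MLlibVectors}
import org.apache.spark.sql.Row

object VectorConversionSketch {
  // Hypothetical helper: reports which vector family a single-column Row holds,
  // mirroring the `case Row(features: ...)` pattern used in LDAExample.
  def describe(row: Row): String = row match {
    case Row(_: MLlibVector) => "mllib.linalg.Vector"
    case Row(_: MLVector)    => "ml.linalg.Vector"
    case _                   => "other"
  }

  def main(args: Array[String]): Unit = {
    // A sparse term-count vector in the spark.ml format, like CountVectorizer output.
    val mlVec: MLVector = MLVectors.sparse(5, Array(0, 3), Array(2.0, 1.0))

    // A pattern on the mllib Vector type cannot match this value.
    println(describe(Row(mlVec)))

    // Vectors.fromML converts to the spark.mllib type expected by mllib.clustering.LDA.
    val mllibVec: MLlibVector = MLlibVectors.fromML(mlVec)
    println(describe(Row(mllibVec)))

    // asML converts back; the round trip preserves size, indices and values.
    println(mllibVec.asML == mlVec)
  }
}
```

Running `main` prints `ml.linalg.Vector`, then `mllib.linalg.Vector`, then `true`. This is why the pre-fix pattern `case Row(features: Vector)`, with `Vector` bound to the mllib type, would throw a `scala.MatchError` on rows produced by the pipeline.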
1 parent d9e0919 commit dd8514f


1 file changed, 3 additions, 2 deletions


examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala

Lines changed: 3 additions & 2 deletions
```diff
@@ -24,8 +24,9 @@ import scopt.OptionParser
 import org.apache.spark.{SparkConf, SparkContext}
 import org.apache.spark.ml.Pipeline
 import org.apache.spark.ml.feature.{CountVectorizer, CountVectorizerModel, RegexTokenizer, StopWordsRemover}
+import org.apache.spark.ml.linalg.{Vector => MLVector}
 import org.apache.spark.mllib.clustering.{DistributedLDAModel, EMLDAOptimizer, LDA, OnlineLDAOptimizer}
-import org.apache.spark.mllib.linalg.Vector
+import org.apache.spark.mllib.linalg.{Vector, Vectors}
 import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.{Row, SparkSession}
 
@@ -223,7 +224,7 @@ object LDAExample {
     val documents = model.transform(df)
       .select("features")
       .rdd
-      .map { case Row(features: Vector) => features }
+      .map { case Row(features: MLVector) => Vectors.fromML(features) }
       .zipWithIndex()
       .map(_.swap)
 
```
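An end-to-end sketch of the corrected preprocessing path follows, assuming Spark 2.0 or later with `spark-mllib` on the classpath and a local master. The toy corpus, the object name `LDAConversionSketch`, the helper `toLDAInput`, and the LDA parameters are hypothetical; the real example reads its corpus from files and uses a longer pipeline (it imports `StopWordsRemover` as well). The conversion step itself follows the patched lines: match on `MLVector`, convert with `Vectors.fromML`, then attach document IDs with `zipWithIndex` and `swap`.

```scala
import org.apache.spark.ml.feature.{CountVectorizer, RegexTokenizer}
import org.apache.spark.ml.linalg.{Vector => MLVector}
import org.apache.spark.mllib.clustering.LDA
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object LDAConversionSketch {
  // Hypothetical helper: turns a DataFrame with an ml.linalg "features" column into
  // the RDD[(docId, mllib Vector)] shape that mllib.clustering.LDA.run expects.
  def toLDAInput(featurized: DataFrame): RDD[(Long, Vector)] =
    featurized
      .select("features")
      .rdd
      .map { case Row(features: MLVector) => Vectors.fromML(features) }
      .zipWithIndex()
      .map(_.swap)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("LDAConversionSketch").getOrCreate()
    import spark.implicits._

    // Hypothetical toy corpus standing in for the example's input files.
    val docs = Seq("spark runs lda", "lda finds topics", "spark ml pipelines").toDF("docs")

    // spark.ml stages: tokenize, then count terms into ml.linalg sparse vectors.
    val tokens = new RegexTokenizer().setInputCol("docs").setOutputCol("tokens").transform(docs)
    val cvModel = new CountVectorizer().setInputCol("tokens").setOutputCol("features").fit(tokens)

    // spark.mllib LDA is trained on the converted mllib vectors.
    val ldaModel = new LDA().setK(2).setMaxIterations(5).run(toLDAInput(cvModel.transform(tokens)))
    println(s"k = ${ldaModel.k}, vocabSize = ${ldaModel.vocabSize}")

    spark.stop()
  }
}
```

Keeping the conversion in one small function makes the boundary between the two APIs explicit: everything upstream stays in `spark.ml` types, and only the RDD handed to `LDA.run` uses `spark.mllib` types.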