Contextless ML implementation of Spark ML.

To serve small ML pipelines there is no need to create a `SparkContext` and use cluster-related features. In this project we provide our own implementations of ML `Transformer`s; some of them call context-independent Spark methods. Instead of using `DataFrame`s, we implemented a simple `LocalData` class to get rid of the `SparkContext`. All `Transformer`s are rewritten to accept `LocalData`.
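As a rough sketch of the idea (the actual definitions in the library may differ in fields and helper methods), `LocalData` can be thought of as a list of named columns, each holding its values as a plain Scala sequence instead of a distributed `DataFrame`:

```scala
// Hypothetical minimal sketch of the LocalData idea; illustrative only,
// not the library's exact API.
case class LocalDataColumn[T](name: String, data: Seq[T])

case class LocalData(columns: List[LocalDataColumn[_]]) {
  // Look up a column by name, if present.
  def column(name: String): Option[LocalDataColumn[_]] =
    columns.find(_.name == name)
}

object Example extends App {
  val data = LocalData(List(LocalDataColumn("text", Seq("Hello!", "World"))))
  println(data.column("text").map(_.data)) // Some(List(Hello!, World))
}
```

Because no `SparkContext` is involved, such a structure can be created and transformed with plain JVM calls, which is what makes serving small pipelines cheap.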
- Import this project as a dependency:

  ```scala
  scalaVersion := "2.11.8"

  // The artifact name depends on the Spark version you used for model training:

  // Spark 2.0.x
  libraryDependencies ++= Seq(
    "io.hydrosphere" %% "spark-ml-serving-2_0" % "0.3.0",
    "org.apache.spark" %% "spark-mllib" % "2.0.2"
  )

  // Spark 2.1.x
  libraryDependencies ++= Seq(
    "io.hydrosphere" %% "spark-ml-serving-2_1" % "0.3.0",
    "org.apache.spark" %% "spark-mllib" % "2.1.2"
  )

  // Spark 2.2.x
  libraryDependencies ++= Seq(
    "io.hydrosphere" %% "spark-ml-serving-2_2" % "0.3.0",
    "org.apache.spark" %% "spark-mllib" % "2.2.0"
  )
  ```
- Use it:

  ```scala
  import io.hydrosphere.spark_ml_serving._
  import LocalPipelineModel._
  // ....

  val model = LocalPipelineModel.load("PATH_TO_MODEL") // Load
  val columns = List(LocalDataColumn("text", Seq("Hello!")))
  val localData = LocalData(columns)
  val result = model.transform(localData) // Transformed result
  ```
More examples with different ML models can be found in the tests.