Contextless serving implementation of Spark ML.
To serve small ML pipelines there is no need to create a SparkContext or use any cluster-related features.
This project provides its own implementations of the ML Transformers; some of them delegate to context-independent Spark methods.
Instead of DataFrames, it uses a simple LocalData class, which removes the dependency on SparkContext.
All Transformers are rewritten to accept LocalData.
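Conceptually, LocalData is just a named collection of in-memory columns. A minimal sketch of the idea (the actual class definitions ship with the library and may differ in detail):

```scala
// Hypothetical sketch of the LocalData idea; the real definitions live in
// io.hydrosphere.spark_ml_serving and may differ.
case class LocalDataColumn[T](name: String, data: Seq[T])

case class LocalData(columns: List[LocalDataColumn[_]]) {
  // Look up a column by name, as a Transformer would
  def column(name: String): Option[LocalDataColumn[_]] =
    columns.find(_.name == name)
}

val data = LocalData(List(LocalDataColumn("text", Seq("Hello!", "Goodbye!"))))
assert(data.column("text").isDefined)
assert(data.column("label").isEmpty)
```

Because this is plain Scala data, a Transformer implemented against it needs nothing from a Spark cluster.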
- Import this project as a dependency:
```scala
scalaVersion := "2.11.8"

// The artifact name depends on the Spark version you used for model training:

// Spark 2.0.x
libraryDependencies ++= Seq(
  "io.hydrosphere" %% "spark-ml-serving-2_0" % "0.3.0",
  "org.apache.spark" %% "spark-mllib" % "2.0.2"
)

// Spark 2.1.x
libraryDependencies ++= Seq(
  "io.hydrosphere" %% "spark-ml-serving-2_1" % "0.3.0",
  "org.apache.spark" %% "spark-mllib" % "2.1.2"
)

// Spark 2.2.x
libraryDependencies ++= Seq(
  "io.hydrosphere" %% "spark-ml-serving-2_2" % "0.3.0",
  "org.apache.spark" %% "spark-mllib" % "2.2.0"
)
```
- Use it:
```scala
import io.hydrosphere.spark_ml_serving._
import LocalPipelineModel._
// ....

// Load a previously saved pipeline model
val model = LocalPipelineModel.load("PATH_TO_MODEL")

// Build the input data and run the pipeline locally
val columns = List(LocalDataColumn("text", Seq("Hello!")))
val localData = LocalData(columns)
val result = model.transform(localData) // Transformed result
```
More examples of different ML models are in the tests.
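The directory passed to `load` is an ordinary Spark ML pipeline model saved during training. For illustration only (the column names and stages here are arbitrary, not mandated by the library), such a model could be produced like this:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

// Training side: a SparkSession is still required here; only serving is contextless.
val spark = SparkSession.builder().master("local[*]").appName("train").getOrCreate()
val df = spark.createDataFrame(Seq((0, "Hello!"), (1, "Goodbye!"))).toDF("id", "text")

// An arbitrary example pipeline: tokenize text, then hash tokens into features
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features")
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF))

// Fit and save; the resulting directory is what LocalPipelineModel.load expects.
pipeline.fit(df).write.overwrite().save("PATH_TO_MODEL")
spark.stop()
```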