Commit aa305dc

bllchmbrs authored and jkbradley committed
[SPARK-11964][DOCS][ML] Add in Pipeline Import/Export Documentation
Adding in Pipeline Import and Export Documentation.

Author: anabranch <wac.chambers@gmail.com>
Author: Bill Chambers <wchambers@ischool.berkeley.edu>

Closes #10179 from anabranch/master.
1 parent 0fb9825 commit aa305dc

File tree

1 file changed: +13 −0

docs/ml-guide.md

Lines changed: 13 additions & 0 deletions
@@ -192,6 +192,10 @@ Parameters belong to specific instances of `Estimator`s and `Transformer`s.
 For example, if we have two `LogisticRegression` instances `lr1` and `lr2`, then we can build a `ParamMap` with both `maxIter` parameters specified: `ParamMap(lr1.maxIter -> 10, lr2.maxIter -> 20)`.
 This is useful if there are two algorithms with the `maxIter` parameter in a `Pipeline`.
 
+## Saving and Loading Pipelines
+
+Often it is worth saving a model or a pipeline to disk for later use. In Spark 1.6, model import/export functionality was added to the Pipeline API. Most basic transformers are supported, as well as some of the more basic ML models. Please refer to the algorithm's API documentation to see whether saving and loading are supported.
+
 # Code examples
 
 This section gives code examples illustrating the functionality discussed above.
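
For context, the `ParamMap` described in the hunk above can be built as follows. This is a minimal sketch and not part of this diff; it assumes two independently configured `LogisticRegression` instances:

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.param.ParamMap

// Parameters belong to specific instances, so a single ParamMap
// can target lr1 and lr2 with different maxIter values.
val lr1 = new LogisticRegression()
val lr2 = new LogisticRegression()
val paramMap = ParamMap(lr1.maxIter -> 10, lr2.maxIter -> 20)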
@@ -455,6 +459,15 @@ val pipeline = new Pipeline()
 // Fit the pipeline to training documents.
 val model = pipeline.fit(training)
 
+// Now we can optionally save the fitted pipeline to disk.
+model.save("/tmp/spark-logistic-regression-model")
+
+// We can also save this unfit pipeline to disk.
+pipeline.save("/tmp/unfit-lr-model")
+
+// And load the fitted model back in during production.
+val sameModel = PipelineModel.load("/tmp/spark-logistic-regression-model")
+
 // Prepare test documents, which are unlabeled (id, text) tuples.
 val test = sqlContext.createDataFrame(Seq(
   (4L, "spark i j k"),
