[DOCS][ML][SPARK-11964] Add in Pipeline Import/Export Documentation #10179

bllchmbrs · 2015-12-07T21:15:47Z

Adding in Pipeline Import and Export Documentation.

Needed to import the types specifically, not the general pyspark.sql

bllchmbrs · 2015-12-07T21:17:41Z

not sure if my notations are correct for the title @jkbradley, let me know if I need to change anything!

jkbradley · 2015-12-07T21:44:18Z

Can you please add "[ML]" to the PR title? And also edit the first comment in the PR to describe what the PR is doing (since it will become part of the commit message)? I'll review it now, thanks!

jkbradley · 2015-12-07T21:45:47Z

Oh also please remove "SPARK-6725" from the PR title. It's better to have a 1 JIRA - 1 PR correspondence.

BenFradet · 2015-12-07T22:13:08Z

docs/ml-guide.md

+## Example: Saving and Loading a Previously Created Model Pipeline
+
+Often times it is worth it to save a model to disk for usage later. In Spark 1.6, similar model import/export functionality was added to the Pipeline API. Most basic transformers are supported as well as some of the more basic ML Models such as:


"for later use" instead of "for usage later" sounds better to me.
"In Spark 1.6, a model import/export" instead of "similar"
"ML models"

BenFradet · 2015-12-07T22:13:41Z

A few minor comments on the documentation, but otherwise it lgtm.
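For reference, here is a minimal Scala sketch of the stage-level import/export that the reviewed paragraph describes. It assumes Spark 2.x or later (for `SparkSession`) and a local master. It saves a single basic transformer (`Tokenizer`) with `write.overwrite().save(...)` and reads it back with the companion `Tokenizer.load(...)`. The path `/tmp/tokenizer-sketch` is a hypothetical scratch location.

```scala
import org.apache.spark.ml.feature.Tokenizer
import org.apache.spark.sql.SparkSession

// Assumption: Spark 2.x+ on a local master; in spark-shell, getOrCreate() reuses the existing session.
val spark = SparkSession.builder().master("local[*]").appName("stage-persistence-sketch").getOrCreate()

// A basic transformer: splits the "text" column into lowercase words.
val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")

// Hypothetical scratch path; overwrite() replaces any previous output there.
val tokenizerPath = "/tmp/tokenizer-sketch"
tokenizer.write.overwrite().save(tokenizerPath)

// The companion object provides load(), which restores the stage together with its params.
val sameTokenizer = Tokenizer.load(tokenizerPath)
println(s"${sameTokenizer.getInputCol} -> ${sameTokenizer.getOutputCol}")
```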

jkbradley · 2015-12-07T22:32:43Z

This is really similar to the existing Pipeline examples...as it probably should be. However, that makes me wonder if the best way to do this would be to:

1. Add a very small text description subsection at the end of the "Main concepts in Pipelines" section.
2. Modify one of the existing code examples to include save/load.

That should simplify the PR a lot. Thanks!

bllchmbrs · 2015-12-07T23:42:38Z

@jkbradley will make those changes shortly.

@BenFradet will make those changes as well.

bllchmbrs · 2015-12-10T02:09:29Z

@jkbradley does this work for you by the way?

BenFradet · 2015-12-10T06:57:22Z

docs/ml-guide.md

+
+## Example: Saving and Loading a Pipeline
+
+Often times it is worth it to save a model to disk for later use. In Spark 1.6, model import/export functionality was added to the Pipeline API. Most basic transformers are supported as well as some of the more basic ML models such as:


a model import/export functionality...

BenFradet · 2015-12-10T06:58:43Z

LGTM except one minor comment.

bllchmbrs · 2015-12-10T07:06:58Z

@BenFradet integrated your feedback thanks.

jkbradley · 2015-12-10T18:38:48Z

@anabranch Hm, I may not have been clear enough. The save/load functionality seems general and important enough that it should go under the "Main concepts in Pipelines" section; I would put a subsection with a small paragraph (without code) at the end of the "Main concepts in Pipelines" section, just before the "Code example" section. I would then modify the first code example "Example: Estimator, Transformer, and Param" to include saving and loading the pipeline. Thanks!

bllchmbrs · 2015-12-11T05:34:11Z

@jkbradley gotcha! I misinterpreted your last comments, my fault.

One thing I'm confused about though is that the Estimator, Transformer, and Param section doesn't seem to mention pipelines explicitly, more focusing on those components themselves. It doesn't necessarily seem like the best place to talk about pipeline persistence? I'm sure there's just something that I'm missing.

I've updated it to move it into the pipeline section, but please let me know if you would like me to change it to the params one and demo it more as a feature. I'll be available tomorrow, so I can turn it around quickly!

jkbradley · 2015-12-11T18:01:00Z

docs/ml-guide.md

@@ -140,8 +140,8 @@ If the `Pipeline` had more stages, it would call the `LogisticRegressionModel`'s
 method on the `DataFrame` before passing the `DataFrame` to the next stage.

 A `Pipeline` is an `Estimator`.
-Thus, after a `Pipeline`'s `fit()` method runs, it produces a `PipelineModel`, which is a
-`Transformer`.
+Thus, after a `Pipeline`'s `fit()` method runs, it produces a `PipelineModel`, which is a `Transformer`.


I would avoid unneeded changes like this since they can cause conflicts.
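The line under discussion states that a `Pipeline` is an `Estimator` whose `fit()` produces a `PipelineModel`, which is a `Transformer`. A small Scala sketch of that relationship, assuming Spark 2.x or later with a local master; the toy data and column names are made up for illustration.

```scala
import org.apache.spark.ml.{Estimator, Pipeline, PipelineModel, PipelineStage, Transformer}
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

// Assumption: Spark 2.x+ on a local master.
val spark = SparkSession.builder().master("local[*]").appName("pipeline-estimator-sketch").getOrCreate()

// Toy documents, made up for illustration.
val docs = spark.createDataFrame(Seq(
  (0L, "a b c d e spark"),
  (1L, "hadoop mapreduce")
)).toDF("id", "text")

val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF().setNumFeatures(1000).setInputCol("words").setOutputCol("features")

// A Pipeline is an Estimator: calling fit() on it produces a PipelineModel.
val pipeline: Estimator[PipelineModel] = new Pipeline().setStages(Array[PipelineStage](tokenizer, hashingTF))
val fitted: PipelineModel = pipeline.fit(docs)

// The PipelineModel is a Transformer, so it can be used wherever a Transformer is expected.
val asTransformer: Transformer = fitted
val transformed = asTransformer.transform(docs)
transformed.select("id", "features").show(false)
```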

jkbradley · 2015-12-11T18:01:10Z

docs/ml-guide.md

@@ -471,6 +489,14 @@ model.transform(test)
    println(s"($id, $text) --> prob=$prob, prediction=$prediction")
  }

+// or use the "loadedModel" to make predictions


No need for this

jkbradley · 2015-12-11T18:01:35Z

Thanks for the updates! Just minor comments now.

I agree we need to improve and reorganize the sections explaining Pipelines; we're working on that in some separate PRs.

bllchmbrs · 2015-12-11T18:39:03Z

@jkbradley should be good to go! Sorry for being such a pain!

jkbradley · 2015-12-11T18:49:12Z

docs/ml-guide.md

+
+// and load it back in during production
+val sameModel = Pipeline.load("/tmp/spark-logistic-regression-model")
+// or equivalently


Remove this and next line

I like the idea in this & the next line, but I'd remove this since it will cause problems if users blindly copy and paste the code into a spark shell.

Just to be crystal clear, you would like me to remove 470 - 472? The "equivalently" bit, right?

Yes (470-471). I'd keep the newline in 472. Thanks!
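A self-contained Scala sketch of the save/load flow discussed in this thread, assuming Spark 2.x or later with a local master and made-up training data. It writes with `write.overwrite().save(...)` so that rerunning it does not fail on an existing directory. Note that the fitted model is read back with `PipelineModel.load`: `Pipeline.load` expects the metadata of an unfit `Pipeline`. The path `/tmp/spark-logistic-regression-model` comes from the diff above; `/tmp/unfit-lr-model` is a hypothetical path.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel, PipelineStage}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

// Assumption: Spark 2.x+ on a local master; in spark-shell, getOrCreate() returns the existing session.
val spark = SparkSession.builder().master("local[*]").appName("pipeline-persistence-sketch").getOrCreate()

// Toy training data, made up for illustration.
val training = spark.createDataFrame(Seq(
  (0L, "a b c d e spark", 1.0),
  (1L, "b d", 0.0),
  (2L, "spark f g h", 1.0),
  (3L, "hadoop mapreduce", 0.0)
)).toDF("id", "text", "label")

val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF().setNumFeatures(1000).setInputCol("words").setOutputCol("features")
val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.01)
val pipeline = new Pipeline().setStages(Array[PipelineStage](tokenizer, hashingTF, lr))

// Fit the pipeline, then save both the fitted model and the unfit pipeline.
val model = pipeline.fit(training)
model.write.overwrite().save("/tmp/spark-logistic-regression-model")
pipeline.write.overwrite().save("/tmp/unfit-lr-model") // hypothetical path

// A fitted model is read back with PipelineModel.load; an unfit pipeline with Pipeline.load.
val sameModel = PipelineModel.load("/tmp/spark-logistic-regression-model")
val samePipeline = Pipeline.load("/tmp/unfit-lr-model")

// The reloaded model scores data just like the original one.
sameModel.transform(training).select("id", "text", "prediction").show(false)
```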

jkbradley · 2015-12-11T18:49:37Z

No, no problem. Just 1 more comment left

bllchmbrs · 2015-12-11T19:55:37Z

@jkbradley should be good to go now!

jkbradley · 2015-12-11T20:15:39Z

LGTM pending tests.
Thanks!

SparkQA · 2015-12-11T20:34:16Z

Test build #2206 has finished for PR 10179 at commit 10ccc22.

This patch passes all tests.
This patch merges cleanly.
This patch adds the following public classes (experimental):
* class ExecutorClassLoader(

jkbradley · 2015-12-11T20:55:24Z

Merging with master and branch-1.6

Adding in Pipeline Import and Export Documentation. Author: anabranch <wac.chambers@gmail.com> Author: Bill Chambers <wchambers@ischool.berkeley.edu> Closes #10179 from anabranch/master. (cherry picked from commit aa305dc) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>

bllchmbrs and others added 5 commits March 24, 2015 20:28

[DOCUMENTATION]Fixed Missing Type Import in Documentation

603b080

Needed to import the types specifically, not the general pyspark.sql

Corrected SqlContext Import

8fa67bf

[SPARK-11964]pipeline persistence=>ml-guide

0c391d9

improved formatting, description

a6809bd

merge old commits

dca6688

BenFradet reviewed Dec 7, 2015

bllchmbrs changed the title ~~[DOCS][SPARK-11964][SPARK-6725] Add in Pipeline Import/Export Documentation~~ [DOCS][ML][SPARK-11964] Add in Pipeline Import/Export Documentation Dec 7, 2015

re-organization of docs + feedback

eb3f99c

BenFradet reviewed Dec 10, 2015

bllchmbrs added 2 commits December 9, 2015 23:03

merge upstream

889437c

feedback from @BenFradet

bde19ad

bllchmbrs added 3 commits December 10, 2015 21:24

update to the formatting

e1f27eb

updated formatting, better location

4a3b513

merged upstream

8f4c922

jkbradley reviewed Dec 11, 2015

bllchmbrs added 8 commits December 11, 2015 10:16

small changes for @jkbradley

ad715ba

small change to revert

66f692f

small revert

2d0d87c

small revert

da91fb2

small revert again

c596edf

small changes, little by little!

9c7ad99

merge upstream

86c9658

revert end of page

06f8c2a

jkbradley reviewed Dec 11, 2015

removed 470 and 471

10ccc22

asfgit closed this in aa305dc Dec 11, 2015
