[SPARK-21915][ML][PySpark]Model 1 and Model 2 ParamMaps Missing #19126

Closed
wants to merge 1 commit

Conversation

@marktab commented Sep 5, 2017

@dongjoon-hyun @HyukjinKwon

There is an error in the PySpark example code:
https://github.com/apache/spark/blob/master/examples/src/main/python/ml/estimator_transformer_param_example.py

The original Scala code says
println("Model 2 was fit using parameters: " + model2.parent.extractParamMap)

The parent is the lr estimator.

In PySpark there is no way to access the parent estimator from the model the way the Scala code does.

The proposed code has been tested in Python and returns values consistent with the Scala output.

What changes were proposed in this pull request?

Proposing to call extractParamMap() on the lr estimator variable instead of on model1 or model2; a sketch of the change follows.
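
For reference, a minimal sketch of the proposed change, following the setup in the linked example (data, param maps, and variable names mirror that example; treat this as an illustration of the idea, printing the estimator's params and, for model2, merging in the fit-time overrides, rather than the exact patch):

from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("EstimatorTransformerParamExample").getOrCreate()

# Training data as in the linked example: (label, features) rows.
training = spark.createDataFrame([
    (1.0, Vectors.dense([0.0, 1.1, 0.1])),
    (0.0, Vectors.dense([2.0, 1.0, -1.0])),
    (0.0, Vectors.dense([2.0, 1.3, 1.0])),
    (1.0, Vectors.dense([0.0, 1.2, -0.5]))], ["label", "features"])

lr = LogisticRegression(maxIter=10, regParam=0.01)
model1 = lr.fit(training)

# The Scala example prints model1.parent.extractParamMap; in PySpark the
# estimator itself is the parent, so print its params instead.
print("Model 1 was fit using parameters: ")
print(lr.extractParamMap())

# Fit-time overrides, as in the example.
paramMapCombined = {lr.maxIter: 30, lr.regParam: 0.1, lr.threshold: 0.55,
                    lr.probabilityCol: "myProbability"}
model2 = lr.fit(training, paramMapCombined)

# In Spark 2.1/2.2, model2.extractParamMap() prints nothing useful here (see
# "How was this patch tested?" below); printing the estimator's param map
# merged with the overrides matches the expected output listed below.
print("Model 2 was fit using parameters: ")
print(lr.extractParamMap(extra=paramMapCombined))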

How was this patch tested?

This patch was tested with Spark 2.1.0 by comparing the Scala and PySpark results. At present, PySpark prints nothing for those two print lines.

The output for model2 in PySpark should be:

{Param(parent='LogisticRegression_4187be538f744d5a9090', name='tol', doc='the convergence tolerance for iterative algorithms (>= 0).'): 1e-06,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='elasticNetParam', doc='the ElasticNet mixing parameter, in range [0, 1]. For alpha = 0, the penalty is an L2 penalty. For alpha = 1, it is an L1 penalty.'): 0.0,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='predictionCol', doc='prediction column name.'): 'prediction',
Param(parent='LogisticRegression_4187be538f744d5a9090', name='featuresCol', doc='features column name.'): 'features',
Param(parent='LogisticRegression_4187be538f744d5a9090', name='labelCol', doc='label column name.'): 'label',
Param(parent='LogisticRegression_4187be538f744d5a9090', name='probabilityCol', doc='Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities.'): 'myProbability',
Param(parent='LogisticRegression_4187be538f744d5a9090', name='rawPredictionCol', doc='raw prediction (a.k.a. confidence) column name.'): 'rawPrediction',
Param(parent='LogisticRegression_4187be538f744d5a9090', name='family', doc='The name of family which is a description of the label distribution to be used in the model. Supported options: auto, binomial, multinomial'): 'auto',
Param(parent='LogisticRegression_4187be538f744d5a9090', name='fitIntercept', doc='whether to fit an intercept term.'): True,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='threshold', doc='Threshold in binary classification prediction, in range [0, 1]. If threshold and thresholds are both set, they must match.e.g. if threshold is p, then thresholds must be equal to [1-p, p].'): 0.55,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='aggregationDepth', doc='suggested depth for treeAggregate (>= 2).'): 2,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='maxIter', doc='max number of iterations (>= 0).'): 30,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='regParam', doc='regularization parameter (>= 0).'): 0.1,
Param(parent='LogisticRegression_4187be538f744d5a9090', name='standardization', doc='whether to standardize the training features before fitting the model.'): True}

Please review http://spark.apache.org/contributing.html before opening a pull request.

@marktab changed the title Model 1 and Model 2 ParamMaps Missing [ML][PySpark]Model 1 and Model 2 ParamMaps Missing Sep 5, 2017
@marktab changed the title [ML][PySpark]Model 1 and Model 2 ParamMaps Missing [SPARK-19126][ML][PySpark]Model 1 and Model 2 ParamMaps Missing Sep 5, 2017
@marktab changed the title [SPARK-19126][ML][PySpark]Model 1 and Model 2 ParamMaps Missing [SPARK-xxxx][ML][PySpark]Model 1 and Model 2 ParamMaps Missing Sep 5, 2017
@marktab changed the title [SPARK-xxxx][ML][PySpark]Model 1 and Model 2 ParamMaps Missing [SPARK-21915][ML][PySpark]Model 1 and Model 2 ParamMaps Missing Sep 5, 2017
@HyukjinKwon (Member)

ok to test

@SparkQA commented Sep 5, 2017

Test build #81413 has finished for PR 19126 at commit 76e5da7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@holdenk (Contributor) commented Sep 6, 2017

So I'm wondering if @BryanCutler's change for copying the params means that this should only be applied to old branches (2.2)?

@BryanCutler (Member)

Yeah, I checked and this is not a problem in master since #17849
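
For context, a hedged sketch of what that means for the example (assuming a build that includes the params-copying change referenced above, and reusing the lr, training, and paramMapCombined names from the sketch in the description; an illustration, not the exact behaviour of any specific release):

# With params copied from the estimator to the fitted model, the original
# example line should report the fit-time parameters directly.
model2 = lr.fit(training, paramMapCombined)
print("Model 2 was fit using parameters: ")
print(model2.extractParamMap())  # expected to include maxIter=30, regParam=0.1, threshold=0.55, ...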

@srowen (Member) commented Sep 6, 2017

@marktab you can either close this or reopen it against branch-2.2

@marktab (Author) commented Sep 6, 2017

Thanks @srowen

@marktab closed this Sep 6, 2017
@marktab (Author) commented Sep 6, 2017

@srowen since I am new to this review process, should I be seeing the change at http://spark.apache.org/docs/latest/ml-pipeline.html?

@srowen (Member) commented Sep 6, 2017

No, no change has been made. @BryanCutler suggests that the change isn't needed in master but could be applied to branch-2.2. To do that you need to reopen a new PR against that branch. It still wouldn't make any existing docs update, but would affect the next release and its docs.
