[SPARK-17508][PYSPARK][ML] PySpark treat Param values None same as not setting the Param #15113
Conversation
@srowen Here is the change that would treat PySpark Params with a value of `None` the same as not setting the Param.
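For concreteness, here is a minimal sketch of the failure mode being fixed (assuming a Spark 2.x build without this change; the exact error can vary by Param type):

```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [(1.0, Vectors.dense(0.0, 1.0)), (0.0, Vectors.dense(1.0, 0.0))],
    ["label", "features"],
)

# weightCol=None is forwarded to the JVM as a null string; without this
# change that can surface as a java.lang.NullPointerException at fit time.
lr = LogisticRegression(maxIter=5, weightCol=None)
model = lr.fit(df)  # errors without the change; a no-op set with it
```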
Test build #65458 has finished for PR 15113 at commit
@BryanCutler Thanks for working on this. I'm a bit worried about users setting a Param to `None`. Usually users write code like:

```python
LogisticRegression(maxIter=5, regParam=0.0, weightCol="weight")
LogisticRegression(maxIter=5, regParam=0.0, weightCol="")
LogisticRegression(maxIter=5, regParam=0.0)
```

and they are unlikely to write `LogisticRegression(maxIter=5, regParam=0.0, weightCol=None)`. However, if users actually do this, I'm more inclined to think they mean to forcibly specify `None`. Furthermore, …
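To make the distinction concrete, `Params.isSet` reports whether a Param was explicitly set; under the proposed behavior the `weightCol=None` form above would agree with leaving it out (a sketch):

```python
from pyspark.ml.classification import LogisticRegression

lr = LogisticRegression(maxIter=5, regParam=0.0)
print(lr.isSet(lr.weightCol))    # False: never set

# Without this change, weightCol=None is stored (or errors at type
# conversion); with it, the following would also report False:
lr2 = LogisticRegression(maxIter=5, regParam=0.0, weightCol=None)
print(lr2.isSet(lr2.weightCol))  # False under the proposed behavior
```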
Hi @yanboliang, thanks for taking a look!
Here it shows …
@yanboliang: I've certainly seen even Spark developers be tripped up by the handling of `None`.
@BryanCutler Hi, are you still working on this?
Hi @ueshin, there isn't any more work for this, just a decision on whether this behavior should change. I still think it should, and would like to hear some more opinions before I close this. Any thoughts @holdenk @jkbradley?
So I think our current handling of `None` …
Is this PR still needed?
I still think this makes sense, but maybe I'm in the minority. I'll go ahead and close it unless anyone else thinks it should be changed.
What changes were proposed in this pull request?

In PySpark, if Params are set with a value of `None`, it will get converted by Py4J to a typed value and usually result in an error. In the case of string Params, it will assign them a `null` value and lead to a `java.lang.NullPointerException`. From the user's perspective, this is not the expected outcome and can be confusing.

This change causes PySpark to not set Params with a value of `None`; they will be treated the same as not being set. If a Param is already set, then attempting to set a value of `None` will have no effect. This is mostly useful for a user who could be automating Param settings or wrapping classes that use Params.
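The mechanics, as a minimal sketch (assuming the guard belongs in `pyspark.ml.param.Params._set`; this is illustrative, not the actual diff):

```python
# Hypothetical simplification of pyspark.ml.param.Params._set showing the
# proposed behavior; not the actual patch.
def _set(self, **kwargs):
    for param, value in kwargs.items():
        if value is None:
            # Treat None the same as "not set": skip it so it is never
            # type-converted or forwarded to the JVM as null.
            continue
        p = getattr(self, param)
        if p.typeConverter is not None:
            value = p.typeConverter(value)
        self._paramMap[p] = value
    return self
```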
How was this patch tested?

Added new unit tests for setting a Param to `None` and ran existing PySpark ML tests.
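For illustration, a test along these lines would exercise the proposed behavior (a sketch; the names are hypothetical, not necessarily the tests added in this PR):

```python
import unittest
from pyspark.ml.classification import LogisticRegression

class ParamNoneTests(unittest.TestCase):
    def test_set_param_to_none_is_noop(self):
        # Under the proposed behavior, None means "not set".
        lr = LogisticRegression(maxIter=5, weightCol=None)
        self.assertFalse(lr.isSet(lr.weightCol))

        # An already-set Param is unchanged by setting None.
        lr2 = LogisticRegression(weightCol="weight")
        lr2._set(weightCol=None)
        self.assertEqual(lr2.getWeightCol(), "weight")

if __name__ == "__main__":
    unittest.main()
```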