-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-32310][ML][PySpark][3.0] ML params default value parity #29159
Conversation
Test build #126152 has finished for PR 29159 at commit
|
Test build #126158 has finished for PR 29159 at commit
|
retest this please |
Test build #126176 has finished for PR 29159 at commit
|
Jenkins retest this please |
Test build #126189 has finished for PR 29159 at commit
|
retest this please |
Test build #126192 has finished for PR 29159 at commit
|
retest this please |
Test build #126193 has finished for PR 29159 at commit
|
retest this please |
Test build #126201 has finished for PR 29159 at commit
|
retest this please |
For backport PR, can you add [3.0] to the PR title? |
Test build #126203 has finished for PR 29159 at commit
|
Sorry, fixed. @Fokko |
@@ -64,7 +64,12 @@ trait DefaultReadWriteTest extends TempDirectory { self: Suite => | |||
case (Array(values), Array(newValues)) => | |||
assert(values === newValues, s"Values do not match on param ${p.name}.") | |||
case (value, newValue) => | |||
assert(value === newValue, s"Values do not match on param ${p.name}.") | |||
if (value.isInstanceOf[Double] && value.asInstanceOf[Double].isNaN) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we need this check?
If necessary, may change to:
case (value: Double, newValue: Double) =>
assert(value.isNaN && newValue.isNaN || value == newValue, ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before my change, this setDefault(strategy -> Imputer.mean, missingValue -> Double.NaN)
is in Imputer. Now I moved this to ImputerParams so ImputerModel also has the default value of missingValue
set to Double.NaN
. In DefaultReadWriteTest
, it compares the default value of missingValue
in the original model and the loaded model, so I need to make the NaN comparison work.
Test build #126304 has finished for PR 29159 at commit
|
retest this please |
Test build #126342 has finished for PR 29159 at commit
|
retest this please |
Test build #126346 has finished for PR 29159 at commit
|
retest this please |
Test build #126351 has finished for PR 29159 at commit
|
retest this please |
Test build #126353 has finished for PR 29159 at commit
|
retest this please |
Test build #126357 has finished for PR 29159 at commit
|
retest this please |
@@ -545,6 +551,10 @@ class _TrainValidationSplitParams(_ValidatorParams): | |||
trainRatio = Param(Params._dummy(), "trainRatio", "Param for ratio between train and\ | |||
validation data. Must be between 0 and 1.", typeConverter=TypeConverters.toFloat) | |||
|
|||
def __init__(self, *args): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You added *args
many places. Just to make sure it is for consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's for consistency.
Test build #126362 has finished for PR 29159 at commit
|
retest this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall.
Test build #126365 has finished for PR 29159 at commit
|
retest this please |
Test build #126366 has finished for PR 29159 at commit
|
Jenkins, please make pip packaging tests pass! retest this please |
Test build #126367 has finished for PR 29159 at commit
|
@srowen Are you Okay with this? |
Same as the version for master - looks OK if there are no API or behavior changes (that are not bug fixes). |
### What changes were proposed in this pull request? backporting the changes to 3.0 set params default values in trait Params for feature and tuning in both Scala and Python. ### Why are the changes needed? Make ML has the same default param values between estimator and its corresponding transformer, and also between Scala and Python. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Existing and modified tests Closes #29159 from huaxingao/set_default_3.0. Authored-by: Huaxin Gao <huaxing@us.ibm.com> Signed-off-by: Huaxin Gao <huaxing@us.ibm.com>
Merged to 3.0. Thank you all for reviewing! |
What changes were proposed in this pull request?
backporting the changes to 3.0
set params default values in trait Params for feature and tuning in both Scala and Python.
Why are the changes needed?
Make ML has the same default param values between estimator and its corresponding transformer, and also between Scala and Python.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Existing and modified tests