-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-25959][ML] GBTClassifier picks wrong impurity stats on loading #22986
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
9babef5
7d24f33
c2be400
176b0d8
6b91211
4aefc9f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -258,11 +258,7 @@ private[ml] object TreeClassifierParams { | |
private[ml] trait DecisionTreeClassifierParams | ||
extends DecisionTreeParams with TreeClassifierParams | ||
|
||
/** | ||
* Parameters for Decision Tree-based regression algorithms. | ||
*/ | ||
private[ml] trait TreeRegressorParams extends Params { | ||
|
||
private[ml] trait HasVarianceImpurity extends Params { | ||
/** | ||
* Criterion used for information gain calculation (case-insensitive). | ||
* Supported: "variance". | ||
|
@@ -271,9 +267,9 @@ private[ml] trait TreeRegressorParams extends Params { | |
*/ | ||
final val impurity: Param[String] = new Param[String](this, "impurity", "Criterion used for" + | ||
" information gain calculation (case-insensitive). Supported options:" + | ||
s" ${TreeRegressorParams.supportedImpurities.mkString(", ")}", | ||
s" ${HasVarianceImpurity.supportedImpurities.mkString(", ")}", | ||
(value: String) => | ||
TreeRegressorParams.supportedImpurities.contains(value.toLowerCase(Locale.ROOT))) | ||
HasVarianceImpurity.supportedImpurities.contains(value.toLowerCase(Locale.ROOT))) | ||
|
||
setDefault(impurity -> "variance") | ||
|
||
|
@@ -299,12 +295,17 @@ private[ml] trait TreeRegressorParams extends Params { | |
} | ||
} | ||
|
||
private[ml] object TreeRegressorParams { | ||
private[ml] object HasVarianceImpurity { | ||
// These options should be lowercase. | ||
final val supportedImpurities: Array[String] = | ||
Array("variance").map(_.toLowerCase(Locale.ROOT)) | ||
} | ||
|
||
/** | ||
* Parameters for Decision Tree-based regression algorithms. | ||
*/ | ||
private[ml] trait TreeRegressorParams extends HasVarianceImpurity | ||
|
||
private[ml] trait DecisionTreeRegressorParams extends DecisionTreeParams | ||
with TreeRegressorParams with HasVarianceCol { | ||
|
||
|
@@ -538,7 +539,7 @@ private[ml] object GBTClassifierParams { | |
Array("logistic").map(_.toLowerCase(Locale.ROOT)) | ||
} | ||
|
||
private[ml] trait GBTClassifierParams extends GBTParams with TreeClassifierParams { | ||
private[ml] trait GBTClassifierParams extends GBTParams with HasVarianceImpurity { | ||
|
||
/** | ||
* Loss function which GBT tries to minimize. (case-insensitive) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Another issue is that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, that's right. Currently we just saying that we are using |
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can actually keep this trait and extend HasVarianceImpurity. It would be a no-op like a few other traits here. Although 'redundant', would it avoid any MiMa warnings or very slightly improve compatibility, even if this is private[spark]?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually this trait is still there (just some lines below). The MiMa warnings are there because the returned type of
setImpurity
returnsthis.type
, which is nowHasVarianceImpurity
and the same for the setter...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just looking at this one more time ... hm, isn't
this.type
always referring to the current class's type? maybe it's some detail of the generated code and this is essentially a false positive. Or, honestly setImpurity can just be removed in this change too. It's deprecated for all of these classes. I don't mind doing that later too; would just avoid these MiMa exclusions too.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, it is AFAIK. I think it is a false positive.
I wouldn't remove it in this change because since this is a bug, this PR can be backported to 2.4.1 too. I'd do it in a separate change. We would need anyway to add the MiMa rules for the missing methods, so it doesn't change much on the exclusions either...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true, but I don't know if we can back-port it because of the binary incompatibility, internal as it may be. I don't know. If it's not an issue then yes it can back port
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I am not sure how to verify that.