-
Notifications
You must be signed in to change notification settings - Fork 28.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-3381] [MLlib] Eliminate bins for unordered features in DecisionTrees #4231
Conversation
ping @jkbradley . Would be great if you could have a look. |
Test build #26183 has started for PR 4231 at commit
|
Test build #26183 has finished for PR 4231 at commit
|
Test PASSed. |
@@ -96,14 +96,12 @@ private[tree] object TreePoint { | |||
* Find bin for one (labeledPoint, feature). | |||
* | |||
* @param featureArity 0 for continuous features; number of categories for categorical features. | |||
* @param isUnorderedFeature (only applies if feature is categorical) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkbradley I removed this param as it is unused. I don't think it is a problem since all tests pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Can you please also remove it from labeledPointToTreePoint and not compute it in convertToTreeRDD?
It also returns empty bins, just to be compatible with the present API. |
@jkbradley Can I please get a pass or comments on this? Or maybe others who are familiar with the tree code (@mengxr ) |
Just out of curiosity, what happens when the deadline for code freeze passes?, i.e tomorrow. |
@MechCoder Sorry, there are a lot of PRs out there now, so this may not get merged before the code freeze. It's a good cleanup, though, so I'll definitely take a look when I can. Thanks for your patience. |
Oh, and the code freeze just means that we'll start work on cutting a release for 1.3 and no more PRs (unless really necessary) will get merged into the cut. PRs can still get merged into master (which will end up in 1.4). |
Thanks :) Looking forward to your reviews. I will work on other stuff till then. |
@jkbradley Sorry for being impatient, but would you be able to have a look anytime soon? |
ping @jkbradley ? |
I'll do my best to look at it today---I hope! |
@@ -339,6 +340,7 @@ object DecisionTree extends Serializable with Logging { | |||
agg: DTStatsAggregator, | |||
treePoint: TreePoint, | |||
bins: Array[Array[Bin]], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't need to pass bins to this method anymore.
Thanks for the PR. I’ve added a number of comments. Unfortunately, I won’t have an Internet connection for the next week, so I may not be able to respond to updates. Feel free to ping other people though, and I’ll check when I’m back! |
@jkbradley Thanks for your reviews. I fixed them up. Anyone else want to have a final look? |
Test build #26985 has started for PR 4231 at commit
|
Test build #26986 has started for PR 4231 at commit
|
Test build #26985 has finished for PR 4231 at commit
|
Test PASSed. |
Test build #26986 has finished for PR 4231 at commit
|
Test PASSed. |
// For unordered categorical features, there is no need to construct the bins. | ||
// since there is a one-to-one correspondence between the splits and the bins. | ||
bins(featureIndex) = new Array[Bin](0) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix indentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean to move the closing brace 2 spaces ahead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The closing brace should be 2 spaces to the left: [https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see you got it.
@jkbradley fixed! |
Test build #27605 has started for PR 4231 at commit
|
Test build #27605 has finished for PR 4231 at commit
|
Test FAILed. |
b7c5581
to
58c19a5
Compare
Test build #27608 has started for PR 4231 at commit
|
Test build #27608 has finished for PR 4231 at commit
|
Test PASSed. |
LGTM, thanks! |
@MechCoder Sorry I didn't see the message earlier. I am sure @jkbradley must have done a thorough review but please let me know if you need me to take a look. |
@manishamde Thanks. The LGTM suggests that this should be good to go in! ;) |
Merged into master--thanks! |
Thanks @MechCoder @jkbradley |
Thanks ! Looking forward to learn lot's more |
For unordered features, it is sufficient to use splits since the threshold of the split corresponds the threshold of the HighSplit of the bin and there is no use of the LowSplit.