Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-3381] [MLlib] Eliminate bins for unordered features in DecisionTrees #4231

Closed
wants to merge 4 commits into from

Conversation

MechCoder
Copy link
Contributor

For unordered features, it is sufficient to use splits since the threshold of the split corresponds the threshold of the HighSplit of the bin and there is no use of the LowSplit.

@MechCoder
Copy link
Contributor Author

ping @jkbradley . Would be great if you could have a look.

@SparkQA
Copy link

SparkQA commented Jan 27, 2015

Test build #26183 has started for PR 4231 at commit 6dfaa28.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Jan 27, 2015

Test build #26183 has finished for PR 4231 at commit 6dfaa28.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26183/
Test PASSed.

@@ -96,14 +96,12 @@ private[tree] object TreePoint {
* Find bin for one (labeledPoint, feature).
*
* @param featureArity 0 for continuous features; number of categories for categorical features.
* @param isUnorderedFeature (only applies if feature is categorical)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkbradley I removed this param as it is unused. I don't think it is a problem since all tests pass.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Can you please also remove it from labeledPointToTreePoint and not compute it in convertToTreeRDD?

@MechCoder
Copy link
Contributor Author

It also returns empty bins, just to be compatible with the present API.

@MechCoder MechCoder changed the title [SPARK-3381] [MLlib] Eliminate bins for unordered features [SPARK-3381] [MLlib] Eliminate bins for unordered features in DecisionTrees Jan 28, 2015
@MechCoder
Copy link
Contributor Author

@jkbradley Can I please get a pass or comments on this? Or maybe others who are familiar with the tree code (@mengxr )

@MechCoder
Copy link
Contributor Author

Just out of curiosity, what happens when the deadline for code freeze passes?, i.e tomorrow.

@jkbradley
Copy link
Member

@MechCoder Sorry, there are a lot of PRs out there now, so this may not get merged before the code freeze. It's a good cleanup, though, so I'll definitely take a look when I can. Thanks for your patience.

@jkbradley
Copy link
Member

Oh, and the code freeze just means that we'll start work on cutting a release for 1.3 and no more PRs (unless really necessary) will get merged into the cut. PRs can still get merged into master (which will end up in 1.4).

@MechCoder
Copy link
Contributor Author

Thanks :) Looking forward to your reviews. I will work on other stuff till then.

@MechCoder
Copy link
Contributor Author

@jkbradley Sorry for being impatient, but would you be able to have a look anytime soon?

@MechCoder
Copy link
Contributor Author

ping @jkbradley ?

@jkbradley
Copy link
Member

I'll do my best to look at it today---I hope!

@@ -339,6 +340,7 @@ object DecisionTree extends Serializable with Logging {
agg: DTStatsAggregator,
treePoint: TreePoint,
bins: Array[Array[Bin]],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't need to pass bins to this method anymore.

@jkbradley
Copy link
Member

Thanks for the PR. I’ve added a number of comments. Unfortunately, I won’t have an Internet connection for the next week, so I may not be able to respond to updates. Feel free to ping other people though, and I’ll check when I’m back!

@MechCoder
Copy link
Contributor Author

@jkbradley Thanks for your reviews. I fixed them up.

Anyone else want to have a final look?
cc @manishamde @mengxr ?

@SparkQA
Copy link

SparkQA commented Feb 7, 2015

Test build #26985 has started for PR 4231 at commit c7743c6.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Feb 7, 2015

Test build #26986 has started for PR 4231 at commit aa30be9.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Feb 7, 2015

Test build #26985 has finished for PR 4231 at commit c7743c6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26985/
Test PASSed.

@SparkQA
Copy link

SparkQA commented Feb 7, 2015

Test build #26986 has finished for PR 4231 at commit aa30be9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/26986/
Test PASSed.

// For unordered categorical features, there is no need to construct the bins.
// since there is a one-to-one correspondence between the splits and the bins.
bins(featureIndex) = new Array[Bin](0)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix indentation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean to move the closing brace 2 spaces ahead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The closing brace should be 2 spaces to the left: [https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see you got it.

@MechCoder
Copy link
Contributor Author

@jkbradley fixed!

@SparkQA
Copy link

SparkQA commented Feb 17, 2015

Test build #27605 has started for PR 4231 at commit b7c5581.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Feb 17, 2015

Test build #27605 has finished for PR 4231 at commit b7c5581.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27605/
Test FAILed.

@SparkQA
Copy link

SparkQA commented Feb 17, 2015

Test build #27608 has started for PR 4231 at commit 58c19a5.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Feb 17, 2015

Test build #27608 has finished for PR 4231 at commit 58c19a5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27608/
Test PASSed.

@jkbradley
Copy link
Member

LGTM, thanks!

@manishamde
Copy link
Contributor

@MechCoder Sorry I didn't see the message earlier. I am sure @jkbradley must have done a thorough review but please let me know if you need me to take a look.

@MechCoder
Copy link
Contributor Author

@manishamde Thanks. The LGTM suggests that this should be good to go in! ;)

@asfgit asfgit closed this in 9b746f3 Feb 17, 2015
@jkbradley
Copy link
Member

Merged into master--thanks!

@manishamde
Copy link
Contributor

Thanks @MechCoder @jkbradley

@MechCoder MechCoder deleted the spark-3381 branch February 17, 2015 19:27
@MechCoder
Copy link
Contributor Author

Thanks ! Looking forward to learn lot's more

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants