[MLlib] SPARK-1536: multiclass classification support for decision tree #886
Closed
80 commits
50b143a
adding support for very deep trees
manishamde abc5a23
Parameterizing max memory.
etrain 2f6072c
Merge pull request #5 from etrain/deep_tree
manishamde 2f1e093
minor: added doc for maxMemory parameter
manishamde 0287772
Fixing scalastyle issue.
etrain fecf89a
Merge pull request #6 from etrain/deep_tree
manishamde 719d009
updating user documentation
manishamde 9dbdabe
merge from master
manishamde 1517155
updated documentation
manishamde 718506b
added unit test
manishamde e0426ee
renamed parameter
manishamde dad9652
removed unused imports
manishamde cbd9f14
modified scala.math to math
manishamde 5e82202
added documentation, fixed off by 1 error in max level calculation
manishamde 4731cda
formatting
manishamde 5eca9e4
grammar
manishamde 8053fed
more formatting
manishamde 426bb28
programming guide blurb
manishamde b27ad2c
formatting
manishamde ce004a1
minor formatting
manishamde 7fc9545
added docs
manishamde 968ca9d
merged master
manishamde a1a6e09
added weighted point class
manishamde 14aea48
changing instance format to weighted labeled point
manishamde 455bea9
fixed tests
manishamde 46f909c
todo for multiclass support
manishamde 4d5f70c
added multiclass support for find splits bins
manishamde 3f85a17
tests for multiclass classification
manishamde 46e06ee
minor mods
manishamde 6c7af22
prepared for multiclass without breaking binary classification
manishamde 5c78e1a
added multiclass support
manishamde e006f9d
changing variable names
manishamde 098e8c5
merged master
manishamde 34549d0
fixing error during merge
manishamde e547151
minor modifications
manishamde 75f2bfc
minor code style fix
manishamde 6b912dc
added numclasses to tree runner, predict logic for multiclass, add mu…
manishamde 18d2835
changing default values for num classes
manishamde d012be7
fixed while loop
manishamde ed5a2df
fixed classification requirements
manishamde d8e4a11
sample weights
manishamde ab5cb21
multiclass logic
manishamde d811425
multiclass bin aggregate logic
manishamde f16a9bb
fixing while loop
manishamde 1dd2735
bin search logic for multiclass
manishamde 7e5f08c
minor doc
manishamde bce835f
code cleanup
manishamde 828ff16
added categorical variable test
manishamde 8cfd3b6
working for categorical multiclass classification
manishamde f5f6b83
multiclass for continous variables
manishamde 1892a2c
tests and use multiclass binaggregate length when atleast one categor…
manishamde 9a90c93
Merge branch 'master' into multiclass
manishamde 12e6d0a
minor: removing line in doc
manishamde 237762d
renaming functions
manishamde 34ee7b9
minor: code style
manishamde 23d4268
minor: another minor code style
manishamde e3e8843
minor code formatting
manishamde adc7315
support ordered categorical splits for multiclass classification
manishamde 8e44ab8
updated doc
manishamde 3d7f911
updated doc
manishamde 485eaae
implicit conversion from LabeledPoint to WeightedLabeledPoint
manishamde 5c1b2ca
doc for PointConverter class
manishamde 9cc3e31
added implicit conversion import
manishamde 06b1690
fixed off-by-one error in bin to split conversion
manishamde 2061cf5
merged from master
manishamde 0fecd38
minor: add newline to EOF
manishamde d75ac32
removed WeightedLabeledPoint from this PR
manishamde e4c1321
using while loop for regression histograms
manishamde b2ae41f
minor: scalastyle
manishamde 4e85f2c
minor: fixed scalastyle issues
manishamde 2d85a48
minor: fixed scalastyle issues reprise
manishamde afced16
removed label weights support
manishamde c8428c4
fixing weird multiline bug
manishamde 45e767a
adding developer api annotation for overriden methods
manishamde abf2901
adding classes to MimaExcludes.scala
manishamde e1c970d
merged master
manishamde 10fdd82
fixing MIMA excludes
manishamde 1ce7212
change problem filter for mima
manishamde c5b2d04
more MIMA fixes
manishamde 26f8acc
another attempt at fixing mima
manishamde
732 changes: 532 additions & 200 deletions
mllib/src/main/scala/org/apache/spark/mllib/tree/DecisionTree.scala
Do we want this to be a parameter and not inferred from the data?
Also - I'm wondering if it makes sense to subclass params with DecisionTreeParams vs. RegressionTreeParams so that we keep logically separate options separate.
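A rough sketch of the separation suggested above, for illustration only. The class names DecisionTreeParams and RegressionTreeParams come from the comment, and numClassesForClassification is the parameter under discussion; the shared trait TreeParams and the fields maxDepth and maxBins are assumptions made for the example and are not taken from this PR.

```scala
// Hypothetical sketch: TreeParams, maxDepth and maxBins are illustrative
// names, not the configuration classes actually used in this PR.
sealed trait TreeParams {
  def maxDepth: Int
  def maxBins: Int
}

// Classification-only options live here.
final case class DecisionTreeParams(
    maxDepth: Int,
    maxBins: Int,
    numClassesForClassification: Int) extends TreeParams {
  require(numClassesForClassification >= 2,
    "classification needs at least two classes")
}

// Regression has no notion of a class count, so the field is simply absent.
final case class RegressionTreeParams(
    maxDepth: Int,
    maxBins: Int) extends TreeParams
```

With this shape, a regression caller cannot accidentally set a class count, at the cost of one more configuration class for users to learn.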
Inferring it from a large dataset could take a lot of time. In general, most practitioners know the number of classes in advance. If not, we can add a pre-processing step.
Currently we have only numClassesForClassification as a classification-specific parameter. In general, I agree with you. At the same time, I didn't want to create more configuration classes for the user. Shall we leave it as is for now and handle it with the ensembles PR, where we have more parameters (boosting iterations, num trees, feature subsetting, etc.)?
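A minimal sketch of what such a pre-processing step might look like, assuming labels are encoded as the doubles 0.0, 1.0, ..., k-1.0. The object NumClasses and the method infer are hypothetical names, and the example runs on a local collection; on a distributed dataset the same computation needs a full pass over the data, which is the cost mentioned above.

```scala
// Hypothetical helper, not part of this PR.
object NumClasses {
  /** Infers the class count from labels assumed to be 0, 1, ..., k-1 stored as Doubles. */
  def infer(labels: Iterable[Double]): Int = {
    require(labels.nonEmpty, "cannot infer the number of classes from an empty dataset")
    require(labels.forall(l => l >= 0 && l == math.floor(l)),
      "labels must be non-negative integers")
    // The largest label plus one; every label is visited once.
    labels.foldLeft(0.0)((a, b) => math.max(a, b)).toInt + 1
  }
}
```

Note that this returns the largest label plus one, so a class that never appears in the data but has a smaller index than the maximum is still counted.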
Yeah, makes sense. If it doesn't complicate things too much we might
consider adding an interface that doesn't have this specified and figures
it out in one shot.
Worth noting is that in R, an object of type "factor" (the default for
categorical/label data) has this information built in. It can be a big pain
at load time while the system tries to figure out the cardinality of the
factor, but it leads to a nice compact representation of the data and
eliminates situations like this one.
I agree on doing the API separation with the ensembles PR.
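To make the R comparison concrete, here is a small sketch of a factor-like encoding, assuming string-valued labels. Factor and fromValues are hypothetical names, not anything in MLlib; the sorted level order mirrors R's default behavior, and the codes here are zero-based rather than R's one-based codes.

```scala
// Hypothetical factor-like container: the distinct levels are stored once,
// and each observation is stored as a small integer code.
final case class Factor(levels: Vector[String], codes: Vector[Int]) {
  // The cardinality is known without rescanning the data.
  def numClasses: Int = levels.size
  def decode(i: Int): String = levels(codes(i))
}

object Factor {
  /** Builds the levels in one pass over the values, then encodes each value. */
  def fromValues(values: Seq[String]): Factor = {
    val levels = values.distinct.sorted.toVector
    val index = levels.zipWithIndex.toMap
    Factor(levels, values.map(index).toVector)
  }
}
```

The up-front scan is the load-time cost described above; in exchange, the number of classes travels with the data instead of being passed as a separate parameter.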
On Thu, Jun 19, 2014 at 10:46 AM, manishamde notifications@github.com
wrote:
Good point. Let me create a JIRA ticket for this so that we don't forget. :-)
https://issues.apache.org/jira/browse/SPARK-2206