Implement a Chi-Squared test statistic option for measuring split quality #13438

erikerlandson · 2016-06-01T14:44:06Z

What changes were proposed in this pull request?

Using a test statistic to measure decision tree split quality also provides a useful criterion for halting splits, which can yield improved model quality. I am proposing to add the chi-squared test statistic as a new impurity option (in addition to "gini" and "entropy") for classification decision trees and ensembles.

https://issues.apache.org/jira/browse/SPARK-15699

http://erikerlandson.github.io/blog/2016/05/26/measuring-decision-tree-split-quality-with-test-statistic-p-values/
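
For readers unfamiliar with the idea, here is a minimal sketch of how a chi-squared statistic can score a candidate split. It is not the code in this patch: the function name `chi_squared_statistic` and its arguments are hypothetical, and it assumes a binary split summarized as per-class label counts in the left and right children. It computes the standard Pearson statistic on the resulting 2 x K contingency table; turning that into a p-value (as the blog post above discusses) would additionally need the chi-squared distribution with K - 1 degrees of freedom.

```python
from typing import Sequence


def chi_squared_statistic(left: Sequence[float], right: Sequence[float]) -> float:
    """Pearson chi-squared statistic for a 2 x K table of class counts.

    `left` and `right` hold per-class label counts in the two children of a
    candidate split (hypothetical inputs, chosen for illustration). Larger
    values mean the children's class distributions differ more than would be
    expected if the split were independent of the label.
    """
    if len(left) != len(right):
        raise ValueError("left and right must have the same number of classes")

    n_left, n_right = sum(left), sum(right)
    total = n_left + n_right
    if n_left == 0 or n_right == 0:
        return 0.0  # degenerate split: one child is empty

    stat = 0.0
    for l, r in zip(left, right):
        class_total = l + r
        if class_total == 0:
            continue  # class absent from this node; contributes nothing
        # Expected counts if the split were independent of the label.
        exp_l = n_left * class_total / total
        exp_r = n_right * class_total / total
        stat += (l - exp_l) ** 2 / exp_l + (r - exp_r) ** 2 / exp_r
    return stat


if __name__ == "__main__":
    print(chi_squared_statistic([10, 0], [0, 10]))  # perfectly separating split
    print(chi_squared_statistic([5, 5], [5, 5]))    # uninformative split
```

In this sketch a split that separates two balanced classes perfectly scores 20.0 for 20 samples, while a split that leaves both children with identical class proportions scores 0.0. A tree builder could stop splitting when the best available statistic is not significant.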

How was this patch tested?

I added unit tests to verify that the chi-squared "impurity" measure functions as expected when used for decision tree training.

…lity when training decision trees

SparkQA · 2016-06-01T15:04:12Z

Test build #59737 has finished for PR 13438 at commit b7a47e0.

This patch fails MiMa tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2016-06-01T15:09:25Z

Please follow https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark and update the title

erikerlandson · 2016-06-01T15:28:07Z

nuts, I'm going to have to re-submit a PR against master

erikerlandson · 2016-06-01T15:41:00Z

I'm closing this, re-submitted as #13440

Implement a Chi-Squared test statistic option for measuring split qua…

b7a47e0

…lity when training decision trees

erikerlandson mentioned this pull request Jun 1, 2016

[SPARK-15699] [ML] Implement a Chi-Squared test statistic option for measuring split quality #13440

Closed

erikerlandson closed this Jun 1, 2016
