[SPARK-17941][ML][TEST] Logistic regression tests should use sample weights. #15488

sethah · 2016-10-14T15:58:37Z

What changes were proposed in this pull request?

The sample-weight testing for logistic regression is not robust. The logistic regression suite already has many test cases that compare results to R's glmnet. Since both libraries support sample weights, the tests should use sample weights to increase coverage of sample weighting. This patch adds no non-test code; it only makes the testing more complete.
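The reason weighted glmnet comparisons exercise the weighting path can be illustrated with a small, self-contained sketch in plain Scala (no Spark). The object and method names below are hypothetical and not part of Spark's API. For the unregularized log-loss summed over examples, giving a row an integer weight `w` is exactly equivalent to repeating that row `w` times, which is one simple way to sanity-check weighted results. The sketch does not claim to reproduce how Spark or glmnet normalize the objective or apply regularization.

```scala
// Hypothetical sketch: NOT Spark's implementation, just the weighted log-loss idea.
object WeightedLogLossSketch {

  /** Logistic (sigmoid) function. */
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  /** Negative log-likelihood of a single example with label y in {0, 1}. */
  def exampleLoss(coef: Array[Double], intercept: Double,
                  x: Array[Double], y: Double): Double = {
    val margin = coef.zip(x).map { case (b, xi) => b * xi }.sum + intercept
    val p = sigmoid(margin)
    -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
  }

  /** Sum of per-example losses, each scaled by its sample weight.
    * Rows are (features, label, weight).
    */
  def weightedLoss(coef: Array[Double], intercept: Double,
                   data: Seq[(Array[Double], Double, Double)]): Double =
    data.map { case (x, y, w) => w * exampleLoss(coef, intercept, x, y) }.sum

  /** Replace each row with integer weight w by w copies that have weight 1.0. */
  def expandByWeight(data: Seq[(Array[Double], Double, Double)])
      : Seq[(Array[Double], Double, Double)] =
    data.flatMap { case (x, y, w) => Seq.fill(w.toInt)((x, y, 1.0)) }

  def main(args: Array[String]): Unit = {
    val weighted = Seq(
      (Array(1.0, 2.0), 1.0, 3.0),
      (Array(-0.5, 0.3), 0.0, 1.0),
      (Array(0.2, -1.0), 1.0, 2.0)
    )
    val coef = Array(0.4, -0.7)
    val intercept = 0.1
    val lossWeighted = weightedLoss(coef, intercept, weighted)
    val lossExpanded = weightedLoss(coef, intercept, expandByWeight(weighted))
    // The two values agree up to floating-point rounding.
    println(s"weighted = $lossWeighted, expanded = $lossExpanded")
  }
}
```

A unit weight on every row reduces this to the ordinary unweighted loss, which is why tests that only use unit weights cannot catch bugs in how weights are applied.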

Also fixes some errors in the R code referenced in the test suite: `standardization=T` is changed to `standardize=T`, since the former is not a valid glmnet argument.

How was this patch tested?

Existing unit tests are modified. No non-test code is touched.

sethah · 2016-10-14T15:59:31Z

cc @yanboliang @dbtsai

sethah · 2016-10-14T16:00:45Z

mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala

-       coefficients = coef(glmnet(features,label, family="binomial", alpha = 0, lambda = 0))
-       coefficients
+      Use the following R code to load the data and train the model using glmnet package.
+      library("glmnet")


I pasted every R code snippet into an R shell, so we can be reasonably certain that they are correct.

sethah · 2016-10-14T16:02:06Z

mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala

-       label = as.factor(data$V1)
-       features = as.matrix(data.frame(data$V2, data$V3, data$V4, data$V5))
-       coefficientsStd = coef(glmnet(features, label, family="multinomial", alpha = 1,
-        lambda = 0.05, standardization=T))


`standardization` is not a valid argument. I changed these to the correct `standardize`.

SparkQA · 2016-10-14T16:59:12Z

Test build #66966 has finished for PR 15488 at commit 54bb1b5.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

dbtsai · 2016-10-14T20:14:50Z

LGTM.

dbtsai · 2016-10-14T20:32:28Z

Merged into master. Thanks.

Author: sethah <seth.hendrickson16@gmail.com>

Closes apache#15488 from sethah/logreg_weight_tests.

sethah added 5 commits October 13, 2016 15:51

binary is updated

2b7d997

all tests updated and passing

e523f41

strong l1 tests

e28ad43

comment formatting

b4a158f

style error

54bb1b5


asfgit closed this in de1c1ca Oct 14, 2016

sethah mentioned this pull request Feb 3, 2017

[SPARK-18710][ML] Add offset in GLM #16699

Closed
