Skip to content

[SPARK-17941][ML][TEST] Logistic regression tests should use sample weights. #15488

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from

Conversation

sethah
Copy link
Contributor

@sethah sethah commented Oct 14, 2016

What changes were proposed in this pull request?

The sample weight testing for logistic regressions is not robust. Logistic regression suite already has many test cases comparing results to R glmnet. Since both libraries support sample weights, we should use sample weights in the test to increase coverage for sample weighting. This patch doesn't really add any code and makes the testing more complete.

Also fixed some errors with the R code that was referenced in the test suit. Changed standardization=T to standardize=T since the former is invalid.

How was this patch tested?

Existing unit tests are modified. No non-test code is touched.

@sethah
Copy link
Contributor Author

sethah commented Oct 14, 2016

cc @yanboliang @dbtsai

coefficients = coef(glmnet(features,label, family="binomial", alpha = 0, lambda = 0))
coefficients
Use the following R code to load the data and train the model using glmnet package.
library("glmnet")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pasted every R code snippet into an R shell, so we can be reasonably certain of its correctness

label = as.factor(data$V1)
features = as.matrix(data.frame(data$V2, data$V3, data$V4, data$V5))
coefficientsStd = coef(glmnet(features, label, family="multinomial", alpha = 1,
lambda = 0.05, standardization=T))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

standardization is an invalid argument. I changed these to be correct standardize

@SparkQA
Copy link

SparkQA commented Oct 14, 2016

Test build #66966 has finished for PR 15488 at commit 54bb1b5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dbtsai
Copy link
Member

dbtsai commented Oct 14, 2016

LGTM.

@asfgit asfgit closed this in de1c1ca Oct 14, 2016
@dbtsai
Copy link
Member

dbtsai commented Oct 14, 2016

Merged into master. Thanks.

robert3005 pushed a commit to palantir/spark that referenced this pull request Nov 1, 2016
…eights.

## What changes were proposed in this pull request?

The sample weight testing for logistic regressions is not robust. Logistic regression suite already has many test cases comparing results to R glmnet. Since both libraries support sample weights, we should use sample weights in the test to increase coverage for sample weighting. This patch doesn't really add any code and makes the testing more complete.

Also fixed some errors with the R code that was referenced in the test suit. Changed `standardization=T` to `standardize=T` since the former is invalid.

## How was this patch tested?

Existing unit tests are modified. No non-test code is touched.

Author: sethah <seth.hendrickson16@gmail.com>

Closes apache#15488 from sethah/logreg_weight_tests.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…eights.

## What changes were proposed in this pull request?

The sample weight testing for logistic regressions is not robust. Logistic regression suite already has many test cases comparing results to R glmnet. Since both libraries support sample weights, we should use sample weights in the test to increase coverage for sample weighting. This patch doesn't really add any code and makes the testing more complete.

Also fixed some errors with the R code that was referenced in the test suit. Changed `standardization=T` to `standardize=T` since the former is invalid.

## How was this patch tested?

Existing unit tests are modified. No non-test code is touched.

Author: sethah <seth.hendrickson16@gmail.com>

Closes apache#15488 from sethah/logreg_weight_tests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants