-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-17941][ML][TEST] Logistic regression tests should use sample weights. #15488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
coefficients = coef(glmnet(features,label, family="binomial", alpha = 0, lambda = 0)) | ||
coefficients | ||
Use the following R code to load the data and train the model using glmnet package. | ||
library("glmnet") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pasted every R code snippet into an R shell, so we can be reasonably certain of its correctness
label = as.factor(data$V1) | ||
features = as.matrix(data.frame(data$V2, data$V3, data$V4, data$V5)) | ||
coefficientsStd = coef(glmnet(features, label, family="multinomial", alpha = 1, | ||
lambda = 0.05, standardization=T)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
standardization
is an invalid argument. I changed these to be correct standardize
Test build #66966 has finished for PR 15488 at commit
|
LGTM. |
Merged into master. Thanks. |
…eights. ## What changes were proposed in this pull request? The sample weight testing for logistic regressions is not robust. Logistic regression suite already has many test cases comparing results to R glmnet. Since both libraries support sample weights, we should use sample weights in the test to increase coverage for sample weighting. This patch doesn't really add any code and makes the testing more complete. Also fixed some errors with the R code that was referenced in the test suit. Changed `standardization=T` to `standardize=T` since the former is invalid. ## How was this patch tested? Existing unit tests are modified. No non-test code is touched. Author: sethah <seth.hendrickson16@gmail.com> Closes apache#15488 from sethah/logreg_weight_tests.
…eights. ## What changes were proposed in this pull request? The sample weight testing for logistic regressions is not robust. Logistic regression suite already has many test cases comparing results to R glmnet. Since both libraries support sample weights, we should use sample weights in the test to increase coverage for sample weighting. This patch doesn't really add any code and makes the testing more complete. Also fixed some errors with the R code that was referenced in the test suit. Changed `standardization=T` to `standardize=T` since the former is invalid. ## How was this patch tested? Existing unit tests are modified. No non-test code is touched. Author: sethah <seth.hendrickson16@gmail.com> Closes apache#15488 from sethah/logreg_weight_tests.
What changes were proposed in this pull request?
The sample weight testing for logistic regressions is not robust. Logistic regression suite already has many test cases comparing results to R glmnet. Since both libraries support sample weights, we should use sample weights in the test to increase coverage for sample weighting. This patch doesn't really add any code and makes the testing more complete.
Also fixed some errors with the R code that was referenced in the test suit. Changed
standardization=T
tostandardize=T
since the former is invalid.How was this patch tested?
Existing unit tests are modified. No non-test code is touched.