-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-19133][SPARKR][ML] fix glm for Gamma, clarify glm family supported #16511
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -52,6 +52,8 @@ setClass("IsotonicRegressionModel", representation(jobj = "jobj")) | |||
#' This can be a character string naming a family function, a family function or | |||
#' the result of a call to a family function. Refer R family at | |||
#' \url{https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html}. | |||
#' Currently these families are supported: \code{binomial}, \code{gaussian}, | |||
#' \code{Gamma}, and \code{poisson}. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in R, Gamma
family is capital G
Test build #71060 has finished for PR 16511 at commit
|
turns out Gamma is broken. will be working on adding tests |
Test build #71066 has finished for PR 16511 at commit
|
Test build #71070 has finished for PR 16511 at commit
|
jobj <- callJStatic("org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper", | ||
"fit", formula, data@sdf, family$family, family$link, | ||
"fit", formula, data@sdf, tolower(family$family), family$link, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should put case conversion into MLlib side, then we can get consistent behavior from APIs across different languages (Scala/Python/R). I wish you would not mind that I have sent #16516 to fix it in MLlib. If it's reasonable, we can revert the change of this line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a good idea - given how this is likely broken since Spark 1.6 I'd like to get the fix in ASAP and to master, branch-2.0, branch-2.1, or even perhaps branch-1.6 just in case.
I'd probably need to open different PRs for these since our file organization has changed, and would need to keep the R side change - unless you think ML side change could go to these branches too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need take some time to get MLlib side changed (and it involves changes for other estimators). Sounds good to keep the R side change.
x <- runif(100, -1, 1) | ||
y <- rgamma(100, rate = 10 / exp(0.5 + 1.2 * x), shape = 10) | ||
df <- as.DataFrame(as.data.frame(list(x = x, y = y))) | ||
model <- glm(y ~ x, family = Gamma, df) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to use spark.glm
here, since the title of this test is: spark.glm and predict
(see L52). We have separate tests for R-compliant method glm
, but it's not necessary to duplicate all tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it matters much - as you can see in the code glm
is a single-line wrapper for spark.glm
- I actually thought it was better to add some tests for glm
instead of just testing spark.glm
LGTM |
…rted ## What changes were proposed in this pull request? R family is a longer list than what Spark supports. ## How was this patch tested? manual Author: Felix Cheung <felixcheung_m@hotmail.com> Closes apache#16511 from felixcheung/rdocglmfamily.
…rted ## What changes were proposed in this pull request? R family is a longer list than what Spark supports. ## How was this patch tested? manual Author: Felix Cheung <felixcheung_m@hotmail.com> Closes apache#16511 from felixcheung/rdocglmfamily.
What changes were proposed in this pull request?
R family is a longer list than what Spark supports.
How was this patch tested?
manual