Skip to content

[SPARK-19133][SPARKR][ML] fix glm for Gamma, clarify glm family supported #16511

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

felixcheung
Copy link
Member

What changes were proposed in this pull request?

R family is a longer list than what Spark supports.

How was this patch tested?

manual

@@ -52,6 +52,8 @@ setClass("IsotonicRegressionModel", representation(jobj = "jobj"))
#' This can be a character string naming a family function, a family function or
#' the result of a call to a family function. Refer R family at
#' \url{https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html}.
#' Currently these families are supported: \code{binomial}, \code{gaussian},
#' \code{Gamma}, and \code{poisson}.
Copy link
Member Author

@felixcheung felixcheung Jan 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in R, Gamma family is capital G

@felixcheung
Copy link
Member Author

@yanboliang

@SparkQA
Copy link

SparkQA commented Jan 9, 2017

Test build #71060 has finished for PR 16511 at commit 3403567.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@felixcheung
Copy link
Member Author

turns out Gamma is broken. will be working on adding tests

@felixcheung felixcheung changed the title [SPARKR][DOCS] clarify glm family supported [SPARK-19133][SPARKR][DOCS] fix glm for Gamma, clarify glm family supported Jan 9, 2017
@felixcheung felixcheung changed the title [SPARK-19133][SPARKR][DOCS] fix glm for Gamma, clarify glm family supported [SPARK-19133][SPARKR] fix glm for Gamma, clarify glm family supported Jan 9, 2017
@felixcheung felixcheung changed the title [SPARK-19133][SPARKR] fix glm for Gamma, clarify glm family supported [SPARK-19133][SPARKR][ML] fix glm for Gamma, clarify glm family supported Jan 9, 2017
@SparkQA
Copy link

SparkQA commented Jan 9, 2017

Test build #71066 has finished for PR 16511 at commit 0688aa4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 9, 2017

Test build #71070 has finished for PR 16511 at commit 78d5390.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

jobj <- callJStatic("org.apache.spark.ml.r.GeneralizedLinearRegressionWrapper",
"fit", formula, data@sdf, family$family, family$link,
"fit", formula, data@sdf, tolower(family$family), family$link,
Copy link
Contributor

@yanboliang yanboliang Jan 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should put case conversion into MLlib side, then we can get consistent behavior from APIs across different languages (Scala/Python/R). I wish you would not mind that I have sent #16516 to fix it in MLlib. If it's reasonable, we can revert the change of this line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's a good idea - given how this is likely broken since Spark 1.6 I'd like to get the fix in ASAP and to master, branch-2.0, branch-2.1, or even perhaps branch-1.6 just in case.

I'd probably need to open different PRs for these since our file organization has changed, and would need to keep the R side change - unless you think ML side change could go to these branches too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need take some time to get MLlib side changed (and it involves changes for other estimators). Sounds good to keep the R side change.

x <- runif(100, -1, 1)
y <- rgamma(100, rate = 10 / exp(0.5 + 1.2 * x), shape = 10)
df <- as.DataFrame(as.data.frame(list(x = x, y = y)))
model <- glm(y ~ x, family = Gamma, df)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to use spark.glm here, since the title of this test is: spark.glm and predict(see L52). We have separate tests for R-compliant method glm, but it's not necessary to duplicate all tests.

Copy link
Member Author

@felixcheung felixcheung Jan 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it matters much - as you can see in the code glm is a single-line wrapper for spark.glm - I actually thought it was better to add some tests for glm instead of just testing spark.glm

@yanboliang
Copy link
Contributor

LGTM

@asfgit asfgit closed this in 9bc3507 Jan 10, 2017
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…rted

## What changes were proposed in this pull request?

R family is a longer list than what Spark supports.

## How was this patch tested?

manual

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes apache#16511 from felixcheung/rdocglmfamily.
cmonkey pushed a commit to cmonkey/spark that referenced this pull request Feb 15, 2017
…rted

## What changes were proposed in this pull request?

R family is a longer list than what Spark supports.

## How was this patch tested?

manual

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes apache#16511 from felixcheung/rdocglmfamily.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants