Skip to content

[SPARK-9297][SQL] Add covar_pop and covar_samp #10029

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from

Conversation

viirya
Copy link
Member

@viirya viirya commented Nov 29, 2015

JIRA: https://issues.apache.org/jira/browse/SPARK-9297

Add two aggregation functions: covar_pop and covar_samp.

@viirya viirya changed the title Add covar_pop and covar_samp. [SPARK-9297][SQL] Add covar_pop and covar_samp Nov 29, 2015
@SparkQA
Copy link

SparkQA commented Nov 29, 2015

Test build #46849 has finished for PR 10029 at commit 5a419f6.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * abstract class Covariance(\n * case class CovSample(\n * case class CovPopulation(\n

@SparkQA
Copy link

SparkQA commented Nov 29, 2015

Test build #46851 has finished for PR 10029 at commit f0c1838.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * abstract class Covariance(\n * case class CovSample(\n * case class CovPopulation(\n

@viirya
Copy link
Member Author

viirya commented Nov 29, 2015

retest this please.

@SparkQA
Copy link

SparkQA commented Nov 29, 2015

Test build #46853 has finished for PR 10029 at commit f0c1838.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * abstract class Covariance(\n * case class CovSample(\n * case class CovPopulation(\n

* Aggregate function: returns the population covariance for two columns.
*
* @group agg_funcs
* @since 1.6.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2.0.0

@SparkQA
Copy link

SparkQA commented Jan 11, 2016

Test build #49128 has finished for PR 10029 at commit 45d9f0c.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Jan 11, 2016

retest this please.

@SparkQA
Copy link

SparkQA commented Jan 11, 2016

Test build #49146 has finished for PR 10029 at commit 45d9f0c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Jan 11, 2016

retest this please.

@SparkQA
Copy link

SparkQA commented Jan 11, 2016

Test build #49150 has finished for PR 10029 at commit 45d9f0c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member Author

viirya commented Jan 11, 2016

@davies please see if the updates are good for you. Thanks.

@viirya
Copy link
Member Author

viirya commented Jan 13, 2016

ping @davies

val count = buffer.getLong(countOffset)
if (count > 0) {
val Ck = buffer.getDouble(CkOffset)
val cov = Ck / count
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Should we check Ck is nan or not? since count > 0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We check if cov is nan or not. If Ck is nan, cov will be nan too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant it make more sense to check Ck directly, because count > 0.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh. Got it.

@davies
Copy link
Contributor

davies commented Jan 13, 2016

@viirya One question is that: we are moving to do more codegen staff (whole stage codegen), but the ImperativeAggregate can't support that, not sure will we do another codegen version for this or not. Right now, ImperativeAggregate version faster, because the generated code is still not the best, but we can improve.

@viirya
Copy link
Member Author

viirya commented Jan 13, 2016

@davies Understood. Since the time to come out a better codegen version for this is not sure, can we merge this first in and update it with codegen version later if possibly?

@davies
Copy link
Contributor

davies commented Jan 13, 2016

@viirya Yeah, we can merge this in first.

@SparkQA
Copy link

SparkQA commented Jan 13, 2016

Test build #49300 has finished for PR 10029 at commit 832db06.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class Covariance(left: Expression, right: Expression) extends ImperativeAggregate

override def eval(buffer: InternalRow): Any = {
val count = buffer.getLong(countOffset)
if (count > 0) {
if (count > 1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the result is null, we should combine there two branches.

@SparkQA
Copy link

SparkQA commented Jan 13, 2016

Test build #49309 has finished for PR 10029 at commit 1806ced.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 13, 2016

Test build #49305 has finished for PR 10029 at commit 2f643d4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@davies
Copy link
Contributor

davies commented Jan 13, 2016

LGTM, merging this into master, thanks!

@asfgit asfgit closed this in 63eee86 Jan 13, 2016
@viirya viirya deleted the covar-funcs branch December 27, 2023 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants