-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-9297][SQL] Add covar_pop and covar_samp #10029
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Test build #46849 has finished for PR 10029 at commit
|
Test build #46851 has finished for PR 10029 at commit
|
retest this please. |
Test build #46853 has finished for PR 10029 at commit
|
* Aggregate function: returns the population covariance for two columns. | ||
* | ||
* @group agg_funcs | ||
* @since 1.6.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2.0.0
Test build #49128 has finished for PR 10029 at commit
|
retest this please. |
Test build #49146 has finished for PR 10029 at commit
|
retest this please. |
Test build #49150 has finished for PR 10029 at commit
|
@davies please see if the updates are good for you. Thanks. |
ping @davies |
val count = buffer.getLong(countOffset) | ||
if (count > 0) { | ||
val Ck = buffer.getDouble(CkOffset) | ||
val cov = Ck / count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Should we check Ck
is nan or not? since count > 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We check if cov is nan or not. If Ck
is nan, cov will be nan too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant it make more sense to check Ck
directly, because count > 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh. Got it.
@viirya One question is that: we are moving to do more codegen staff (whole stage codegen), but the ImperativeAggregate can't support that, not sure will we do another codegen version for this or not. Right now, ImperativeAggregate version faster, because the generated code is still not the best, but we can improve. |
@davies Understood. Since the time to come out a better codegen version for this is not sure, can we merge this first in and update it with codegen version later if possibly? |
@viirya Yeah, we can merge this in first. |
Test build #49300 has finished for PR 10029 at commit
|
override def eval(buffer: InternalRow): Any = { | ||
val count = buffer.getLong(countOffset) | ||
if (count > 0) { | ||
if (count > 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the result is null
, we should combine there two branches.
Test build #49309 has finished for PR 10029 at commit
|
Test build #49305 has finished for PR 10029 at commit
|
LGTM, merging this into master, thanks! |
JIRA: https://issues.apache.org/jira/browse/SPARK-9297
Add two aggregation functions: covar_pop and covar_samp.