-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
expression: support agg function var_pop #7718
Conversation
630b54c
to
403d894
Compare
Great work! Thanks @mccxj ! |
PTAL @lamxTyler . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will you implement it on tikv too?
https://arxiv.org/pdf/1509.04349.pdf |
@winoros thanks, I got it, the algorithm works fine. |
// This algorithm was proven to be numerically stable by J.L. Barlow in | ||
// "Error analysis of a pairwise summation algorithm to compute sample variance" | ||
// Numer. Math, 58 (1991) pp. 583--590 | ||
func CalculateMerge(partialCount int64, mergeCount int64, partialSum float64, mergeSum float64, partialVariance float64, mergeVariance float64) float64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why put it in types
package? Let it be a member function of baseVarianceFunction
is better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed +1
@mccxj any update? |
PR for parser pingcap/parser#53 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR contains two main changes:
- implement
var_pop
function in the TiDB layer, which located in "executor/aggfuncs/" - implement
var_pop
function in the mocktikv coprocessor layer, which located in "expression/aggregation/" - support the aggregate function
var_pop()
to be push down to tikv coprocessor.
Can you split this PR to three PRs:
- the first PR to implement the
var_pop
function in the TiDB layer - the second PR to
var_pop
function in the mocktikv coprocessor layer, which located in "expression/aggregation/" and add some unit tests. - the third PR to support the aggregate function
var_pop()
to be push down to tikv coprocessor.
Before the third PR, we should also add some integration tests to test tidb+tikv, after all the integration tests are passed, we can safely push down the var_pop
function to the tikv layer.
type partialResult4VarianceFloat64 struct { | ||
count int64 | ||
sum float64 | ||
variance float64 // $$\sum_{k=i}^{j}{(a_k-avg_{ij})^2}$$ (this is actually n times the variance) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh.. How about use the normal form to express the math formula? For example:
sum((a[i]-avg)^2)
chk.AppendFloat64(e.ordinal, p.variance/float64(p.count)) | ||
default: | ||
// TODO support more population standard deviation/sample variance | ||
panic("Not implemented") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can replace the panic
with an error
and return that error back.
@@ -36,7 +36,7 @@ type aggregationPushDownSolver struct { | |||
// Currently we don't support avg and concat. | |||
func (a *aggregationPushDownSolver) isDecomposable(fun *aggregation.AggFuncDesc) bool { | |||
switch fun.Name { | |||
case ast.AggFuncAvg, ast.AggFuncGroupConcat: | |||
case ast.AggFuncAvg, ast.AggFuncGroupConcat, ast.AggFuncVarPop: | |||
// TODO: Support avg push down. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems this TODO message needs to be modified.
// This algorithm was proven to be numerically stable by J.L. Barlow in | ||
// "Error analysis of a pairwise summation algorithm to compute sample variance" | ||
// Numer. Math, 58 (1991) pp. 583--590 | ||
func CalculateMerge(partialCount int64, mergeCount int64, partialSum float64, mergeSum float64, partialVariance float64, mergeVariance float64) float64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed +1
@mccxj friendly ping, any update? |
Hi contributor, thanks for your PR. This patch needs to be approved by someone of admins. They should reply with "/ok-to-test" to accept this PR for running test automatically. |
@mccxj I'm going to close this PR due to outdated, feel free to reopen it if needed. |
What problem does this PR solve?
See #7623
PR for parser pingcap/parser#53
PR for tikv tikv/tikv#3835
What is changed and how it works?
Check List
Tests
Code changes
This change is