Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expression: support agg function var_pop #7718

Closed
wants to merge 6 commits into from
Closed

Conversation

mccxj
Copy link
Contributor

@mccxj mccxj commented Sep 16, 2018

What problem does this PR solve?

See #7623

PR for parser pingcap/parser#53

PR for tikv tikv/tikv#3835

What is changed and how it works?

mysql> create table t(a int(11), b float, c double, d decimal(6,2));
Query OK, 0 rows affected (0.01 sec)

mysql> select var_pop(a), var_pop(b), var_pop(c), var_pop(d) from t;
+------------+------------+------------+------------+
| var_pop(a) | var_pop(b) | var_pop(c) | var_pop(d) |
+------------+------------+------------+------------+
|       NULL |       NULL |       NULL |       NULL |
+------------+------------+------------+------------+
1 row in set (0.00 sec)

mysql> insert into t values(1, 2.5, 3.5, 6.25), (2, 3.5, 4.5, 8.55);
Query OK, 2 rows affected (0.00 sec)

mysql> select var_pop(a), var_pop(b), var_pop(c), var_pop(d) from t;
+------------+------------+------------+------------+
| var_pop(a) | var_pop(b) | var_pop(c) | var_pop(d) |
+------------+------------+------------+------------+
|       0.25 |       0.25 |       0.25 |       1.32 |
+------------+------------+------------+------------+
1 row in set (0.01 sec)

mysql> select var_pop(a), var_pop(b), variance(a), variance(b) from t;
+------------+------------+-------------+-------------+
| var_pop(a) | var_pop(b) | variance(a) | variance(b) |
+------------+------------+-------------+-------------+
|       0.25 |       0.25 |        0.25 |        0.25 |
+------------+------------+-------------+-------------+
1 row in set (0.01 sec)

mysql> select var_pop(a), var_pop(b), var_pop(c), var_pop(d) from t group by a;
+------------+------------+------------+------------+
| var_pop(a) | var_pop(b) | var_pop(c) | var_pop(d) |
+------------+------------+------------+------------+
|          0 |          0 |          0 |       0.00 |
|          0 |          0 |          0 |       0.00 |
+------------+------------+------------+------------+
2 rows in set (0.00 sec)

mysql> insert into t values(1, 3.5, 4.5, 5.25);
Query OK, 1 row affected (0.00 sec)

-- run with --column-type-info
mysql> select a, var_pop(a), var_pop(b), var_pop(c), var_pop(d) from t group by a;
Field   1:  `a`
Catalog:    `def`
Database:   `test`
Table:      `t`
Org_table:  `t`
Type:       LONG
Collation:  binary (63)
Length:     11
Max_length: 1
Decimals:   0
Flags:      NUM 

Field   2:  `var_pop(a)`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       DOUBLE
Collation:  binary (63)
Length:     23
Max_length: 1
Decimals:   0
Flags:      BINARY NUM 

Field   3:  `var_pop(b)`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       DOUBLE
Collation:  binary (63)
Length:     23
Max_length: 4
Decimals:   31
Flags:      BINARY NUM 

Field   4:  `var_pop(c)`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       DOUBLE
Collation:  binary (63)
Length:     23
Max_length: 4
Decimals:   31
Flags:      BINARY NUM 

Field   5:  `var_pop(d)`
Catalog:    `def`
Database:   ``
Table:      ``
Org_table:  ``
Type:       DOUBLE
Collation:  binary (63)
Length:     23
Max_length: 4
Decimals:   2
Flags:      BINARY NUM 


+------+------------+------------+------------+------------+
| a    | var_pop(a) | var_pop(b) | var_pop(c) | var_pop(d) |
+------+------------+------------+------------+------------+
|    2 |          0 |          0 |          0 |       0.00 |
|    1 |          0 |       0.25 |       0.25 |       0.25 |
+------+------------+------------+------------+------------+
2 rows in set (0.00 sec)

Check List

Tests

  • Unit test
  • Integration test

Code changes

  • Has exported function/method change
  • Has exported variable/fields change

This change is Reviewable

@mccxj mccxj force-pushed the var_pop branch 2 times, most recently from 630b54c to 403d894 Compare September 17, 2018 14:25
@mccxj mccxj changed the title expression: support agg function var_pop(WIP) expression: support agg function var_pop Sep 17, 2018
@mccxj
Copy link
Contributor Author

mccxj commented Sep 18, 2018

@zz-jason @jackysp @imtbkcat PTAL

@jackysp
Copy link
Member

jackysp commented Sep 20, 2018

Great work! Thanks @mccxj !

@jackysp
Copy link
Member

jackysp commented Sep 20, 2018

PTAL @lamxTyler .

Copy link
Member

@winoros winoros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will you implement it on tikv too?

executor/aggfuncs/builder.go Show resolved Hide resolved
executor/aggfuncs/func_var_pop.go Outdated Show resolved Hide resolved
expression/aggregation/aggregation.go Outdated Show resolved Hide resolved
@zz-jason zz-jason added type/enhancement The issue or PR belongs to an enhancement. contribution This PR is from a community contributor. sig/execution SIG execution labels Sep 20, 2018
@winoros
Copy link
Member

winoros commented Sep 20, 2018

https://arxiv.org/pdf/1509.04349.pdf
related paper

@mccxj
Copy link
Contributor Author

mccxj commented Sep 20, 2018

@winoros thanks, I got it, the algorithm works fine.
img_20180920_182012

@mccxj
Copy link
Contributor Author

mccxj commented Sep 22, 2018

@jackysp @zz-jason @winoros @lamxTyler PTAL

executor/aggfuncs/func_var_pop.go Outdated Show resolved Hide resolved
executor/aggfuncs/func_var_pop.go Outdated Show resolved Hide resolved
executor/aggfuncs/func_var_pop.go Outdated Show resolved Hide resolved
executor/aggfuncs/func_var_pop.go Outdated Show resolved Hide resolved
executor/aggfuncs/func_var_pop.go Outdated Show resolved Hide resolved
executor/aggfuncs/func_var_pop.go Outdated Show resolved Hide resolved
parser/parser.y Outdated Show resolved Hide resolved
parser/parser.y Outdated Show resolved Hide resolved
parser/parser_test.go Outdated Show resolved Hide resolved
expression/aggregation/variance.go Show resolved Hide resolved
expression/aggregation/descriptor.go Show resolved Hide resolved
expression/aggregation/aggregation.go Show resolved Hide resolved
// This algorithm was proven to be numerically stable by J.L. Barlow in
// "Error analysis of a pairwise summation algorithm to compute sample variance"
// Numer. Math, 58 (1991) pp. 583--590
func CalculateMerge(partialCount int64, mergeCount int64, partialSum float64, mergeSum float64, partialVariance float64, mergeVariance float64) float64 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why put it in types package? Let it be a member function of baseVarianceFunction is better?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed +1

types/helper.go Outdated Show resolved Hide resolved
plan/rule_aggregation_push_down.go Outdated Show resolved Hide resolved
executor/aggfuncs/func_variance.go Outdated Show resolved Hide resolved
@zz-jason
Copy link
Member

zz-jason commented Oct 9, 2018

@mccxj any update?

@mccxj
Copy link
Contributor Author

mccxj commented Nov 27, 2018

PR for parser pingcap/parser#53

@mccxj
Copy link
Contributor Author

mccxj commented Nov 27, 2018

PTAL @jackysp @zz-jason @winoros @XuHuaiyu @lamxTyler

Copy link
Member

@zz-jason zz-jason left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR contains two main changes:

  1. implement var_pop function in the TiDB layer, which located in "executor/aggfuncs/"
  2. implement var_pop function in the mocktikv coprocessor layer, which located in "expression/aggregation/"
  3. support the aggregate function var_pop() to be push down to tikv coprocessor.

Can you split this PR to three PRs:

  1. the first PR to implement the var_pop function in the TiDB layer
  2. the second PR to var_pop function in the mocktikv coprocessor layer, which located in "expression/aggregation/" and add some unit tests.
  3. the third PR to support the aggregate function var_pop() to be push down to tikv coprocessor.

Before the third PR, we should also add some integration tests to test tidb+tikv, after all the integration tests are passed, we can safely push down the var_pop function to the tikv layer.

type partialResult4VarianceFloat64 struct {
count int64
sum float64
variance float64 // $$\sum_{k=i}^{j}{(a_k-avg_{ij})^2}$$ (this is actually n times the variance)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh.. How about use the normal form to express the math formula? For example:

sum((a[i]-avg)^2)

chk.AppendFloat64(e.ordinal, p.variance/float64(p.count))
default:
// TODO support more population standard deviation/sample variance
panic("Not implemented")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can replace the panic with an error and return that error back.

@@ -36,7 +36,7 @@ type aggregationPushDownSolver struct {
// Currently we don't support avg and concat.
func (a *aggregationPushDownSolver) isDecomposable(fun *aggregation.AggFuncDesc) bool {
switch fun.Name {
case ast.AggFuncAvg, ast.AggFuncGroupConcat:
case ast.AggFuncAvg, ast.AggFuncGroupConcat, ast.AggFuncVarPop:
// TODO: Support avg push down.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems this TODO message needs to be modified.

// This algorithm was proven to be numerically stable by J.L. Barlow in
// "Error analysis of a pairwise summation algorithm to compute sample variance"
// Numer. Math, 58 (1991) pp. 583--590
func CalculateMerge(partialCount int64, mergeCount int64, partialSum float64, mergeSum float64, partialVariance float64, mergeVariance float64) float64 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed +1

@zz-jason
Copy link
Member

zz-jason commented Aug 3, 2019

@mccxj friendly ping, any update?

@sre-bot
Copy link
Contributor

sre-bot commented Aug 3, 2019

Hi contributor, thanks for your PR.

This patch needs to be approved by someone of admins. They should reply with "/ok-to-test" to accept this PR for running test automatically.

@zz-jason
Copy link
Member

@mccxj I'm going to close this PR due to outdated, feel free to reopen it if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/expression contribution This PR is from a community contributor. sig/execution SIG execution type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants