[SPARK-39453][SQL][TESTS][FOLLOWUP] Let `RAND` in filter is more meaningful. #37033
Closed
beliefer commented Jun 30, 2022
@@ -856,11 +856,11 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel
val df11 = sql(
"""
|SELECT * FROM h2.test.employee
|WHERE GREATEST(bonus, 1100) > 1200 AND LEAST(salary, 10000) > 9000 AND RAND(1) < 1
We already test LEAST at
|WHERE IF(SALARY > 10000, SALARY, LEAST(SALARY, 1000)) > 1200
ping @huaxingao cc @cloud-fan
ping @cloud-fan
cloud-fan approved these changes Jul 5, 2022
thanks, merging to master!
@cloud-fan Thank you!
chenzhx pushed a commit to chenzhx/spark that referenced this pull request Jul 21, 2022
…ingful

### What changes were proposed in this pull request?
apache#36830 makes DS V2 support pushing down misc non-aggregate functions (non-ANSI). But the `Rand` in the test case is not meaningful.

### Why are the changes needed?
Make `Rand` in the filter more meaningful.

### Does this PR introduce _any_ user-facing change?
'No'. Just update test case.

### How was this patch tested?
Just update test case.

Closes apache#37033 from beliefer/SPARK-39453_followup.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
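One plausible reading of why the filter `RAND(1) < 1` in the hunk above is not meaningful: if `RAND` yields values uniformly distributed in [0, 1), the predicate holds for every row and filters nothing. The sketch below illustrates this in plain Scala. It uses `scala.util.Random` as a stand-in and does not reproduce the number sequence of Spark's or H2's `RAND`; the object and method names are hypothetical.

```scala
import scala.util.Random

object RandFilterSketch {
  // Stand-in for a seeded RAND(seed): uniform doubles in [0.0, 1.0).
  // Assumption: this mirrors only the value range, not Spark's or H2's generator.
  def randValues(seed: Long, n: Int): Seq[Double] = {
    val rng = new Random(seed)
    Seq.fill(n)(rng.nextDouble())
  }

  // Fraction of rows kept by the predicate `rand < threshold`.
  def keptFraction(values: Seq[Double], threshold: Double): Double =
    values.count(_ < threshold).toDouble / values.size

  def main(args: Array[String]): Unit = {
    val values = randValues(seed = 1L, n = 10000)
    // Every value is below 1.0, so this predicate keeps all rows.
    println(s"rand < 1.0 keeps ${keptFraction(values, 1.0)}")
    // A threshold inside the range actually removes some rows.
    println(s"rand < 0.5 keeps ${keptFraction(values, 0.5)}")
  }
}
```

A comparison whose right-hand side can fall inside [0, 1), or that depends on a column, makes the predicate discriminate between rows, which is what a pushed-down filter test would want to exercise.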
chenzhx added a commit to Kyligence/spark that referenced this pull request Jul 27, 2022
…LIMIT (#505)

* [SPARK-39139][SQL] DS V2 supports push down DS V2 UDF

### What changes were proposed in this pull request?
Currently, the Spark DS V2 push-down framework supports pushing down SQL to data sources, but it only supports pushing down the built-in functions. Each database has many useful functions that are not supported by Spark. If we can push these functions down to the data source, it will reduce disk I/O and network I/O and improve performance when querying databases.

### Why are the changes needed?
1. Spark can leverage the functions supported by databases
2. Improve the query performance.

### Does this PR introduce _any_ user-facing change?
'No'. New feature.

### How was this patch tested?
New tests.

Closes apache#36593 from beliefer/SPARK-39139.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39453][SQL][TESTS][FOLLOWUP] Let `RAND` in filter is more meaningful

### What changes were proposed in this pull request?
apache#36830 makes DS V2 support pushing down misc non-aggregate functions (non-ANSI). But the `Rand` in the test case is not meaningful.

### Why are the changes needed?
Make `Rand` in the filter more meaningful.

### Does this PR introduce _any_ user-facing change?
'No'. Just update test case.

### How was this patch tested?
Just update test case.

Closes apache#37033 from beliefer/SPARK-39453_followup.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-37527][SQL][FOLLOWUP] Cannot compile COVAR_POP, COVAR_SAMP and CORR in `H2Dialect` if them with `DISTINCT`

### What changes were proposed in this pull request?
apache#35145 compiles COVAR_POP, COVAR_SAMP and CORR in H2Dialect. However, H2 doesn't support COVAR_POP, COVAR_SAMP and CORR with DISTINCT, so apache#35145 introduced a bug: it compiles these aggregate functions even when they are used with DISTINCT.

### Why are the changes needed?
Fix the bug that compiles COVAR_POP, COVAR_SAMP and CORR when these aggregate functions are used with DISTINCT.

### Does this PR introduce _any_ user-facing change?
'Yes'. The bug will be fixed.

### How was this patch tested?
New test cases.

Closes apache#37090 from beliefer/SPARK-37527_followup2.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39627][SQL] DS V2 pushdown should unify the compile API

### What changes were proposed in this pull request?
Currently, `JdbcDialect` has two APIs, `compileAggregate` and `compileExpression`; we can unify them.

### Why are the changes needed?
Improve ease of use.

### Does this PR introduce _any_ user-facing change?
'No'. The two APIs are not changed; `compileAggregate` calls `compileExpression`.

### How was this patch tested?
N/A

Closes apache#37047 from beliefer/SPARK-39627.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39384][SQL] Compile built-in linear regression aggregate functions for JDBC dialect

### What changes were proposed in this pull request?
Recently, the Spark DS V2 pushdown framework started translating many standard linear regression aggregate functions. Currently, only H2Dialect compiles these functions. This PR compiles these standard linear regression aggregate functions for the other built-in JDBC dialects.

### Why are the changes needed?
Make the built-in JDBC dialects support compiling linear regression aggregate push-down.

### Does this PR introduce _any_ user-facing change?
'No'. New feature.

### How was this patch tested?
New test cases.

Closes apache#37188 from beliefer/SPARK-39384.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Sean Owen <srowen@gmail.com>

* [SPARK-39148][SQL] DS V2 aggregate push down can work with OFFSET or LIMIT

### What changes were proposed in this pull request?
This PR refactors the v2 agg pushdown code. The main change is that we no longer build the `Scan` immediately when pushing agg. We did so before because we wanted to know the data schema with agg pushed, so that we could add casts when rewriting the query plan after pushdown. The problem is that we built the `Scan` too early and could not push down any more operators, while it's common to see LIMIT/OFFSET after agg.

The idea of the refactor is that we don't need to know the data schema with agg pushed. We just give an expectation (the data type should be the same as that of the Spark agg functions), use it to define the output of `ScanBuilderHolder`, and then rewrite the query plan. Later on, when we build the `Scan` and replace `ScanBuilderHolder` with `DataSourceV2ScanRelation`, we check the actual data schema and add a `Project` to do a type cast if necessary.

### Why are the changes needed?
Support pushing down LIMIT/OFFSET after agg.

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
updated tests

Closes apache#37195 from cloud-fan/agg.

Lead-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

Co-authored-by: Jiaan Geng <beliefer@163.com>
Co-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
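The last step described for SPARK-39148 (compare the expected output types with the actual scan schema, and add a `Project` with casts only where they differ) can be illustrated with a toy model. This is a minimal sketch in plain Scala, not Spark's implementation: `Column`, `ProjectExpr`, `PassThrough`, `CastTo`, `reconcile` and `needsProject` are hypothetical names introduced here, and data types are modeled as plain strings.

```scala
object DeferredScanSketch {
  // Hypothetical, simplified column description: a name and a type name.
  final case class Column(name: String, dataType: String)

  // Hypothetical projection step: pass the column through or cast it.
  sealed trait ProjectExpr
  final case class PassThrough(name: String) extends ProjectExpr
  final case class CastTo(name: String, from: String, to: String) extends ProjectExpr

  // Compare the expected output (the types of Spark's aggregate functions)
  // with the actual scan schema, position by position, and emit a cast
  // only where the types differ.
  def reconcile(expected: Seq[Column], actual: Seq[Column]): Seq[ProjectExpr] = {
    require(expected.size == actual.size, "schemas must have the same arity")
    expected.zip(actual).map { case (e, a) =>
      if (e.dataType == a.dataType) PassThrough(e.name)
      else CastTo(e.name, from = a.dataType, to = e.dataType)
    }
  }

  // An extra projection is only needed when at least one cast was emitted.
  def needsProject(exprs: Seq[ProjectExpr]): Boolean =
    exprs.exists(_.isInstanceOf[CastTo])

  def main(args: Array[String]): Unit = {
    val expected = Seq(Column("sum_salary", "decimal(20,2)"), Column("cnt", "bigint"))
    val actual = Seq(Column("sum_salary", "decimal(30,2)"), Column("cnt", "bigint"))
    val exprs = reconcile(expected, actual)
    println(exprs)
    println(s"needs Project: ${needsProject(exprs)}")
  }
}
```

Because the expected schema is fixed up front, the check against the actual schema can wait until the scan is finally built, which is what leaves room to push further operators such as LIMIT/OFFSET in the meantime.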
yhcast0 pushed a commit to yhcast0/spark that referenced this pull request Aug 8, 2022
…LIMIT (Kyligence#505)
yhcast0 pushed a commit to Kyligence/spark that referenced this pull request Aug 8, 2022
…LIMIT (#505)
zheniantoushipashi pushed a commit to Kyligence/spark that referenced this pull request Aug 8, 2022
…LIMIT (#505)
leejaywei pushed a commit to Kyligence/spark that referenced this pull request Aug 29, 2022
…LIMIT (#505)
leejaywei pushed a commit to Kyligence/spark that referenced this pull request Aug 29, 2022
…LIMIT (#505)
RolatZhang pushed a commit to Kyligence/spark that referenced this pull request on Aug 29, 2023
…LIMIT (#505)

* [SPARK-39139][SQL] DS V2 supports push down DS V2 UDF

Currently, Spark DS V2 push-down framework supports push down SQL to data sources. But the DS V2 push-down framework only support push down the built-in functions to data sources. Each database have a lot very useful functions which not supported by Spark. If we can push down these functions into data source, it will reduce disk I/O and network I/O and improve the performance when query databases.
1. Spark can leverage the functions supported by databases
2. Improve the query performance.
'No'. New feature.
New tests.

Closes apache#36593 from beliefer/SPARK-39139.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39453][SQL][TESTS][FOLLOWUP] Let `RAND` in filter is more meaningful

apache#36830 makes DS V2 supports push down misc non-aggregate functions(non ANSI). But he `Rand` in test case looks no meaningful.
Let `Rand` in filter is more meaningful.
'No'. Just update test case.
Just update test case.

Closes apache#37033 from beliefer/SPARK-39453_followup.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-37527][SQL][FOLLOWUP] Cannot compile COVAR_POP, COVAR_SAMP and CORR in `H2Dialect` if them with `DISTINCT`

apache#35145 compile COVAR_POP, COVAR_SAMP and CORR in H2Dialect. Because H2 does't support COVAR_POP, COVAR_SAMP and CORR works with DISTINCT. So apache#35145 introduces a bug that compile COVAR_POP, COVAR_SAMP and CORR if these aggregate functions with DISTINCT.
Fix bug that compile COVAR_POP, COVAR_SAMP and CORR if these aggregate functions with DISTINCT.
'Yes'. Bug will be fix.
New test cases.

Closes apache#37090 from beliefer/SPARK-37527_followup2.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39627][SQL] DS V2 pushdown should unify the compile API

Currently, `JdbcDialect` have two API `compileAggregate` and `compileExpression`, we can unify them.
Improve ease of use.
'No'. The two API `compileAggregate` call `compileExpression` not changed.
N/A

Closes apache#37047 from beliefer/SPARK-39627.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39384][SQL] Compile built-in linear regression aggregate functions for JDBC dialect

Recently, Spark DS V2 pushdown framework translate a lot of standard linear regression aggregate functions. Currently, only H2Dialect compile these standard linear regression aggregate functions. This PR compile these standard linear regression aggregate functions for other build-in JDBC dialect.
Make build-in JDBC dialect support compile linear regression aggregate push-down.
'No'. New feature.
New test cases.

Closes apache#37188 from beliefer/SPARK-39384.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Sean Owen <srowen@gmail.com>

* [SPARK-39148][SQL] DS V2 aggregate push down can work with OFFSET or LIMIT

This PR refactors the v2 agg pushdown code. The main change is, now we don't build the `Scan` immediately when pushing agg. We did it so before because we want to know the data schema with agg pushed, then we can add cast when rewriting the query plan after pushdown. But the problem is, we build `Scan` too early and can't push down any more operators, while it's common to see LIMIT/OFFSET after agg.

The idea of the refactor is, we don't need to know the data schema with agg pushed. We just give an expectation (the data type should be the same of Spark agg functions), use it to define the output of `ScanBuilderHolder`, and then rewrite the query plan. Later on, when we build the `Scan` and replace `ScanBuilderHolder` with `DataSourceV2ScanRelation`, we check the actual data schema and add a `Project` to do type cast if necessary.
support pushing down LIMIT/OFFSET after agg.
no
updated tests

Closes apache#37195 from cloud-fan/agg.

Lead-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

Co-authored-by: Jiaan Geng <beliefer@163.com>
Co-authored-by: Wenchen Fan <wenchen@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
What changes were proposed in this pull request?
#36830 made DS V2 support pushing down miscellaneous non-aggregate (non-ANSI) functions, but the `Rand` call in its test case is not meaningful.
Why are the changes needed?
To make the `Rand` call in the filter more meaningful.
Does this PR introduce any user-facing change?
No. This PR only updates a test case.
How was this patch tested?
By updating the existing test case.
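One way to see why a predicate such as `RAND(1) < 1` adds little to a filter: a uniform generator over [0.0, 1.0) never yields a value of 1 or more, so the comparison holds for every row. The sketch below is only an illustration under stated assumptions. It uses `scala.util.Random` as a stand-in for Spark's `RAND(seed)` (the generated sequences differ), and `RandFilterSketch` and `countPassing` are hypothetical names, not part of the test suite.

```scala
import scala.util.Random

// Illustration only: scala.util.Random stands in for Spark's RAND(seed).
// Both produce doubles in [0.0, 1.0), but the actual sequences differ.
object RandFilterSketch {
  // Hypothetical helper: count how many of n seeded draws satisfy a predicate.
  def countPassing(seed: Long, n: Int)(p: Double => Boolean): Int = {
    val rng = new Random(seed)
    Iterator.fill(n)(rng.nextDouble()).count(p)
  }

  def main(args: Array[String]): Unit = {
    val n = 1000
    // Mirrors `RAND(1) < 1`: every draw is below 1.0, so nothing is filtered out.
    val alwaysTrue = countPassing(seed = 1L, n)(_ < 1.0)
    // A threshold inside (0, 1) makes the result depend on the generated value.
    val selective = countPassing(seed = 1L, n)(_ < 0.5)
    println(s"< 1.0 keeps $alwaysTrue of $n draws; < 0.5 keeps $selective of $n draws")
  }
}
```

Comparing the random value against something it can actually exceed, as in the second call, is what makes the predicate able to exclude rows.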