[SPARK-39270][SQL] JDBC dialect supports registering dialect specific functions #36649


Closed
beliefer wants to merge 8 commits

Conversation

beliefer
Contributor

What changes were proposed in this pull request?

The built-in functions in Spark are not the same as those in a JDBC database.
This PR gives users the chance to register dialect-specific functions.
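
A hedged sketch of the dialect-side API (the `functions`/`registerFunction` names appear in the code discussed below; the exact signatures are assumptions):

```scala
import scala.collection.mutable
import org.apache.spark.sql.connector.catalog.functions.UnboundFunction

// In JdbcDialect: each dialect reports the functions it supports as (name, function) pairs.
def functions: Seq[(String, UnboundFunction)] = Nil

// In H2Dialect: a small mutable registry backing that API.
private val registered = mutable.Map.empty[String, UnboundFunction]

def registerFunction(name: String, fn: UnboundFunction): Unit =
  registered.put(name, fn)

override def functions: Seq[(String, UnboundFunction)] = registered.toSeq
```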

Why are the changes needed?

To let JDBC dialects register dialect-specific functions.

Does this PR introduce any user-facing change?

'No'.
New feature.

How was this patch tested?

New tests.

@github-actions github-actions bot added the SQL label May 24, 2022
@beliefer
Contributor Author

ping @huaxingao cc @cloud-fan

@@ -816,6 +818,15 @@ class JDBCSuite extends QueryTest
}
}

test("register dialect specific functions") {
Contributor

This test does not reflect the expectation. The user story should be: if an end-user registers a JDBC catalog with a certain dialect, they can directly call functions like SELECT myCatalog.database.funcName(...) as long as the function is registered by the dialect.

The workflow should be:

  1. the JDBC dialect reports the functions it supports
  2. Spark (JDBCTableCatalog) registers these functions
  3. end-users call these functions in their queries
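
For example (a hedged sketch; the catalog name, JDBC URL, and table are illustrative):

```scala
import org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog

// 1. The dialect reports the functions it supports.
H2Dialect.registerFunction("my_avg", IntegralAverage)

// 2. Spark registers a JDBC catalog backed by that dialect
//    (the dialect is picked based on the JDBC URL).
spark.conf.set("spark.sql.catalog.h2", classOf[JDBCTableCatalog].getName)
spark.conf.set("spark.sql.catalog.h2.url", "jdbc:h2:mem:testdb0")
spark.conf.set("spark.sql.catalog.h2.driver", "org.h2.Driver")

// 3. End-users call the function in their queries.
spark.sql("SELECT h2.my_avg(ID) FROM h2.test.people").show()
```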

}

override def loadFunction(ident: Identifier): UnboundFunction = {
dialect.functions.toMap.get(ident) match {
Contributor
we should create the map only once in JDBCTableCatalog, instead of every time we look up a function

Contributor

let's also consider case sensitivity.
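
Putting both comments together, a minimal sketch (assuming the catalog can read `spark.sql.caseSensitive` through `SQLConf.get`; whether that is the right place to consult the conf is an open question):

```scala
import java.util.Locale
import org.apache.spark.sql.internal.SQLConf

// Build the lookup map once per JDBCTableCatalog instance, folding case
// up front when the session is case-insensitive.
private lazy val functionMap: Map[String, UnboundFunction] = {
  val caseSensitive = SQLConf.get.caseSensitiveAnalysis
  dialect.functions.map { case (name, fn) =>
    (if (caseSensitive) name else name.toLowerCase(Locale.ROOT)) -> fn
  }.toMap
}
```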

@@ -1412,4 +1413,15 @@ class JDBCV2Suite extends QueryTest with SharedSparkSession with ExplainSuiteHel
}
}
}

test("register dialect specific functions") {
H2Dialect.registerFunction("my_avg", IntegralAverage)
Contributor

let's add a try-catch-finally to clear the registered functions
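
i.e., something like (a sketch; `clearFunctions` is the test-only helper discussed further down):

```scala
test("register dialect specific functions") {
  H2Dialect.registerFunction("my_avg", IntegralAverage)
  try {
    // ... exercise the function, e.g. sql("SELECT h2.my_avg(ID) FROM h2.test.people") ...
  } finally {
    H2Dialect.clearFunctions() // leave H2Dialect pristine for other suites
  }
}
```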

Contributor Author

OK

Contributor

It seems hard to do `clearFunctions` for one test case only; how about we do the following:

  1. in beforeAll, we register functions with H2Dialect
  2. in afterAll, we clear functions in H2Dialect

The result is that the entire test suite will test against a JDBC catalog with a UDF. Other test suites will use a fresh SparkSession, instantiate a new JDBCTableCatalog instance, and won't be affected.
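
A sketch of that setup:

```scala
override def beforeAll(): Unit = {
  super.beforeAll()
  // Every test in this suite sees a JDBC catalog whose dialect has a UDF.
  H2Dialect.registerFunction("my_avg", IntegralAverage)
}

override def afterAll(): Unit = {
  try {
    H2Dialect.clearFunctions()
  } finally {
    super.afterAll()
  }
}
```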

Contributor Author

OK

if (namespace.isEmpty) {
functions.keys.map(Identifier.of(namespace, _)).toArray
} else {
throw QueryCompilationErrors.noSuchNamespaceError(namespace)
Contributor

We can return an empty array here.
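
Applied to the quoted code, the suggestion would read:

```scala
override def listFunctions(namespace: Array[String]): Array[Identifier] = {
  if (namespace.isEmpty) {
    functions.keys.map(Identifier.of(namespace, _)).toArray
  } else {
    // An unknown namespace simply contains no functions; no error needed.
    Array.empty[Identifier]
  }
}
```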


override def loadFunction(ident: Identifier): UnboundFunction = {
if (ident.namespace().nonEmpty) {
throw QueryCompilationErrors.namespaceInJdbcUDFUnsupportedError(ident)
Contributor

We can throw NoSuchFunctionException here

}
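
i.e. (a sketch; this assumes `NoSuchFunctionException` offers an `Identifier`-based constructor):

```scala
override def loadFunction(ident: Identifier): UnboundFunction = {
  if (ident.namespace().nonEmpty) {
    // Dialect functions live in the catalog's root namespace.
    throw new NoSuchFunctionException(ident)
  }
  functions.getOrElse(ident.name(), throw new NoSuchFunctionException(ident))
}
```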

// test only
def clearFunctions(): Unit = {
Contributor

it's weird that we put registerFunction in H2Dialect but put clearFunctions here.

@cloud-fan
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 2a7a1b6 May 27, 2022
@beliefer
Contributor Author

@cloud-fan Thank you for reviewing this PR.

chenzhx pushed a commit to chenzhx/spark that referenced this pull request Jun 13, 2022
… functions

chenzhx pushed a commit to chenzhx/spark that referenced this pull request Jun 15, 2022
… functions

chenzhx added a commit to Kyligence/spark that referenced this pull request Jun 15, 2022
…mal binary arithmetic (#481)

* [SPARK-39270][SQL] JDBC dialect supports registering dialect specific functions

### What changes were proposed in this pull request?
The built-in functions in Spark are not the same as those in a JDBC database.
This PR gives users the chance to register dialect-specific functions.

### Why are the changes needed?
To let JDBC dialects register dialect-specific functions.

### Does this PR introduce _any_ user-facing change?
'No'.
New feature.

### How was this patch tested?
New tests.

Closes apache#36649 from beliefer/SPARK-39270.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39413][SQL] Capitalize sql keywords in JDBCV2Suite

### What changes were proposed in this pull request?
`JDBCV2Suite` contains some test cases whose SQL keywords are not capitalized.
This PR capitalizes the SQL keywords in `JDBCV2Suite`.

### Why are the changes needed?
Capitalize SQL keywords in `JDBCV2Suite`.

### Does this PR introduce _any_ user-facing change?
'No'.
Just update test cases.

### How was this patch tested?
N/A.

Closes apache#36805 from beliefer/SPARK-39413.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: huaxingao <huaxin_gao@apple.com>

* [SPARK-38997][SPARK-39037][SQL][FOLLOWUP] `PushableColumnWithoutNestedColumn` needs to be translated to a predicate too

### What changes were proposed in this pull request?
apache#35768 assumed that the expressions in `And`, `Or` and `Not` must be predicates.
apache#36370 and apache#36325 supported pushing down expressions in `GROUP BY` and `ORDER BY`. But the children of `And`, `Or` and `Not` can be `FieldReference.column(name)`.
`FieldReference.column(name)` is not a predicate, so the assertion may fail.

### Why are the changes needed?
This PR fixes the bug in `PushableColumnWithoutNestedColumn`.

### Does this PR introduce _any_ user-facing change?
'Yes'.
It makes the push-down framework work more correctly.

### How was this patch tested?
New tests

Closes apache#36776 from beliefer/SPARK-38997_SPARK-39037_followup.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* [SPARK-39316][SQL] Merge PromotePrecision and CheckOverflow into decimal binary arithmetic

### What changes were proposed in this pull request?

The main change:
- Add a new method `resultDecimalType` in `BinaryArithmetic`
- Add a new expression `DecimalAddNoOverflowCheck` for the internal decimal add, e.g. `Sum`/`Average`; the differences from `Add` are:
  - `DecimalAddNoOverflowCheck` does not check overflow
  - `DecimalAddNoOverflowCheck` takes `dataType` as an input parameter
- Merge the decimal precision code of `DecimalPrecision` into each arithmetic data type, so every arithmetic expression reports the accurate decimal type, and we can remove the unused expression `PromotePrecision` and related code
- Merge `CheckOverflow` into the arithmetic eval and codegen code paths, so every arithmetic expression can handle the overflow case at runtime

Merge `PromotePrecision` into `dataType`, for example, `Add`:
```scala
override def resultDecimalType(p1: Int, s1: Int, p2: Int, s2: Int): DecimalType = {
  val resultScale = max(s1, s2)
  if (allowPrecisionLoss) {
    DecimalType.adjustPrecisionScale(max(p1 - s1, p2 - s2) + resultScale + 1,
      resultScale)
  } else {
    DecimalType.bounded(max(p1 - s1, p2 - s2) + resultScale + 1, resultScale)
  }
}
```

Merge `CheckOverflow`, for example, `Add` eval:
```scala
dataType match {
  case decimalType: DecimalType =>
    val value = numeric.plus(input1, input2)
    checkOverflow(value.asInstanceOf[Decimal], decimalType)
  ...
}
```

Note that `CheckOverflow` is still useful after this PR, e.g. in `RowEncoder`; we can do further cleanup in a separate PR.

### Why are the changes needed?

Fix the bug of `TypeCoercion`, for example:
```sql
SELECT CAST(1 AS DECIMAL(28, 2))
UNION ALL
SELECT CAST(1 AS DECIMAL(18, 2)) / CAST(1 AS DECIMAL(18, 2));
```

Relaxes the decimal precision at runtime, so we do not need a redundant `Cast`.

### Does this PR introduce _any_ user-facing change?

yes, bug fix

### How was this patch tested?

Pass existing tests and add some bug-fix tests in `decimalArithmeticOperations.sql`.

Closes apache#36698 from ulysses-you/decimal.

Lead-authored-by: ulysses-you <ulyssesyou18@gmail.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Co-authored-by: Wenchen Fan <wenchen@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

* fix ut

Co-authored-by: Jiaan Geng <beliefer@163.com>
Co-authored-by: ulysses-you <ulyssesyou18@gmail.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Co-authored-by: Wenchen Fan <wenchen@databricks.com>
yabola pushed a commit to Kyligence/spark that referenced this pull request Jun 21, 2022
…mal binary arithmetic (#481)

leejaywei pushed a commit to Kyligence/spark that referenced this pull request Jul 14, 2022
…mal binary arithmetic (#481)

zheniantoushipashi pushed a commit to Kyligence/spark that referenced this pull request Aug 8, 2022
…mal binary arithmetic (#481)

RolatZhang pushed a commit to Kyligence/spark that referenced this pull request Aug 29, 2023
…mal binary arithmetic (#481)
