Skip to content

Commit

Permalink
[Spark] Do not cast data type when not needed in DML commands (delta-…
Browse files Browse the repository at this point in the history
…io#4158)

#### Which Delta project/connector is this regarding?
<!--
Please add the component selected below to the beginning of the pull
request title
For example: [Spark] Title of my pull request
-->

- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)

## Description

This PR changes the logic of data type casting in DML operations, so
that when the expression has the desired output type no cast will be
performed. This is to avoid a performance issue when the expression is
slow to evaluate, such as for array functions on an array of structs,
i.e., `array_*(array<struct<...>>, ...)`.

For example, given `array_distinct(physical_ids)` inside a WHEN UPDATE
clause:
```
CASE WHEN (size(physical_ids#270, false) = 0) THEN array_except(physical_ids#1611, []) ELSE [] END
```
It will be first transformed into 
```
array_distinct(transform(physical_ids, (_, idx) -> physical_ids[idx]))
```
and then to 
```
transform(
  when(size(to_add) != 0, array_distinct(physical_ids), to_remove),
  (_, idx) ->
    when(size(to_add) != 0, array_distinct(physical_ids), to_remove)[idx]
```
which triples the amount of calculations.

After this PR the transformation will not happen again.

## How was this patch tested?

Manually inspecting the query plan.

## Does this PR introduce _any_ user-facing changes?

No.
  • Loading branch information
xupefei authored Feb 14, 2025
1 parent 5aee635 commit 41a4e81
Showing 1 changed file with 4 additions and 2 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -112,7 +112,9 @@ trait UpdateExpressionsSupport extends SQLConfHelper with AnalysisHelper with De
case Literal(nul, NullType) => Literal(nul, dataType)
case otherExpr =>
(fromExpression.dataType, dataType) match {
case (ArrayType(_: StructType, _), to @ ArrayType(toEt: StructType, toContainsNull)) =>
case (ArrayType(fromEt: StructType, fromNullable),
to @ ArrayType(toEt: StructType, toNullable))
if !(DataTypeUtils.sameType(fromEt, toEt) && fromNullable == toNullable) =>
fromExpression match {
// If fromExpression is an array function returning an array, cast the
// underlying array first and then perform the function on the transformed array.
Expand Down Expand Up @@ -158,7 +160,7 @@ trait UpdateExpressionsSupport extends SQLConfHelper with AnalysisHelper with De
castIfNeeded(
GetArrayItem(fromExpression, i), toEt, castingBehavior, columnName)
val transformLambdaFunc = {
val elementVar = NamedLambdaVariable("elementVar", toEt, toContainsNull)
val elementVar = NamedLambdaVariable("elementVar", toEt, toNullable)
val indexVar = NamedLambdaVariable("indexVar", IntegerType, false)
LambdaFunction(structConverter(elementVar, indexVar), Seq(elementVar, indexVar))
}
Expand Down

0 comments on commit 41a4e81

Please sign in to comment.