Commit 6df4ec0
[SPARK-34794][SQL] Fix lambda variable name issues in nested DataFrame functions
### What changes were proposed in this pull request?
To fix lambda variable name issues in nested DataFrame functions, this PR changes the code to use a global counter for the names of `LambdaVariables` created by higher-order functions.
This is the rework of #31887. Closes #31887.
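
As a rough illustration of the global-counter idea, here is a minimal sketch. The `FreshNames` object and its `fresh` method are hypothetical names chosen for this example, not code from the patch. It only shows how a shared atomic counter can give every lambda variable a distinct name.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical helper for illustration only; not taken from the Spark source.
object FreshNames {
  // One counter shared by every caller, so no two generated names collide.
  private val nextId = new AtomicInteger(0)

  // Appends a unique numeric suffix to the requested base name.
  def fresh(base: String): String = s"${base}_${nextId.getAndIncrement()}"
}

object FreshNamesDemo {
  def main(args: Array[String]): Unit = {
    val outer = FreshNames.fresh("x") // name for the outer lambda variable
    val inner = FreshNames.fresh("x") // name for the nested lambda variable
    // Same base name, different generated names, so the inner one cannot shadow the outer one.
    println(s"outer=$outer inner=$inner distinct=${outer != inner}")
  }
}
```

An atomic counter is used here so that names stay unique even if lambdas are created from several threads.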
### Why are the changes needed?
This moves away from the current hard-coded variable names, which break on nested function calls. There is currently a bug where nested calls, nested transforms in particular, produce incorrect results because the inner variable shadows the outer variable.
For this query:
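```
// Assumes a spark-shell session, where `spark.implicits._` (for toDF and $) is in scope.
// The `f` alias is inferred from how the query refers to the functions object.
import org.apache.spark.sql.Column
import org.apache.spark.sql.{functions => f}

val df = Seq(
  (Seq(1,2,3), Seq("a", "b", "c"))
).toDF("numbers", "letters")

// Nested transforms: pair every number with every letter, then flatten.
df.select(
  f.flatten(
    f.transform(
      $"numbers",
      (number: Column) => { f.transform(
        $"letters",
        (letter: Column) => { f.struct(
          number.as("number"),
          letter.as("letter")
        ) }
      ) }
    )
  ).as("zipped")
).show(10, false)
```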
This is the current (incorrect) output:
```
+------------------------------------------------------------------------+
|zipped |
+------------------------------------------------------------------------+
|[{a, a}, {b, b}, {c, c}, {a, a}, {b, b}, {c, c}, {a, a}, {b, b}, {c, c}]|
+------------------------------------------------------------------------+
```
And this is the correct output after the fix:
```
+------------------------------------------------------------------------+
|zipped |
+------------------------------------------------------------------------+
|[{1, a}, {1, b}, {1, c}, {2, a}, {2, b}, {2, c}, {3, a}, {3, b}, {3, c}]|
+------------------------------------------------------------------------+
```
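
The difference between the two outputs above can be reproduced outside Spark with a small model. This is only a sketch under simplifying assumptions: the `ShadowingSketch` object, its name-keyed `Map` environment, and the variable names `x`, `x_0`, and `x_1` are all hypothetical, and Spark's actual resolution logic is more involved. It shows that when both lambda variables share a name, a lookup by name always returns the inner binding.

```scala
object ShadowingSketch {
  // Pairs every outer element with every inner element, binding each one
  // under a variable name and then reading both back by name.
  def zip(
      outer: Seq[String],
      inner: Seq[String],
      outerVar: String,
      innerVar: String): Seq[(String, String)] =
    for {
      o <- outer
      i <- inner
    } yield {
      // The inner binding is added last, so it wins when the names clash.
      val env = Map(outerVar -> o) + (innerVar -> i)
      (env(outerVar), env(innerVar))
    }

  def main(args: Array[String]): Unit = {
    val numbers = Seq("1", "2", "3")
    val letters = Seq("a", "b", "c")
    // Same name for both variables: the outer value is lost.
    println(zip(numbers, letters, "x", "x"))
    // Distinct names: both values survive.
    println(zip(numbers, letters, "x_0", "x_1"))
  }
}
```

With identical names the model yields only letter/letter pairs, mirroring the incorrect output; with distinct names it yields the number/letter pairs of the corrected output.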
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added the new test in `DataFrameFunctionsSuite`.
Closes #32424 from maropu/pr31887.
Lead-authored-by: dsolow <dsolow@sayari.com>
Co-authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Co-authored-by: dmsolow <dsolow@sayarianalytics.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
(cherry picked from commit f550e03)
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>
1 parent 89f5ec7 commit 6df4ec0
File tree
3 files changed (+40, -7 lines changed)
- sql
  - catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions
  - core/src
    - main/scala/org/apache/spark/sql
    - test/scala/org/apache/spark/sql
Lines changed: 11 additions & 1 deletion
Lines changed: 6 additions & 6 deletions
Lines changed: 23 additions & 0 deletions