Commit f550e03
[SPARK-34794][SQL] Fix lambda variable name issues in nested DataFrame functions
### What changes were proposed in this pull request?
To fix lambda variable name issues in nested DataFrame functions, this PR uses a global counter to generate unique names for the `LambdaVariables` created by higher-order functions.
This is the rework of apache#31887. Closes apache#31887.
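
As a rough illustration of the idea (not the code in this patch), a process-wide counter can hand out a distinct suffix each time a lambda variable name is requested, so two lambdas that both ask for `x` never end up with the same name. `FreshNames`, `freshVarName`, and the `base_N` naming scheme below are hypothetical and exist only for this sketch.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical helper for illustration only; not Spark's API.
object FreshNames {
  // Process-wide counter; AtomicInteger keeps increments thread-safe.
  private val nextId = new AtomicInteger(0)

  // Appends a unique numeric suffix to the requested base name.
  def freshVarName(base: String): String =
    s"${base}_${nextId.getAndIncrement()}"
}

object FreshNamesDemo {
  def main(args: Array[String]): Unit = {
    // Both lambdas ask for "x", but each receives a different name,
    // so the inner variable cannot shadow the outer one.
    val outer = FreshNames.freshVarName("x")
    val inner = FreshNames.freshVarName("x")
    println(s"outer = $outer, inner = $inner")
    assert(outer != inner)
  }
}
```

A single shared counter is arguably the simplest way to guarantee uniqueness without tracking nesting depth. The trade-off is that the generated names depend on how many variables were created earlier in the same process.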
### Why are the changes needed?
This moves away from the current hard-coded variable names, which break on nested function calls. Nested transforms in particular currently fail because the inner lambda variable shadows the outer one.
For this query:
```
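// Assumes a spark-shell session, where `spark.implicits._` is already in scope.
import org.apache.spark.sql.Column
import org.apache.spark.sql.{functions => f}

val df = Seq(
  (Seq(1, 2, 3), Seq("a", "b", "c"))
).toDF("numbers", "letters")

// Pair every number with every letter via two nested transforms.
df.select(
  f.flatten(
    f.transform(
      $"numbers",
      (number: Column) => f.transform(
        $"letters",
        (letter: Column) => f.struct(
          number.as("number"),
          letter.as("letter")
        )
      )
    )
  ).as("zipped")
).show(10, false)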
```
This is the current (incorrect) output:
```
+------------------------------------------------------------------------+
|zipped                                                                  |
+------------------------------------------------------------------------+
|[{a, a}, {b, b}, {c, c}, {a, a}, {b, b}, {c, c}, {a, a}, {b, b}, {c, c}]|
+------------------------------------------------------------------------+
```
And this is the correct output after the fix:
```
+------------------------------------------------------------------------+
|zipped                                                                  |
+------------------------------------------------------------------------+
|[{1, a}, {1, b}, {1, c}, {2, a}, {2, b}, {2, c}, {3, a}, {3, b}, {3, c}]|
+------------------------------------------------------------------------+
```
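
Both outputs above can be reproduced with a toy model in plain Scala, assuming lambda variables are looked up purely by name. This is a simplified sketch, not how Catalyst resolves variables; `ShadowingDemo`, `nested`, and the names `x`, `x_0`, and `x_1` are hypothetical.

```scala
object ShadowingDemo {
  // A binding environment: variable name -> bound value.
  type Env = Map[String, String]

  // Simulates two nested transforms. The outer lambda binds `outerName`,
  // the inner lambda binds `innerName`, and the body reads both by name.
  def nested(
      outerName: String,
      innerName: String,
      numbers: Seq[String],
      letters: Seq[String]): Seq[(String, String)] =
    numbers.flatMap { n =>
      val outerEnv: Env = Map(outerName -> n)
      letters.map { l =>
        // If both names are equal, this binding overwrites the outer one.
        val innerEnv = outerEnv + (innerName -> l)
        (innerEnv(outerName), innerEnv(innerName))
      }
    }

  def main(args: Array[String]): Unit = {
    val numbers = Seq("1", "2", "3")
    val letters = Seq("a", "b", "c")
    // Same hard-coded name for both lambdas: the number is lost.
    println(nested("x", "x", numbers, letters))
    // Distinct names: every (number, letter) pair is preserved.
    println(nested("x_0", "x_1", numbers, letters))
  }
}
```

With identical names, every pair carries the letter twice, mirroring the incorrect output. With distinct names, the full cross product of numbers and letters comes back.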
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added the new test in `DataFrameFunctionsSuite`.
Closes apache#32424 from maropu/pr31887.
Lead-authored-by: dsolow <dsolow@sayari.com>
Co-authored-by: Takeshi Yamamuro <yamamuro@apache.org>
Co-authored-by: dmsolow <dsolow@sayarianalytics.com>
Signed-off-by: Takeshi Yamamuro <yamamuro@apache.org>

1 parent 7fd3f8f commit f550e03
File tree

3 files changed: +40 -7 lines changed

- sql
  - catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions
  - core/src
    - main/scala/org/apache/spark/sql
    - test/scala/org/apache/spark/sql

Lines changed: 11 additions & 1 deletion
Lines changed: 6 additions & 6 deletions
Lines changed: 23 additions & 0 deletions