[SPARK-49366][CONNECT] Treat Union node as leaf in dataframe column resolution

### What changes were proposed in this pull request?
Treat the `Union` node as a leaf in DataFrame column resolution, so that resolution does not search the children of a `Union`.

### Why are the changes needed?
This is a bug fix. The following query:

```
from pyspark.sql.functions import lit

df1 = spark.range(10).withColumn("value", lit(1))
df2 = df1.union(df1)

df1.join(df2, df1.id == df2.id, "left").show()
```

fails with `AMBIGUOUS_COLUMN_REFERENCE`:

```
resolveExpressionByPlanChildren: e = '`==`('id, 'id)
resolveExpressionByPlanChildren: q = '[id=63]Join LeftOuter, '`==`('id, 'id)
:- [id=61]Project [id#550L, 1 AS value#553]
:  +- Range (0, 10, step=1, splits=Some(12))
+- [id=62]Union false, false
   :- [id=61]Project [id#564L, 1 AS value#565]
   :  +- Range (0, 10, step=1, splits=Some(12))
   +- [id=61]Project [id#566L, 1 AS value#567]
      +- Range (0, 10, step=1, splits=Some(12))

'id with id = 61

[id=61]Project [id#564L, 1 AS value#565]
+- Range (0, 10, step=1, splits=Some(12))

[id=61]Project [id#566L, 1 AS value#567]
+- Range (0, 10, step=1, splits=Some(12))

resolved: Vector((Some((id#564L,1)),true), (Some((id#566L,1)),true))
```

When resolving `'id` with plan id 61, the existing ambiguity detection fails in the join's second child: both children of the `Union` carry plan id 61 (because `df2` is `df1.union(df1)`), so the reference resolves to both `id#564L` and `id#566L`.

### Does this PR introduce _any_ user-facing change?
Yes, it is a bug fix.

### How was this patch tested?
Added tests.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#47853 from zhengruifeng/fix_ambgious_union.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Ruifeng Zheng <ruifengz@apache.org>
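To make the failure mode concrete, here is a deliberately simplified Python model of resolving a column by plan id over the plan shown in the log. `PlanNode`, `find_by_plan_id`, `resolve`, and `build_join_plan` are hypothetical names, not Spark APIs, and the model simply collects every match rather than reproducing Spark's actual detection logic (which, judging by the `resolved: Vector(...)` log line, tracks more per match). It only shows why stopping at the `Union` removes the spurious matches.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanNode:
    """Hypothetical, simplified logical-plan node (not Spark's real class)."""
    name: str                 # e.g. "Project", "Union", "Join"
    plan_id: Optional[int]    # plan id tag, e.g. 61 for df1's Project
    output: List[str]         # attributes, e.g. ["id#550L", "value#553"]
    children: List["PlanNode"] = field(default_factory=list)


def find_by_plan_id(node: PlanNode, plan_id: int, column: str,
                    union_as_leaf: bool) -> List[str]:
    """Collect every attribute named `column` under nodes tagged `plan_id`."""
    if node.plan_id == plan_id:
        # Match: keep the output attributes whose name (before '#') is `column`.
        return [a for a in node.output if a.split("#")[0] == column]
    if union_as_leaf and node.name == "Union":
        # Treat Union as a leaf: its children may reuse the same plan id.
        return []
    matches: List[str] = []
    for child in node.children:
        matches.extend(find_by_plan_id(child, plan_id, column, union_as_leaf))
    return matches


def resolve(root: PlanNode, plan_id: int, column: str,
            union_as_leaf: bool) -> str:
    """Return the single matching attribute, or fail like an ambiguous ref."""
    matches = find_by_plan_id(root, plan_id, column, union_as_leaf)
    if len(matches) > 1:
        # Stands in for Spark's AMBIGUOUS_COLUMN_REFERENCE error.
        raise ValueError(f"ambiguous reference to {column!r}: {matches}")
    if not matches:
        raise KeyError(column)
    return matches[0]


def project(plan_id: int, id_attr: str, value_attr: str) -> PlanNode:
    """Project [id, 1 AS value] over a Range, as in the log."""
    return PlanNode("Project", plan_id, [id_attr, value_attr],
                    [PlanNode("Range", None, [id_attr])])


def build_join_plan() -> PlanNode:
    """Mirror of '[id=63]Join LeftOuter' from the log above."""
    left = project(61, "id#550L", "value#553")                # df1
    union = PlanNode("Union", 62, ["id#564L", "value#565"],   # df2
                     [project(61, "id#564L", "value#565"),
                      project(61, "id#566L", "value#567")])
    return PlanNode("Join", 63, left.output + union.output, [left, union])


if __name__ == "__main__":
    join = build_join_plan()
    try:
        resolve(join, 61, "id", union_as_leaf=False)  # df1.id, old behavior
    except ValueError as err:
        print("before fix:", err)
    print("after fix:", resolve(join, 61, "id", union_as_leaf=True))
```

Run as a script, this prints `before fix: ambiguous reference to 'id': ['id#550L', 'id#564L', 'id#566L']` followed by `after fix: id#550L`. Stopping at the `Union` is reasonable because the `Union` exposes its own output to its parent, so a reference from above it (such as `df2.id`, plan id 62) should bind to the `Union` itself, never to attributes of its individual children.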