Commit d9477dd

karenfeng authored and cloud-fan committed
[SPARK-39376][SQL] Hide duplicated columns in star expansion of subquery alias from NATURAL/USING JOIN
### What changes were proposed in this pull request?

Follows up from #31666. That PR introduced a bug where the qualified star expansion of a subquery alias containing a NATURAL/USING join output duplicated columns.

### Why are the changes needed?

Duplicated, hidden columns should not be output from a star expansion.

### Does this PR introduce _any_ user-facing change?

The query

```
val df1 = Seq((3, 8)).toDF("a", "b")
val df2 = Seq((8, 7)).toDF("b", "d")
val joinDF = df1.join(df2, "b")
joinDF.alias("r").select("r.*")
```

now outputs a single column `b`, instead of two (duplicate) columns for `b`.

### How was this patch tested?

Unit tests.

Closes #36763 from karenfeng/SPARK-39376.

Authored-by: Karen Feng <karen.feng@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
1 parent 4a529a0 commit d9477dd

2 files changed: +24 -1 lines changed

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala

Lines changed: 2 additions & 1 deletion
@@ -1303,7 +1303,8 @@ case class SubqueryAlias(
 
   override def metadataOutput: Seq[Attribute] = {
     val qualifierList = identifier.qualifier :+ alias
-    child.metadataOutput.map(_.withQualifier(qualifierList))
+    val nonHiddenMetadataOutput = child.metadataOutput.filter(!_.supportsQualifiedStar)
+    nonHiddenMetadataOutput.map(_.withQualifier(qualifierList))
   }
 
   override def maxRows: Option[Long] = child.maxRows
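
The effect of the added filter can be illustrated outside Spark with a toy model. This is only a sketch under stated assumptions: `Attr` is a hypothetical stand-in for a Catalyst attribute, not Spark's `Attribute` class; the `supportsQualifiedStar` flag is modeled as a plain Boolean field; and the column names and qualifiers are made up for illustration.

```scala
// Toy stand-in for a Catalyst attribute. Hypothetical: NOT Spark's Attribute class.
final case class Attr(
    name: String,
    qualifier: Seq[String] = Nil,
    supportsQualifiedStar: Boolean = false) {
  // Returns a copy of this attribute carrying the given qualifier.
  def withQualifier(q: Seq[String]): Attr = copy(qualifier = q)
}

object MetadataOutputSketch {
  // Shape of the old code: every child metadata attribute is re-qualified,
  // including the hidden duplicates left behind by a NATURAL/USING join.
  def before(childMetadataOutput: Seq[Attr], qualifierList: Seq[String]): Seq[Attr] =
    childMetadataOutput.map(_.withQualifier(qualifierList))

  // Shape of the new code: hidden attributes are dropped first,
  // then the remaining ones are re-qualified with the alias.
  def after(childMetadataOutput: Seq[Attr], qualifierList: Seq[String]): Seq[Attr] = {
    val nonHiddenMetadataOutput = childMetadataOutput.filter(!_.supportsQualifiedStar)
    nonHiddenMetadataOutput.map(_.withQualifier(qualifierList))
  }

  def main(args: Array[String]): Unit = {
    val childMetadataOutput = Seq(
      // Hidden duplicate of the join key (illustrative values).
      Attr("b", Seq("df2"), supportsQualifiedStar = true),
      // A hypothetical ordinary metadata column that is not hidden.
      Attr("meta", Seq("df1"))
    )
    val qualifierList = Seq("r")

    // Old behavior keeps the hidden "b" under the alias "r".
    assert(before(childMetadataOutput, qualifierList).map(_.name) == Seq("b", "meta"))
    // New behavior drops it, so only "meta" is re-qualified.
    assert(after(childMetadataOutput, qualifierList) == Seq(Attr("meta", Seq("r"))))
  }
}
```

In this model, the old code exposes the hidden `b` under the alias `r`, which would let a qualified star such as `r.*` pick it up again; the filtered version never re-qualifies it.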

sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala

Lines changed: 22 additions & 0 deletions
@@ -499,4 +499,26 @@ class DataFrameJoinSuite extends QueryTest
       )
     }
   }
+
+  test("SPARK-39376: Hide duplicated columns in star expansion of subquery alias from USING JOIN") {
+    val joinDf = testData2.as("testData2").join(
+      testData3.as("testData3"), usingColumns = Seq("a"), joinType = "fullouter")
+    val equivalentQueries = Seq(
+      joinDf.select($"*"),
+      joinDf.as("r").select($"*"),
+      joinDf.as("r").select($"r.*")
+    )
+    equivalentQueries.foreach { query =>
+      checkAnswer(query,
+        Seq(
+          Row(1, 1, null),
+          Row(1, 2, null),
+          Row(2, 1, 2),
+          Row(2, 2, 2),
+          Row(3, 1, null),
+          Row(3, 2, null)
+        )
+      )
+    }
+  }
 }