[SPARK-52060][SQL] Make OneRowRelationExec node #50849
Conversation
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala (resolved review thread, outdated)
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala (resolved review thread, outdated)
```scala
}

override def simpleString(maxFields: Int): String = {
  s"$nodeName${truncatedString(output, "[", ",", "]", maxFields)}"
}
```
How is this different from the default implementation?
The default implementation returns `Scan OneRowRelation`, while the existing implementation (using RDDScan) returns `Scan OneRowRelation[]`. I figured we shouldn't change this on the off chance that someone is relying on it.
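The bracket difference comes from formatting the (empty) output column list. A minimal, self-contained sketch of the two renderings, approximating Spark's `truncatedString` with plain `Seq.mkString` (an assumption for illustration, not the actual Spark code):

```scala
// Hypothetical illustration (plain Scala, no Spark dependencies):
// Spark's truncatedString is approximated here with Seq.mkString.
object SimpleStringDemo {
  val nodeName = "Scan OneRowRelation"
  val output: Seq[String] = Seq.empty // OneRowRelation produces no columns

  // Default-style rendering: just the node name.
  val defaultStyle: String = nodeName

  // RDDScan-style rendering: node name plus a bracketed output list,
  // which collapses to "[]" when the output is empty.
  val rddScanStyle: String = s"$nodeName${output.mkString("[", ",", "]")}"
}
```

This is why preserving the overridden `simpleString` keeps the historical `Scan OneRowRelation[]` output stable for anyone matching on it.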
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala (resolved review thread, outdated)
There are failures like
which I also see in other PRs such as here, so I think the failures are unrelated.
```scala
private val rdd = session.sparkContext.parallelize(Seq(emptyRow), 1)

override lazy val metrics = Map(
  "numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows"))
```
It's always one row, do we need this metric?
```scala
private val emptyRow: InternalRow = InternalRow.empty

private val rdd = session.sparkContext.parallelize(Seq(emptyRow), 1)
```
I think we can do this:

```scala
private val rdd = {
  val proj = UnsafeProjection.create(schema)
  val emptyRow = proj(InternalRow.empty)
  session.sparkContext.parallelize(Seq(emptyRow), 1)
}
```
Then `doExecute()` can just return this RDD.
```scala
override def inputRDD: RDD[InternalRow] = rdd

override protected val createUnsafeProjection: Boolean = true
```
If we do https://github.com/apache/spark/pull/50849/files#r2097521541 , then this can be false.
What changes were proposed in this pull request?

Creates a new OneRowRelationExec node, which is more or less a copy of the RDDScanExec node. We want a dedicated node because it makes it clearer when a one row relation is used, i.e. for patterns like SELECT version().

Why are the changes needed?
This makes it clearer in the code that a one row relation is used and allows us to avoid checking the hard-coded "OneRowRelation" string when pattern matching.
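The pattern-matching benefit can be sketched with simplified, hypothetical stand-ins for the Spark physical plan nodes (these are not the actual Spark classes, just an illustration of the before/after shape):

```scala
// Hypothetical, simplified stand-ins for Spark physical plan nodes,
// illustrating matching on a dedicated node vs. a hard-coded string.
sealed trait SparkPlanLike { def nodeName: String }

final case class RDDScanExecLike(relationName: String) extends SparkPlanLike {
  def nodeName: String = s"Scan $relationName"
}

case object OneRowRelationExecLike extends SparkPlanLike {
  def nodeName: String = "Scan OneRowRelation"
}

object PlanMatching {
  // Before: detecting a one row relation means checking a magic string
  // inside a generic RDD scan node.
  def isOneRowViaString(plan: SparkPlanLike): Boolean = plan match {
    case r: RDDScanExecLike => r.relationName == "OneRowRelation"
    case _                  => false
  }

  // After: the dedicated node type can be matched on directly,
  // with no string comparison.
  def isOneRowViaNode(plan: SparkPlanLike): Boolean = plan match {
    case OneRowRelationExecLike => true
    case _                      => false
  }
}
```

The string-based check silently breaks if the relation name ever changes; the type-based match is checked by the compiler.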
Does this PR introduce any user-facing change?

Yes, the plan will now be OneRowRelationExec rather than RDDScanExec. The plan string should be the same, however.

How was this patch tested?

Added UTs.

Was this patch authored or co-authored using generative AI tooling?