Skip to content

[SPARK-52060][SQL] Make OneRowRelationExec node #50849

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

richardc-db
Copy link
Contributor

What changes were proposed in this pull request?

creates a new OneRowRelationExec node, which is more or less a copy of the RDDScanExec node.

We want a dedicated node because this helps make it more clear when a one row relation, i.e. for patterns like SELECT version() is used.

Why are the changes needed?

this makes it more clear in the code that a one row relation is used and allows us to avoid checking the hard coded "OneRowRelation" string when pattern matching.

Does this PR introduce any user-facing change?

yes, the plan will now be OneRowRelationExec rather than RDDScanExec. The plan string should be the same, however.

How was this patch tested?

added UTs

Was this patch authored or co-authored using generative AI tooling?

@richardc-db richardc-db changed the title [SQL] Make OneRowRelationExec node [SPARK-52060][SQL] Make OneRowRelationExec node May 9, 2025
@github-actions github-actions bot added the SQL label May 9, 2025
}

override def simpleString(maxFields: Int): String = {
s"$nodeName${truncatedString(output, "[", ",", "]", maxFields)}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this different from the default implementation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the default implementation returns Scan OneRowRelation, while the existing implementation (using RDDScan) returns Scan OneRowRelation[]. I figured we shouldn't change this in the off chance that someone is relying on it.

@richardc-db richardc-db requested a review from cloud-fan May 19, 2025 22:19
@richardc-db
Copy link
Contributor Author

there are failures like

===== POSSIBLE THREAD LEAK IN SUITE o.a.s.sql.errors.QueryExecutionErrorsSuite, threads: rpc-boss-1245-1 (daemon=true) =====


[info] org.apache.spark.sql.errors.QueryExecutionErrorsSuite *** ABORTED *** (37 milliseconds)
[info]   java.lang.IllegalStateException: Shutdown hooks cannot be modified during shutdown.
[info]   at org.apache.spark.util.SparkShutdownHookManager.add(ShutdownHookManager.scala:212)

which i also see in other PRs such as here, so i think the failures are unrelated

private val rdd = session.sparkContext.parallelize(Seq(emptyRow), 1)

override lazy val metrics = Map(
"numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's always one row, do we need this metric?


private val emptyRow: InternalRow = InternalRow.empty

private val rdd = session.sparkContext.parallelize(Seq(emptyRow), 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can do this

private val rdd = {
  val proj = UnsafeProjection.create(schema)
  val emptyRow = proj(InternalRow.empty)
  session.sparkContext.parallelize(Seq(emptyRow), 1)
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then def doExecute() can just return this RDD


override def inputRDD: RDD[InternalRow] = rdd

override protected val createUnsafeProjection: Boolean = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants