-
Notifications
You must be signed in to change notification settings - Fork 28.7k
[SPARK-27279][SQL] Reuse subquery should compare child plan of SubqueryExec
#24214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
return plan | ||
} | ||
// Build a hash map using schema of subqueries to avoid O(N*N) sameResult calls. | ||
val subqueries = mutable.HashMap[StructType, ArrayBuffer[SubqueryExec]]() | ||
plan transformAllExpressions { | ||
case sub: ExecSubqueryExpression => | ||
val sameSchema = subqueries.getOrElseUpdate(sub.plan.schema, ArrayBuffer[SubqueryExec]()) | ||
val sameResult = sameSchema.find(_.sameResult(sub.plan)) | ||
val sameResult = sameSchema.find(_.child.sameResult(sub.plan.child)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change fixes the bug you described in the PR description? Sorry, but I'm not clearly sure about what's the problem here, so could you show us an example query to reproduce the bug? Also, could you make the title more precise...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review, you can see the example in the added test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure either about this fix. I remember this problem which I tried to fix in 36fa664. As you can see, the PR didn't actually do this fix as per @cloud-fan 's comment. So I am pinging him here for reviewing this too. Anyway, when I debugged it, the problem was that being the same object according to equals, the tree was not updated, so I see no reason for the current fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we just overwrite SubqueryExec.doCanonicalize
to strip it? It seems the SubqueryExec.name
is stopping us from match semantically same subqueries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated code as @cloud-fan suggests, thanks for all your reviews!
Test build #103941 has finished for PR 24214 at commit
|
retest this please. |
SubqueryExec
Test build #103952 has finished for PR 24214 at commit
|
retest this please. |
Test build #103961 has finished for PR 24214 at commit
|
@@ -889,6 +889,12 @@ object SQLConf { | |||
.booleanConf | |||
.createWithDefault(true) | |||
|
|||
val SUBQUERY_REUSE_ENABLED = buildConf("spark.sql.subquery.reuse") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we need this config in this pr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's indeed a separated thing, but I'm ok to include it in this PR, as it's obviously wrong to use exchangeReuseEnabled
in ReuseSubquery
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created a PR separately to fix this problem before, #23998
@@ -674,6 +674,8 @@ case class SubqueryExec(name: String, child: SparkPlan) extends UnaryExecNode { | |||
|
|||
override def outputOrdering: Seq[SortOrder] = child.outputOrdering | |||
|
|||
override def doCanonicalize(): SparkPlan = child.canonicalized |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#18169 fixed this problem like this before, but I don't know why was it reverted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gatorsmile may you please share if you remember why it was reverted? Thanks.
@@ -889,6 +889,12 @@ object SQLConf { | |||
.booleanConf | |||
.createWithDefault(true) | |||
|
|||
val SUBQUERY_REUSE_ENABLED = buildConf("spark.sql.subquery.reuse") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created a PR separately to fix this problem before, #23998
Test build #104002 has finished for PR 24214 at commit
|
Test build #104001 has finished for PR 24214 at commit
|
Hi @adrian-wang can you do a rebase? I just merged #23998 |
Test build #104006 has finished for PR 24214 at commit
|
@cloud-fan , rebase done and test passed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks! Merged to master.
What changes were proposed in this pull request?
For now,
ReuseSubquery
in Spark compares two subqueries atSubqueryExec
level, which invalidates theReuseSubquery
rule. This pull request fixes this, and add a configuration key for subquery reuse exclusively.How was this patch tested?
add a unit test.