-
Notifications
You must be signed in to change notification settings - Fork 28.7k
[SPARK-44287][SQL] Use PartitionEvaluator API in RowToColumnarExec & ColumnarToRowExec SQL operators. #41839
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
sql/core/src/test/scala/org/apache/spark/sql/SparkSessionExtensionSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/execution/SparkPlanSuite.scala
Outdated
Show resolved
Hide resolved
a267a1a
to
d17712f
Compare
} else { | ||
child.executeColumnar().mapPartitionsInternal { batches => | ||
val evaluator = evaluatorFactory.createEvaluator() | ||
evaluator.eval(0, batches) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need pass real partition index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the original code, the index was not used as mapPartitionsInternal
is called
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not right. Even if it's not used for now, we should still set it correctly to be future-proof.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fixing it at #42185
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cloud-fan , Thanks for the fix,
I can fix similar issue in other merged PR
https://github.com/apache/spark/pull/42024/files
sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarEvaluatorFactory.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarEvaluatorFactory.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarEvaluatorFactory.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarEvaluatorFactory.scala
Show resolved
Hide resolved
d17712f
to
2fcb311
Compare
2fcb311
to
8908f9c
Compare
thanks, merging to master! |
### What changes were proposed in this pull request? This is a followup of #41839, to set the partition index correctly even if it's not used for now. It also contains a few code cleanup. ### Why are the changes needed? future-proof ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing tests Closes #42185 from cloud-fan/follow. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request? This is a followup of #41839, to set the partition index correctly even if it's not used for now. It also contains a few code cleanup. ### Why are the changes needed? future-proof ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing tests Closes #42185 from cloud-fan/follow. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit bf1bbc5) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request? This is a followup of #41839 , to fix an unintentional change. That PR added an optimization to return an empty iterator directly if the input iterator is empty. However, checking `inputIterator.hasNext` may trigger query execution, which is different than before. It should be completely lazy and wait for the root operator's iterator to trigger the execution. ### Why are the changes needed? fix unintentional change ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing tests Closes #42226 from cloud-fan/fo. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
### What changes were proposed in this pull request? This is a followup of #41839 , to fix an unintentional change. That PR added an optimization to return an empty iterator directly if the input iterator is empty. However, checking `inputIterator.hasNext` may trigger query execution, which is different than before. It should be completely lazy and wait for the root operator's iterator to trigger the execution. ### Why are the changes needed? fix unintentional change ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing tests Closes #42226 from cloud-fan/fo. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org> (cherry picked from commit 0f9cca5) Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
…ColumnarToRowExec SQL operators ### What changes were proposed in this pull request? SQL operators `RowToColumnarExec` & `ColumnarToRowExec` are updated to use the `PartitionEvaluator` API to do execution. ### Why are the changes needed? To avoid the use of lambda during distributed execution. Ref: SPARK-43061 for more details. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Updated 2 test cases, once all the SQL operators are migrated, the flag `spark.sql.execution.useTaskEvaluator` will be enabled by default to avoid running the tests with and without this TaskEvaluator Closes apache#41839 from vinodkc/br_refactorToEvaluatorFactory1. Authored-by: Vinod KC <vinod.kc.in@gmail.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request? This is a followup of apache#41839, to set the partition index correctly even if it's not used for now. It also contains a few code cleanup. ### Why are the changes needed? future-proof ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing tests Closes apache#42185 from cloud-fan/follow. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request? This is a followup of apache#41839 , to fix an unintentional change. That PR added an optimization to return an empty iterator directly if the input iterator is empty. However, checking `inputIterator.hasNext` may trigger query execution, which is different than before. It should be completely lazy and wait for the root operator's iterator to trigger the execution. ### Why are the changes needed? fix unintentional change ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? existing tests Closes apache#42226 from cloud-fan/fo. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
What changes were proposed in this pull request?
SQL operators
RowToColumnarExec
&ColumnarToRowExec
are updated to use thePartitionEvaluator
API to do execution.Why are the changes needed?
To avoid the use of lambda during distributed execution.
Ref: SPARK-43061 for more details.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Updated 2 test cases, once all the SQL operators are migrated, the flag
spark.sql.execution.useTaskEvaluator
will be enabled by default to avoid running the tests with and without this TaskEvaluator