[SPARK-14578] [SQL] Fix codegen for CreateExternalRow with nested wide schema #12338
Closed
davies wants to merge 2 commits into apache:master
Test build #55651 has finished for PR 12338 at commit
Contributor
LGTM
Contributor
One more question: there are actually more places where we use local variables as the input for expressions, e.g.
Contributor, Author
This is used for non-whole-stage codegen. Also, MapObjects does not support interpreted mode, or it would be better to fall back. Class members may prevent the JIT compiler from doing more optimization, so we should not aggressively turn all of them into class members. Merging this into master.
asfgit pushed a commit that referenced this pull request Apr 19, 2016
… type

## What changes were proposed in this pull request?

After #12067, we now use expressions to do the aggregation in `TypedAggregateExpression`. To implement buffer merge, we produce a new buffer deserializer expression by replacing `AttributeReference` with the right-side buffer attribute, as other `DeclarativeAggregate`s do, and finally combine the left and right buffer deserializers with `Invoke`.

However, after #12338, we add the loop variable to the class members when generating code for `MapObjects`. If the `Aggregator` buffer type is `Seq`, which is implemented by a `MapObjects` expression, we add the same loop variable to the class members twice (once for the left and once for the right buffer deserializer), which causes a `ClassFormatError`.

This PR fixes the issue by calling `distinct` before declaring the class members.

## How was this patch tested?

New regression test in `DatasetAggregatorSuite`.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #12468 from cloud-fan/bug.
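The duplicate-member problem described in this commit message can be illustrated with a minimal sketch. It is not Spark's actual code generator: the class `MemberDeclSketch`, the method `declareMembers`, and the declaration strings are hypothetical, and Java streams stand in for the Scala `distinct` call mentioned above. The idea is only that two deserializers registering the same field declaration must yield a single declaration in the generated class.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for the step that emits class member declarations.
final class MemberDeclSketch {
    // Joins the registered declarations into class-body source text,
    // dropping exact duplicates while preserving first-seen order.
    static String declareMembers(List<String> declarations) {
        return declarations.stream()
                .distinct()
                .collect(Collectors.joining("\n"));
    }
}
```

For example, passing `"private Object loopVar1;"` twice (as the left and right buffer deserializers would) produces one field declaration rather than two, which is what a class file requires.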
kapilsingh5050 pushed a commit to kapilsingh5050/spark that referenced this pull request Dec 12, 2016
…with nested wide schema

With a wide schema, the field expressions are split into multiple functions, but the loopVar variable can't be accessed from the split functions. This PR changes these variables into class members.

Added regression test.

Author: Davies Liu <davies@databricks.com>

Closes apache#12338 from davies/nested_row.

Conflicts: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
What changes were proposed in this pull request?
With a wide schema, the field expressions are split into multiple functions, but the loopVar variable can't be accessed from the split functions. This PR changes these variables into class members.
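A rough sketch of the shape of the problem, written as hand-made Java rather than real generated code: the class `NestedRowSketch`, its helper methods, and the `int[]` element type are all hypothetical. If `loopVar` were a local variable declared inside `convert`, the split helper methods could not reference it and the generated class would fail to compile; as a field, it is visible to every helper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a generated class that converts nested structs to rows.
final class NestedRowSketch {
    // Loop variable held as a class member so the split helpers can read it.
    private int[] loopVar;
    // Output buffer for the row currently being built.
    private Object[] values;

    // Field expressions split into separate methods, as happens for wide schemas.
    private void applyFields0() { values[0] = loopVar[0]; }
    private void applyFields1() { values[1] = loopVar[1]; }

    List<Object[]> convert(int[][] structs) {
        List<Object[]> rows = new ArrayList<>();
        for (int[] element : structs) {
            loopVar = element;        // assign the member once per iteration
            values = new Object[2];
            applyFields0();
            applyFields1();
            rows.add(values);
        }
        return rows;
    }
}
```

Calling `new NestedRowSketch().convert(new int[][] {{1, 2}, {3, 4}})` builds one two-field row per input element.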
How was this patch tested?
Added regression test.