-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-18016][SQL][CATALYST] Code Generation: Constant Pool Limit #16648
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from all commits
Commits
Show all changes
6 commits
Select commit
Hold shift + click to select a range
240d5a3
adding initial class splitting
75052c2
class_splitting mutable projections mutable state fix
0128e1e
class_splitting increasing memory during tests, fixing accessor refer…
fd9524d
class_splitting fixup on line length
8ce47fa
class_splitting import ordering and catalyst memory increase
5a4a39c
[class_splitting] adding more explicit mutableStateArray comments
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Accessor suffixes to variable names add quite a bit of noise in this PR. What value do they add from your perspective?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having
addMutableState
return an accessor string is an important part of addressing the manner in which mutable state can contribute to Constant Pool errors. Code that creates mutable state usually takes for granted that the symbol used to declare the state will be inlined as a private member variable to the class. However, for sufficiently complicated schemas, mutable state and its initialization alone can breach the Constant Pool limit. The strategy I settled on was to have mutable state potentially be compacted into arrays of like type and initialization, this way, we can reduce the number of references that would count to the constant pool limit. Of course, if the mutable state is stored in an array, rather than in a private variable named after the symbol, we need to return back the accessor for that index in the compacted mutable state array, hence the 'accessor' suffixes. I had also tried a class-based approach, in which excess mutable state could become static members of nested classes, initialization functions for the state could still exceed the constant pool limit.This PR can be condensed to two core components to approach a solution to the (hard-and-fast) Constant Pool limit:
I should mention, not all mutable state is compacted into arrays. Only primitives and collections of simply-assigned objects (null assigned, or no assignment). But this array compaction strategy reduces references enough to allow even complex schemas in which we would potentially generate much more state than 2^16 to still be converted to datasets.