SPARK-1293 [SQL] Parquet support for nested types #360
Closed
48 commits
aa688fe
Adding conversion of nested Parquet schemas
AndreSchumacher 4d4892a
First commit nested Parquet read converters
AndreSchumacher 6125c75
First working nested Parquet record input
AndreSchumacher 745a42b
Completing testcase for nested data (Addressbook(
AndreSchumacher ddb40d2
Extending tests for nested Parquet data
AndreSchumacher 1b1b3d6
Fixing one problem with nested arrays
AndreSchumacher 5d80461
fixing one problem with nested structs and breaking up files
AndreSchumacher 98219cf
added struct converter
AndreSchumacher ee70125
fixing one problem with arrayconverter
AndreSchumacher b7fcc35
Documenting conversions, bugfix, wrappers of Rows
AndreSchumacher 6dbc9b7
Fixing some problems intruduced during rebase
AndreSchumacher f8f8911
For primitive rows fall back to more efficient converter, code reorg
AndreSchumacher 4e25fcb
Adding resolution of complex ArrayTypes
AndreSchumacher a594aed
Scalastyle
AndreSchumacher b539fde
First commit for MapType
AndreSchumacher 824500c
Adding attribute resolution for MapType
AndreSchumacher f777b4b
Scalastyle
AndreSchumacher d1911dc
Simplifying ArrayType conversion
AndreSchumacher 1dc5ac9
First version of WriteSupport for nested types
AndreSchumacher e99cc51
Fixing nested WriteSupport and adding tests
AndreSchumacher adc1258
Optimizing imports
AndreSchumacher f466ff0
Added ParquetAvro tests and revised Array conversion
AndreSchumacher 79d81d5
Replacing field names for array and map in WriteSupport
AndreSchumacher 619c397
Completing Map testcase
AndreSchumacher c52ff2c
Adding native-array converter
AndreSchumacher 431f00f
Fixing problems introduced during rebase
AndreSchumacher a6b4f05
Cleaning up ArrayConverter, moving classTag to NativeType, adding Nat…
AndreSchumacher 0ae9376
Doc strings and simplifying ParquetConverter.scala
AndreSchumacher 32229c7
Removing Row nested values and placing by generic types
AndreSchumacher cbb5793
Code review feedback
AndreSchumacher 191bc0d
Changing to Seq for ArrayType, refactoring SQLParser for nested field…
AndreSchumacher 2f5a805
Removing stripMargin from test schemas
AndreSchumacher de02538
Cleaning up ParquetTestData
AndreSchumacher 31465d6
Scalastyle: fixing commented out bottom
AndreSchumacher 3c6b25f
Trying to reduce no-op changes wrt master
AndreSchumacher 3104886
Nested Rows should be Rows, not Seqs.
marmbrus f7aeba3
[SPARK-1982] Support for ByteType and ShortType.
marmbrus 3e1456c
WIP: Directly serialize catalyst attributes.
marmbrus 14c3fd8
Attempting to fix Spark-Parquet schema conversion
AndreSchumacher 37e0a0a
Cleaning up
AndreSchumacher 88e6bdb
Attempting to fix loss of schema
AndreSchumacher 63d1b57
Cleaning up and Scalastyle
AndreSchumacher b8a8b9a
More fixes to short and byte conversion
AndreSchumacher 403061f
Fixing some issues with tests and schema metadata
AndreSchumacher 94eea3a
Scalastyle
AndreSchumacher 7eceb67
Review feedback
AndreSchumacher 95c1367
Changes to ParquetRelation and its metadata
AndreSchumacher 30708c8
Taking out AvroParquet test for now to remove Avro dependency
AndreSchumacher
What are the semantics of `PrimitiveType`? Specifically, I'm surprised that `StringType` and `DecimalType` are considered `PrimitiveTypes`. Also I wonder if we can unify this with `NativeType` somehow. I'm not really sure, but I'd like to avoid too much explosion here.
@marmbrus `PrimitiveType` is maybe a misnomer. It's the same term that Parquet uses. Basically a `PrimitiveType` is a type that is not contained inside another type (so non-nested). You can argue that a String is a Char array and therefore not primitive but in terms of constructing nested rows it means that a primitive type is a leaf inside the tree that produces a record.
It would help to somehow distinguish between nested and non-nested types. `NativeType` comes close but for example there is `BinaryType` which is primitive but not native.