[SPARK-13530][SQL] Add ShortType support to UnsafeRowParquetRecordReader #11412

Closed
wants to merge 1 commit into from

viirya
Member

@viirya viirya commented Feb 27, 2016

JIRA: https://issues.apache.org/jira/browse/SPARK-13530

What changes were proposed in this pull request?

With the vectorized Parquet scanner enabled by default, the unit test ParquetHadoopFsRelationSuite (based on HadoopFsRelationTest) fails because UnsafeRowParquetRecordReader lacks short type support. This PR adds that support.

The exception:

[info] ParquetHadoopFsRelationSuite:
[info] - test all data types - StringType (499 milliseconds)
[info] - test all data types - BinaryType (447 milliseconds)
[info] - test all data types - BooleanType (520 milliseconds)
[info] - test all data types - ByteType (418 milliseconds)
00:22:58.920 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in stage 124.0 (TID 1949)
org.apache.commons.lang.NotImplementedException: Unimplemented type: ShortType
    at org.apache.spark.sql.execution.datasources.parquet.UnsafeRowParquetRecordReader$ColumnReader.readIntBatch(UnsafeRowParquetRecordReader.java:769)
    at org.apache.spark.sql.execution.datasources.parquet.UnsafeRowParquetRecordReader$ColumnReader.readBatch(UnsafeRowParquetRecordReader.java:640)
    at org.apache.spark.sql.execution.datasources.parquet.UnsafeRowParquetRecordReader$ColumnReader.access$000(UnsafeRowParquetRecordReader.java:461)
    at org.apache.spark.sql.execution.datasources.parquet.UnsafeRowParquetRecordReader.nextBatch(UnsafeRowParquetRecordReader.java:224)
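
For readers unfamiliar with the failure mode, here is a minimal standalone sketch of the idea, not the actual patch. Parquet stores 16-bit integers in its INT32 physical type, so a reader that decodes an int batch has to dispatch on the requested type and narrow each value. The class name `ShortBatchSketch`, the `TargetType` enum, and both method signatures are hypothetical. The sketch throws the JDK's `UnsupportedOperationException` in place of the commons-lang `NotImplementedException` seen in the trace, so that it has no external dependencies.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names come from Spark.
final class ShortBatchSketch {

    /** Stand-in for the requested column types that may be applied to a decoded batch. */
    enum TargetType { INT, SHORT, FLOAT }

    /** Narrows decoded INT32 values to shorts. Values are assumed to fit in 16 bits. */
    static short[] toShorts(int[] decoded) {
        short[] out = new short[decoded.length];
        for (int i = 0; i < decoded.length; i++) {
            out[i] = (short) decoded[i];
        }
        return out;
    }

    /**
     * Dispatches on the requested type for a batch decoded as INT32.
     * Without the SHORT case, a short column falls through to the default
     * branch and fails with "Unimplemented type", as in the trace above.
     */
    static Object readIntBatch(int[] decoded, TargetType target) {
        switch (target) {
            case INT:
                return decoded.clone();
            case SHORT:
                return toShorts(decoded);
            default:
                throw new UnsupportedOperationException("Unimplemented type: " + target);
        }
    }

    public static void main(String[] args) {
        short[] shorts = (short[]) readIntBatch(new int[] {1, -2, 32767}, TargetType.SHORT);
        System.out.println(Arrays.toString(shorts));
    }
}
```

In this sketch the `FLOAT` constant exists only to exercise the default branch; a real reader would route such a column through a different batch method.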

How was this patch tested?

The unit test ParquetHadoopFsRelationSuite (based on HadoopFsRelationTest) fails because UnsafeRowParquetRecordReader lacks short type support. With this support added, the test passes.

@SparkQA
SparkQA commented Feb 27, 2016

Test build #52117 has finished for PR 11412 at commit e923f7d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Feb 27, 2016

cc @nongli @rxin

@nongli
Contributor

nongli commented Feb 27, 2016

lgtm

@rxin
Contributor

rxin commented Feb 27, 2016

Thanks - merging this into master. It would have been better to have a unit test for this module rather than relying on integration tests.

@asfgit asfgit closed this in 3814d0b Feb 27, 2016
@JoshRosen
Contributor

@viirya
Member Author

viirya commented Feb 27, 2016

@JoshRosen That looks like a different problem in the same module. I will look into it.

@nongli
Contributor

nongli commented Feb 28, 2016

@viirya I fixed it with this patch:
#11414

@viirya
Member Author

viirya commented Feb 28, 2016

@nongli Got it. Thanks!

@viirya viirya deleted the add-shorttype-support branch December 27, 2023 18:33