Infer parquet reader type based on file metadata #9294
Jackie-Jiang merged 3 commits into apache:master
Can you add a sample data file with a decimal field and a test to ensure the file is correctly parsed?
Might throw null pointer exception here if fileKeyValueMeta is null.
You can put this check hasAvroSchemaInParquetFile() inside org.apache.pinot.plugin.inputformat.parquet.ParquetUtils and reuse the same method inside org.apache.pinot.plugin.inputformat.parquet.ParquetUtils.getParquetAvroSchema(...)
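The two review comments above (null safety and a shared helper) could be combined roughly as follows. This is only a sketch, not the code merged in this PR: the class name `ParquetMetadataSketch` and the method name `hasAvroSchemaInFileMetadata` are hypothetical, the key/value metadata is modelled as a plain `Map<String, String>` rather than read from a real Parquet footer, and the two key names are assumed to be the ones parquet-avro writes.

```java
import java.util.Map;

// Hypothetical helper; in the PR this logic would live in ParquetUtils.
final class ParquetMetadataSketch {
  // Assumed metadata keys under which parquet-avro stores the Avro schema.
  static final String AVRO_SCHEMA_KEY = "parquet.avro.schema";
  static final String OLD_AVRO_SCHEMA_KEY = "avro.schema";

  private ParquetMetadataSketch() {
  }

  // Returns true only if the footer key/value metadata carries an Avro schema.
  // A null map (file written without key/value metadata) is treated as "no Avro schema"
  // instead of throwing a NullPointerException.
  static boolean hasAvroSchemaInFileMetadata(Map<String, String> keyValueMetaData) {
    if (keyValueMetaData == null) {
      return false;
    }
    return keyValueMetaData.containsKey(AVRO_SCHEMA_KEY)
        || keyValueMetaData.containsKey(OLD_AVRO_SCHEMA_KEY);
  }
}
```

Keeping the check in one place means both the reader selection and `getParquetAvroSchema(...)` can rely on the same null handling.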
Codecov Report
@@ Coverage Diff @@
## master #9294 +/- ##
============================================
- Coverage 69.82% 67.04% -2.79%
- Complexity 4696 4824 +128
============================================
Files 1873 1391 -482
Lines 99623 72184 -27439
Branches 15146 11583 -3563
============================================
- Hits 69564 48396 -21168
+ Misses 25118 20266 -4852
+ Partials 4941 3522 -1419
The test failure might be related:
@Jackie-Jiang ACK. Fixed the test and added a new test to validate file metadata based reader selection |
Can you please modify the PR description to include the new config key for the record reader config? Also update the Pinot doc where applicable |
This PR allows the Parquet reader to choose automatically between ParquetAvroRecordReader and ParquetNativeRecordReader based on the Parquet file's metadata.
The reader config can be used to force a specific reader, but the default behaviour is to infer the reader type from the file schema.
Reader config flags
setUseParquetNativeRecordReader(true) -> Use ParquetNativeRecordReader
setUseParquetAvroRecordReader(true) -> Use ParquetAvroRecordReader
default -> Infer reader type based on file metadata
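The selection rule described by these flags can be sketched as a small pure function. This is an illustration under stated assumptions, not the merged implementation: `ReaderSelectionSketch`, `ReaderType`, and `chooseReader` are hypothetical names, the metadata check is passed in as a boolean, and the precedence when both flags are set (native wins here) is an assumption the description does not specify.

```java
// Hypothetical sketch of the reader selection described in the PR description.
final class ReaderSelectionSketch {
  enum ReaderType { NATIVE, AVRO }

  private ReaderSelectionSketch() {
  }

  // useNative / useAvro mirror setUseParquetNativeRecordReader(true) and
  // setUseParquetAvroRecordReader(true); fileHasAvroSchema is the result of
  // inspecting the Parquet file metadata.
  static ReaderType chooseReader(boolean useNative, boolean useAvro, boolean fileHasAvroSchema) {
    if (useNative) {
      // Explicit config wins over inference. Assumption: native takes precedence if both flags are set.
      return ReaderType.NATIVE;
    }
    if (useAvro) {
      return ReaderType.AVRO;
    }
    // Default: infer from metadata. An embedded Avro schema suggests the Avro reader.
    return fileHasAvroSchema ? ReaderType.AVRO : ReaderType.NATIVE;
  }
}
```

With neither flag set, a file written through parquet-avro would be read by the Avro reader and any other Parquet file by the native reader.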