-
Couldn't load subscription status.
- Fork 1k
Fuzz test different parquet encodings #1156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| let encodings = &[ | ||
| Encoding::PLAIN, | ||
| Encoding::RLE_DICTIONARY, | ||
| //Encoding::DELTA_LENGTH_BYTE_ARRAY, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to not be supported at the moment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like that came in with the original array reader in #384 from @yordan-pavlov
Do you think it would be a good exercise to support DELTA_LENGTH_BYTE_ARRAY in the reader? If so, I can file a ticket and see if anyone else is interested in picking it up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#1082 will add support for it
36a70db to
96a2d4a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
| ConvertedType::NONE, | ||
| None, | ||
| &converter, | ||
| &[Encoding::PLAIN, Encoding::RLE_DICTIONARY], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the reason not to include DELTA_BINARY_PACKED here that that encoding is not supported for FixedLengthByteArrays?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was at least my interpretation of https://github.com/apache/parquet-format/blob/master/Encodings.md
FIXED_LEN_BYTE_ARRAY only seems support PLAIN and dictionary encoding
| let encodings = &[ | ||
| Encoding::PLAIN, | ||
| Encoding::RLE_DICTIONARY, | ||
| //Encoding::DELTA_LENGTH_BYTE_ARRAY, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like that came in with the original array reader in #384 from @yordan-pavlov
Do you think it would be a good exercise to support DELTA_LENGTH_BYTE_ARRAY in the reader? If so, I can file a ticket and see if anyone else is interested in picking it up
| max_data_page_size: 1024 * 1024, | ||
| max_dict_page_size: 1024 * 1024, | ||
| writer_version: WriterVersion::PARQUET_1_0, | ||
| ..Default::default() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I double checked that these are the same values as in impl Default for TestOptions 👍
Which issue does this PR close?
Relates to #1053
Rationale for this change
Improved test coverage
What changes are included in this PR?
#1110 extended the fuzz tests to support different array types, nulls and multiple page row groups. This builds on this to also test different encodings
Are there any user-facing changes?
No, this PR only adds more tests