-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Orc: Support row group bloom filters #5313
Conversation
cc @rdblue @RussellSpitzer @huaxingao @kbendick Please review it, thanks a lot. |
For the test, shall we also check the read path to make sure bloom filters are there? |
I find the ORC SDK will auto recognize the bloom filter in the following code. Is it necessary to test the logic of ORC itself? IMO, we don't have to test it. There have been some tests in the ORC project. @huaxingao |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM, outside of the two nits on the config values.
Thank you for working on this @deadwind4!
Thank a lot @huaxingao @kbendick for reviewing this. |
@kbendick Could we merge it? |
@deadwind4: I understand, that the ORC supports these filters and tests the functions, but I would still like to see some tests validating, that the filters are there. I could envision a situation where ORC changes the way to support bloom filters, and our tests fails to recognize this. Thanks, Peter |
@deadwind4: Could you please check the failed tests. Thanks, |
@pvary All checks have passed after I rebased the code and reran the CI. |
Thanks @deadwind4 for the PR and @kbendick for the review! |
ORC_BLOOM_FILTER_COLUMNS this property will work on spark? when i set write.orc.bloom.filter.columns=xx and used spark to write data, i found that bloomfilter had no effect on querying. |
Issue link: #5218
Add
write.orc.bloom.filter.columns
andwrite.orc.bloom.filter.fpp
options.Enable these options in ORC class.