-
Couldn't load subscription status.
- Fork 3.9k
ARROW-4490: [Rust] Add explicit SIMD vectorization for boolean ops in "array_ops" #3641
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
rust/arrow/src/mod.rs
Outdated
| pub mod bitmap; | ||
| pub mod buffer; | ||
| pub mod builder; | ||
| pub mod compute; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chao mentioned that this isn't necessary as we have lib.rs, I've removed it in #3624
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually wasn't sure about this as I remembered @sunchao's previous comment. We have a lib.rs and a mod.rs what's the use case for each? For instance, why is it a good idea to have csv in both and not compute (not pushing for compute to be added but rather wondering what the thinking should be here)?
I tried searching for answers to this but couldn't find anything definitive...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lib is required, and I think mod is optional if you prefer declaring top-level modules separately to lib. As we declare pub mod array and others in lib, mod becomes redundant.
I think modules that are only declared in lib were an oversight by their author(s).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, it looks like mod.rs can be removed. It's not clear to me that it actually gets compiled because no other files reference it, lib.rs references the other modules directly. datafusion and parquet do not have a mod.rs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good @paddyhoran
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic! Nice improvements to error handling as well. Thanks @paddyhoran
This PR adds explicit SIMD for boolean ops,
and,orandnot.I moved
array_opsinto the newcomputemodule. From the outside this module serves the same purpose as the previousarray_opsmodule (all kernels will be accessible from this namespace) and the remainingarray_opsimplementations are exposed via thecomputemodule currently. As I add explicit SIMD for more kernels they will migrate fromarray_opsinto their own modules undercompute. I am keeping sub-modules undercompute(as apposed to compute.rs) as SIMD can get rather verbose and it seems thatcomputemay expand in the future.I have included benchmarks where I re-create the old default implementations for others to take a look at the speed improvement. It's not clear whether we need the non-SIMD versions in the benchmarks long term but I left them in for now to make the non-SIMD/SIMD comparison.
There are likely more optimizations possible (processing the values and null bit buffers in a single loop for instance) but I wanted to get the cleanest impl first and add further optimizations later if needed.