-
Notifications
You must be signed in to change notification settings - Fork 1k
feat: add apply_unary_op and apply_binary_op bitwise operations
#8619
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: add apply_unary_op and apply_binary_op bitwise operations
#8619
Conversation
…table. but I don't want to pass slice of bytes as then I don't know the source and users must make sure that they hold the same promises as Buffer/MutableBuffer
|
I will try and review this one tomorrow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @rluvaton -- I haven't made it through this PR yet but the idea of optimized bitwise operations even for offset data is very compelling. The code is also very well tested and documented in my opinion. Thank you.
My primary concern is with the complexity of this code (including the unsafe) though your tests and documentation make it much easier to contemplate. I did have a few comments so far. I think with some more study I could find
Can you please share the benchmarks you are using / any WIP? I want to confirm the performance improvements before studying this code in more detail
FYI @tustvold and @crepererum and @jhorstmann if you are interested
| /// (e.g. `BooleanBufferBuilder`). | ||
| /// | ||
| /// ## Why this trait is needed, can't we just use `MutableBuffer` directly? | ||
| /// Sometimes we don't want to expose the inner `MutableBuffer` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this rationale. It seems to me that this code does expose the inner MutableBuffer for BooleanBufferBuilder (other code can modify the MutableBuffer) it just does so via a trait. I am not sure how that is different than just passing in mutable buffer directly
I wonder why you can't just pass &mut [u8] (aka pass in the mutable slices directly) as none of the APIs seem to change the length of the underlying buffers 🤔
if it is absolutely required to use a MutableBuffer directly from BooleanBufferBuilder perhaps we can make an unsafe API instead:
impl BooleanBufferBuilder {
/// returns a mutable reference to the buffer and length. Callers must ensure if they change the length
/// the buffer, they also update len
pub unsafe fn inner(&mut self) -> (&mut MutableBuffer, &mut usize) { ... }
}🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where do you see it exposing mutable buffer? It only expose the slice
And not passing bytes to be similar to buffer ops and to make sure that user understand they need to be bit packed but don't have strong opinions about the last thing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where do you see it exposing mutable buffer? It only expose the slice
I was thinking of this code in particular, which seems to pass a MutableBuffer reference directly out of the BooleanBufferBuilder
impl MutableOpsBufferSupportedLhs for BooleanBufferBuilder {
fn inner_mutable_buffer(&mut self) -> &mut MutableBuffer {
&mut self.buffer
}
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes but this is pub(crate) on purpose (documented on the trait level) to not expose it further than current crate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is a proposal which I think is simpler and easier to understand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally thought about adding unsafe function but did not want to change it.
anyway, modified to be like your function
| .map(|(l, r)| expected_op(*l, *r)) | ||
| .collect(); | ||
|
|
||
| super::mutable_bitwise_bin_op_helper( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a nice test
|
|
||
| let is_mutable_buffer_byte_aligned = left_bit_offset == 0; | ||
|
|
||
| if is_mutable_buffer_byte_aligned { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it worth special casing the case where both left_offset and right_offset are zero? In that case a simple loop that compared u64 by u64 is probably fastest (maybe even u128 🤔 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in order to be u128 we would want to change the function to get u128, but I wanted to keep similar API to Buffer ops.
do you think we should change it?
and the special case can be added later
You can change the code that I described |
|
I plan to spend more time studying this PR tomorrow morning with a fresh pair of eyes |
|
I am hoping to spend more time tomorrow reviewing this one carefully (specifically I want to use the new API and see some performance improvements) |
I tried this and for some reason it fails the tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much @rluvaton -- this is a feature that has been sorely missing in arrow-rs for a long time (optimized mutable bitwise operations with shifts) ❤️ ❤️ ❤️
Specifically, I think it will also assist operations where a selection mask must be continually narrowed down (aka applying AND)
I went through the low level implementations carefully and they look pretty good to me -- it was very nicely structured and commented. -- so while it took me a while it was a pleasant task.
One thing I would really like to update is the public facing APIs -- namely avoid the new traits, and make the APIs safer to use.
- Move
mutable_bitwise_unary_op_helperto a method onMutableBuffer - Move
mutable_bitwise_bin_op_helperto a method onMutableBuffer - Remove the pub
mutable_...functions and move their implementations inBufferBuilder - Move tests to be in terms of the bitop impls in MutableBuffer
I sketched how this would look in this PR:
I am happy to push commits to this PR to do so, but I wanted to check with you first.
I also think we need to show some significant performance improvements to justify adding this level of complexity / unsafe to the arrow-crate. I tried to get one here
But it isn't passing tests yet 🤔
| ) where | ||
| F: FnMut(u64, u64) -> u64, | ||
| { | ||
| // Must not reach here if we not byte aligned |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a follow on PR, it may be worth adding another special case for when the right is also byte aligned which I think will be a common case when applying operations in place on BooleanBuffers (I am looking ahead not just the current usecase)
| /// | ||
| /// This is the only place that uses unsafe code to read and write unaligned | ||
| /// | ||
| struct U64UnalignedSlice<'a> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another potential (future) optimization is to handle any unaligned bytes and then switch to aligned 64 operations and then clean up with unaligned bytes.
We would have to have some benchmarks to see if it was worthwhile
| // to preserve the bits outside the remainder | ||
| let rem = { | ||
| // 1. Read the byte that we will override | ||
| let current = start_remainder_mut_slice |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this only reads 8 bits, but the comments (and logic) says that the remainder could be up to 63 bits (shouldn't this be reading multiple bytes if remainder_len is greater than 8? Perhaps via `get_remainder_bits 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added comment, the slice is always at the length of ceil(reminder_len, 8) to make it that the last byte is always the boundary byte
I am actively working on some changes to this PR (remove the bitwise ops, add some more docs). I will push commits in a few minutes |
…olean-buffer-builder
…m:rluvaton/arrow-rs into add-bitwise-ops-to-boolean-buffer-builder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you again so much @rluvaton -- I went through this PR again very carefully and:
- Added more documentation and doc examples
- Added some tests for error/out of bound conditions
- Renamed the module to
mutable/bitwise.rs
I think it is now ready to go
I will also file some follow on tickets to track
- Adding a
apply_unary_opmethod to BooleanBufferBuilder - Possibly adding something similar to BooleanArray
BooleanBufferBuilder and for MutableBufferMutableBuffer::apply_unary_op and MutableBuffer::apply_binary_op
|
On the walk home, I was thinking we could almost make these two free functions in |
Fine by me, do you wanna push to this PR? |
Yeah, I'll give it a try later tonight or tomorrow |
|
starting to mess with this one now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved all the code into bitwise_ops and I think we are heading in the right direction
And we can do all the same operations directly on MutableBuffer and BufferBuilder without having to explose MutableBuffer at all
I also verified with my favorite test that this code is well covered>
cargo llvm-cov test --html --lib -p arrow-buffer👍
I am going to do one final check that we can still use this API in this PR
But I am feeling really good about this PR now
MutableBuffer::apply_unary_op and MutableBuffer::apply_binary_opapply_unary_op and apply_binary_op bitwise operations
| /// | ||
| /// # Example: Modify buffer with offsets | ||
| /// ``` | ||
| /// # use arrow_buffer::MutableBuffer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW this example shows how these methods can still be used to modify a MutableBuffer in place
But the functions are general and can work on any &mut[u8]
This also has the nice property that Rust prevents any mutation of the length or capacity so we don't need to assert anymore
|
The benchmarks on #8619 are (still) looking quite good |


Which issue does this PR close?
BooleanBufferBuilderandMutableBufferthat mutate directly the buffer #8618.Rationale for this change
Implement efficient boolean by applying them a (u64) at a time
What changes are included in this PR?
Implementation notes
Are these changes tested?
Yes, although I did not run them on big endian machine
Are there any user-facing changes?
Yes, new functions which are documented
Notes: I will later change
BooleanBufferBuilder#append_packed_rangefunction to usemutable_bitwise_bin_op_helperas I saw that running theboolean_append_packedbenchmark improved by more than 2 timesSee benchmarks on