-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: optimize some bit functions #718
Conversation
It would be nice to have a criterion benchmark as part of the PR |
The baseline of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are some nice speedups. Thanks @kazuyukitanimura. I'm not too familiar with this code so it would be good to get another review before merging (perhaps @viirya or @parthchandra?)
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #718 +/- ##
=============================================
+ Coverage 33.57% 53.75% +20.17%
+ Complexity 830 813 -17
=============================================
Files 110 107 -3
Lines 42608 10273 -32335
Branches 9352 1934 -7418
=============================================
- Hits 14306 5522 -8784
+ Misses 25347 3774 -21573
+ Partials 2955 977 -1978 ☔ View full report in Codecov by Sentry. |
native/core/src/common/bit.rs
Outdated
@@ -553,8 +564,11 @@ pub struct BitReader { | |||
/// either byte aligned or not. | |||
impl BitReader { | |||
pub fn new(buf: Buffer, len: usize) -> Self { | |||
let num_bytes = cmp::min(8, len); | |||
let buffered_values = read_num_bytes_u64(num_bytes, buf.as_slice()); | |||
let buffered_values = if 8 > len { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should we use size_of instead of 8?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
let buffered_values = if 8 > len { | ||
read_num_bytes_u64(len, buf.as_slice()) | ||
} else { | ||
read_u64(buf.as_slice()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If len > 8
then this will not read len
bytes will it? Also read_num_bytes_u64
appears to cover this case in an unlikely
block (and handles it rather differently).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If len > 8
then, this will read 8 bytes only (u64).
read_num_bytes_u64
has
debug_assert!(size <= src.len());
if unlikely(src.len() < 8) {
So when size=8
we do not need the if unlikely()
block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can explicitly make the condition if src.len == 0
So
if src.len == 0 {
read_u64(buf.as_slice())
} else {
read_num_bytes_u64(len, buf.as_slice())
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When if src.len == 0
is true, we cannot read anything if I understand correctly?
@parthchandra do you have any more feedbacks? |
group.bench_function("get_u32_value", |b| { | ||
b.iter(|| { | ||
let mut reader: BitReader = BitReader::new_all(buffer.slice(0)); | ||
for _ in 0..(buffer.len() * 8 / 31) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add some comments on magic numbers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is no a magic number, I just chose this randomly, no specific reasons
#[inline] | ||
pub fn read_u64(src: &[u8]) -> u64 { | ||
let in_ptr = src.as_ptr() as *const u64; | ||
unsafe { in_ptr.read_unaligned() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curios why to read the raw pointer unaligned? its not guaranteed to be aligned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm not sure if it is always aligned...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can ever guarantee that the memory is aligned
@@ -1019,6 +1033,12 @@ mod tests { | |||
} | |||
} | |||
|
|||
#[test] | |||
fn test_read_u64() { | |||
let buffer: Vec<u8> = vec![0, 1, 2, 3, 4, 5, 6, 7]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is it u8 if we test u64, should we create a vector of i64?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this is a buffer. It reads 8 bytes from here
Plan to merge this today |
Merged, thank you @andygrove @parthchandra @comphead |
## Which issue does this PR close? Part of apache#679 and apache#670 ## Rationale for this change The improvement could be negligible in real use cases, but I see some improvements in micro benchmarks ## What changes are included in this PR? Optimizations in some bit functions ## How are these changes tested? Existing tests (cherry picked from commit ffb96c3)
Which issue does this PR close?
Part of #679 and #670
Rationale for this change
The improvement could be negligible in real use cases, but I see some improvements in micro benchmarks
What changes are included in this PR?
Optimizations in some bit functions
How are these changes tested?
Existing tests