
fix: optimize some bit functions #718

Merged · 21 commits · Aug 2, 2024

Conversation

@kazuyukitanimura (Contributor) commented Jul 24, 2024

Which issue does this PR close?

Part of #679 and #670

Rationale for this change

The improvement could be negligible in real use cases, but I see some improvements in micro benchmarks

What changes are included in this PR?

Optimizations in some bit functions

How are these changes tested?

Existing tests

@kazuyukitanimura kazuyukitanimura marked this pull request as ready for review July 24, 2024 19:47
@andygrove (Member) commented:

It would be nice to have a criterion benchmark as part of the PR

@kazuyukitanimura kazuyukitanimura marked this pull request as draft July 25, 2024 00:32
@kazuyukitanimura (Contributor Author) commented Jul 27, 2024

@andygrove

bit_util/get_value/i32_num_bits_8
                        time:   [781.53 ps 797.98 ps 818.12 ps]
                        change: [-14.395% -11.959% -9.4057%] (p = 0.00 < 0.05)
                        Performance has improved.
bit_util/get_value/i32_num_bits_32
                        time:   [599.30 ps 605.37 ps 613.10 ps]
                        change: [-21.472% -14.798% -7.3131%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  13 (13.00%) high mild

bit_util/trailing_bits/num_bits_0
                        time:   [476.87 ps 479.39 ps 482.48 ps]
                        change: [+0.5038% +0.9142% +1.4061%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) high mild
  11 (11.00%) high severe
bit_util/trailing_bits/num_bits_32
                        time:   [537.73 ps 554.77 ps 576.77 ps]
                        change: [-20.179% -14.236% -7.9038%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) high mild
  11 (11.00%) high severe
bit_util/trailing_bits/num_bits_64
                        time:   [531.20 ps 537.39 ps 545.31 ps]
                        change: [-12.720% -9.5177% -6.6237%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  2 (2.00%) high mild
  11 (11.00%) high severe
bit_util/read_u64       time:   [662.74 ps 667.79 ps 675.09 ps]
                        change: [-44.346% -43.866% -43.140%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
bit_util/read_u32       time:   [640.71 ps 641.86 ps 643.37 ps]
                        change: [-84.385% -84.321% -84.263%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  4 (4.00%) high mild
  11 (11.00%) high severe
bit_util/get_u32_value  time:   [2.0853 ms 2.0860 ms 2.0868 ms]
                        change: [-28.295% -28.107% -27.945%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

The baselines for read_u64(...) and read_u32(...) are read_num_bytes_u64(8, ...) and read_num_bytes_u32(4, ...), respectively.

@kazuyukitanimura kazuyukitanimura marked this pull request as ready for review July 27, 2024 06:37
@andygrove (Member) left a comment:

Those are some nice speedups. Thanks @kazuyukitanimura. I'm not too familiar with this code so it would be good to get another review before merging (perhaps @viirya or @parthchandra?)

@codecov-commenter commented Jul 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 53.75%. Comparing base (ded3dd6) to head (30adf06).
Report is 10 commits behind head on main.

Additional details and impacted files
@@              Coverage Diff              @@
##               main     #718       +/-   ##
=============================================
+ Coverage     33.57%   53.75%   +20.17%     
+ Complexity      830      813       -17     
=============================================
  Files           110      107        -3     
  Lines         42608    10273    -32335     
  Branches       9352     1934     -7418     
=============================================
- Hits          14306     5522     -8784     
+ Misses        25347     3774    -21573     
+ Partials       2955      977     -1978     


@@ -553,8 +564,11 @@ pub struct BitReader {
 /// either byte aligned or not.
 impl BitReader {
     pub fn new(buf: Buffer, len: usize) -> Self {
-        let num_bytes = cmp::min(8, len);
-        let buffered_values = read_num_bytes_u64(num_bytes, buf.as_slice());
+        let buffered_values = if 8 > len {
Contributor:

nit: should we use size_of instead of 8?

Contributor Author:

updated

        let buffered_values = if 8 > len {
            read_num_bytes_u64(len, buf.as_slice())
        } else {
            read_u64(buf.as_slice())
Contributor:

If len > 8, then this will not read len bytes, will it? Also, read_num_bytes_u64 appears to cover this case in an unlikely block (and handles it rather differently).

Contributor Author:

If len > 8, this will read only 8 bytes (a u64). read_num_bytes_u64 has:

    debug_assert!(size <= src.len());
    if unlikely(src.len() < 8) {

So when size == 8 we do not need the if unlikely() block.

Contributor:

Maybe we can explicitly make the condition if src.len == 0, so:

   if src.len == 0 {
     read_u64(buf.as_slice())
   } else {
     read_num_bytes_u64(len, buf.as_slice())
   }

Contributor Author:

When src.len == 0 is true, we cannot read anything, if I understand correctly?
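The fast-path/slow-path split being discussed can be sketched in isolation. These are simplified, hypothetical stand-ins for the PR's helpers (little-endian byte order, plain slices instead of an arrow Buffer), not the Comet implementations:

```rust
// Hypothetical standalone sketches of the helpers discussed above.

/// Fast path: read exactly 8 bytes as a little-endian u64 in one load.
fn read_u64(src: &[u8]) -> u64 {
    u64::from_le_bytes(src[..8].try_into().unwrap())
}

/// Slow path: read `size` (<= 8) bytes, zero-padding the high bytes.
fn read_num_bytes_u64(size: usize, src: &[u8]) -> u64 {
    debug_assert!(size <= 8 && size <= src.len());
    let mut buf = [0u8; 8];
    buf[..size].copy_from_slice(&src[..size]);
    u64::from_le_bytes(buf)
}

/// The branch from the PR: take the single-load fast path whenever
/// at least 8 bytes are available.
fn buffered_values(buf: &[u8], len: usize) -> u64 {
    if 8 > len {
        read_num_bytes_u64(len, buf)
    } else {
        read_u64(buf)
    }
}

fn main() {
    // Short buffer exercises the padded slow path.
    assert_eq!(buffered_values(&[1, 2, 3], 3), 0x0003_0201);
    // len == 8 takes the fast path; trailing bytes are ignored.
    assert_eq!(buffered_values(&[1, 0, 0, 0, 0, 0, 0, 0, 9], 8), 1);
    println!("ok");
}
```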

@kazuyukitanimura (Contributor Author) commented:

@parthchandra do you have any more feedback?
cc @comphead @huaxingao @viirya

group.bench_function("get_u32_value", |b| {
    b.iter(|| {
        let mut reader: BitReader = BitReader::new_all(buffer.slice(0));
        for _ in 0..(buffer.len() * 8 / 31) {
Contributor:

I would add some comments on magic numbers

Contributor Author:

This is not a magic number; I just chose it arbitrarily, no specific reason.

#[inline]
pub fn read_u64(src: &[u8]) -> u64 {
    let in_ptr = src.as_ptr() as *const u64;
    unsafe { in_ptr.read_unaligned() }
}
Contributor:

I'm curious why we read the raw pointer unaligned. Is it not guaranteed to be aligned?

Contributor Author:

Hmm, I'm not sure it is always aligned...

Member:

I don't think we can ever guarantee that the memory is aligned
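To illustrate the alignment point: a &[u8] slice only guarantees 1-byte alignment, so a plain aligned dereference of the cast pointer would be undefined behavior whenever the slice does not happen to start on an 8-byte boundary. A minimal standalone sketch (native byte order, hypothetical wrapper mirroring the snippet above):

```rust
/// Hypothetical standalone version of the read_u64 under discussion.
#[inline]
fn read_u64(src: &[u8]) -> u64 {
    debug_assert!(src.len() >= 8);
    let in_ptr = src.as_ptr() as *const u64;
    // read_unaligned is required because a byte slice carries no u64
    // alignment guarantee; on x86-64 and aarch64 it still compiles to a
    // single plain load, so the fast path stays fast.
    unsafe { in_ptr.read_unaligned() }
}

fn main() {
    let v: u64 = 0x1122_3344_5566_7788;
    let mut buf = vec![0u8; 9];
    // Place the value at offset 1 to force a misaligned source pointer.
    buf[1..9].copy_from_slice(&v.to_ne_bytes());
    assert_eq!(read_u64(&buf[1..]), v);
    println!("ok");
}
```

Round-tripping through to_ne_bytes keeps the check portable across endiannesses, since read_unaligned returns the value in native byte order.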

@@ -1019,6 +1033,12 @@ mod tests {
         }
     }
 
+    #[test]
+    fn test_read_u64() {
+        let buffer: Vec<u8> = vec![0, 1, 2, 3, 4, 5, 6, 7];
Contributor:

Why is it u8 if we test u64? Should we create a vector of i64?

Contributor Author:

Because this is a byte buffer; read_u64 reads 8 bytes from it.
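Put differently, the Vec<u8> is the raw input and the u64 is the assembled result. A small endian-explicit sketch of that relationship (using std's from_le_bytes for determinism, rather than the crate's read_u64, which reads in native byte order):

```rust
fn main() {
    let buffer: Vec<u8> = vec![0, 1, 2, 3, 4, 5, 6, 7];
    // Little-endian assembly: byte i contributes buffer[i] << (8 * i),
    // so the first byte becomes the least significant.
    let value = u64::from_le_bytes(buffer[..8].try_into().unwrap());
    assert_eq!(value, 0x0706_0504_0302_0100);
    println!("{value:#018x}");
}
```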

@kazuyukitanimura (Contributor Author) commented:

Plan to merge this today

@kazuyukitanimura kazuyukitanimura merged commit ffb96c3 into main Aug 2, 2024
183 checks passed
@kazuyukitanimura (Contributor Author) commented:

Merged, thank you @andygrove @parthchandra @comphead

@kazuyukitanimura kazuyukitanimura deleted the optimize-get-value branch August 9, 2024 20:36
himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request Sep 7, 2024
(cherry picked from commit ffb96c3)