Skip to content

Conversation

psvri
Copy link
Contributor

@psvri psvri commented Mar 29, 2025

Which issue does this PR close?

Closes #.

Rationale for this change

Now that we have updated msrv, we can use rust built ins for round_upto_multiple_of_64 and div_ceil

What changes are included in this PR?

Are there any user-facing changes?

Yes, minor edits to error message

@github-actions github-actions bot added the arrow Changes to the arrow crate label Mar 29, 2025
@psvri psvri changed the title Use rust builtins Use rust builtins for round_upto_multiple_of_64 and ceil Mar 29, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @psvri - I think we should run some benchmarks to ensure we aren't introducing some performance regressions by accident

@mbutrovich
Copy link
Contributor

mbutrovich commented Apr 1, 2025

Thanks @psvri - I think we should run some benchmarks to ensure we aren't introducing some performance regressions by accident

To investigate round_upto_multiple_of_64 I dumped the disassembly of MutableBuffer::shrink_to_fit...

pub fn shrink_to_fit(&mut self) {
    let new_capacity = bit_util::round_upto_multiple_of_64(self.len);
    if new_capacity < self.layout.size() {
        self.reallocate(new_capacity)
    }
}

...from a release build for main and this PR:

Online diff.

The generated instructions are slightly different, but general number of instructions, look the same. I would be stunned if you measured anything other than noise in a release build. I'll take a look at ceil next.

@mbutrovich
Copy link
Contributor

For ceil I disassembled MutableBuffer::new_null since all it does is use ceil and then allocate:

/// creates a new [MutableBuffer] with capacity and length capable of holding `len` bits.
/// This is useful to create a buffer for packed bitmaps.
pub fn new_null(len: usize) -> Self {
    let num_bytes = bit_util::ceil(len, 8);
    MutableBuffer::from_len_zeroed(num_bytes)
}

The generated code is the same between main this PR.

@mbutrovich
Copy link
Contributor

mbutrovich commented Apr 1, 2025

For ceil what might be interesting is to find somewhere where the compiler can't use a shift immediate because it's a compile-time constant power of 2 (like in new_null above). I'll poke around....

@mbutrovich
Copy link
Contributor

mbutrovich commented Apr 1, 2025

So, almost the entire codebase uses compile time constant args that are powers of two to ceil so those will generate equivalent shifts. The exception is arrow_row::variable::padded_length. That gets inlined in encoded_len, which in turn gets inlined a bunch of times in row_lengths. So let's dive into that!

Online diff.

Darn. Aside from label names, they're exactly the same. I really thought I'd find a difference. Long way of saying, there should not be measurable performance difference on this PR.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @mbutrovich and @psvri for your contribution and analysis

@alamb alamb merged commit cdf72dd into apache:main Apr 3, 2025
27 checks passed
@alamb
Copy link
Contributor

alamb commented Apr 3, 2025

🚀 !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants