Skip to content

Conversation

@tustvold
Copy link
Contributor

@tustvold tustvold commented Feb 7, 2022

Which issue does this PR close?

Closes #1282.

Rationale for this change

I think it makes the logic easier to follow, although this may be a subjective assessment

What changes are included in this PR?

Refactors BitReader to explicitly separate the aligned and unaligned operations

Are there any user-facing changes?

No

@github-actions github-actions bot added the parquet Changes to the parquet crate label Feb 7, 2022

// First align bit offset to byte offset
if let BitReaderState::Unaligned(unaligned) = &mut self.state {
while values_to_read != values_read && (unaligned.bit_offset % 8) != 0 {
Copy link
Contributor Author

@tustvold tustvold Feb 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a slight change here - we align to the byte not u64. I think this was unintentional, as the comment said byte offset and there is no reason to align to u64.

let in_buf = &self.buffer.data()[self.byte_offset..];
assert!(in_buf.len() * 8 >= to_read * bit_width);

let mut in_ptr = in_buf as *const [u8] as *const u8 as *const u32;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This block is simply moved, with some slightly more verbose naming

@codecov-commenter
Copy link

Codecov Report

Merging #1283 (9b456fa) into master (ce15d0c) will increase coverage by 0.00%.
The diff coverage is 94.59%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1283   +/-   ##
=======================================
  Coverage   83.02%   83.02%           
=======================================
  Files         180      180           
  Lines       52269    52306   +37     
=======================================
+ Hits        43394    43427   +33     
- Misses       8875     8879    +4     
Impacted Files Coverage Δ
parquet/src/util/bit_util.rs 93.10% <94.59%> (-0.09%) ⬇️
arrow/src/datatypes/field.rs 53.79% <0.00%> (-0.31%) ⬇️
arrow/src/array/transform/mod.rs 84.51% <0.00%> (-0.13%) ⬇️
parquet_derive/src/parquet_field.rs 66.21% <0.00%> (+0.22%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ce15d0c...9b456fa. Read the comment docs.

Copy link
Member

@sunchao sunchao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks correct, just not sure how much extra costs as_aligned and as_unaligned will incur.

///
/// Returns `None` if there's not enough data available. `Some` otherwise.
pub fn get_value<T: FromBytes>(&mut self, num_bits: usize) -> Option<T> {
self.state.as_unaligned().get(num_bits)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure whether there's any performance implication here since it now requires some extra compare & jump instructions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running the benchmarks if anything I see a teeny performance improvement, this is likely because whilst there is a branch here, it isn't typically called in a loop.

Although DeltaBitPackDecoder is an exception here, but that's what #1284 is for 😄

@tustvold
Copy link
Contributor Author

tustvold commented Feb 8, 2022

Going to get #1284 in first and then revisit this

@tustvold tustvold marked this pull request as draft February 8, 2022 10:07
@tustvold
Copy link
Contributor Author

Going to close this as it wasn't needed for #1284

@tustvold tustvold closed this Feb 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet Changes to the parquet crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

(WON'T FIX) Don't Interwine Bit and Byte Aligned Operations in BitReader

3 participants