Skip to content

Conversation

@andygrove
Copy link
Member

@andygrove andygrove commented Dec 13, 2024

Which issue does this PR close?

N/A

Rationale for this change

There is a new version of twoxhash that has a new oneshot API for use cases where all data is available at once. This is more performant than the streaming API.

Arrow uses this crate for hashing related to bloom filters.

We saw significant performance improvements in Comet with this approach, so hopefully it helps with arrow as well.

I tried benchmarking with write_batch benchmarks but am seeing highly variable results on my desktop, so I am not sure how to prove that this helps with performance.

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the parquet Changes to the parquet crate label Dec 13, 2024
@andygrove andygrove marked this pull request as ready for review December 13, 2024 17:27
@andygrove andygrove requested a review from viirya December 13, 2024 17:33
Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MSRV check failed.

@andygrove
Copy link
Member Author

twoxhash defines rust-version as 1.81.0 even though it builds OK with much older versions. I will move this to draft for now.

FYI @shepmaster is it possible to lower the minimum rust version for twoxhash?

@andygrove andygrove marked this pull request as draft December 13, 2024 19:32
@shepmaster
Copy link
Contributor

See also #6583

even though it builds OK with much older versions

I'm not sure I follow:

cargo +1.80 test --all-features

# ...

error: could not compile `twox-hash` (lib) due to 12 previous errors

@andygrove
Copy link
Member Author

Closed since #6583 already exists. Thank you @shepmaster

@andygrove andygrove closed this Dec 14, 2024
@andygrove andygrove deleted the twox-hash-upgrade branch December 14, 2024 01:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parquet Changes to the parquet crate performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants