-
-
Notifications
You must be signed in to change notification settings - Fork 5.7k
upgrade rapidhash v1 to v3 and fix several bugs in _hash_integer translation #59177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Also, how do the benchmarks look? |
|
(the hash changes aren't in 1.12, right? I thought it was 1.13 so probably shouldn't be backported) thanks for catching my errors in the Integer translation. I'm happy to help out here for the rest of the updates |
If this fixes bugs (which bugs?) can we have regression tests, too? |
|
@giordano the one I see fixed here (although it sounds like Jameson has more in mind that I don't see yet) is that I used this would become an issue on 32-bit systems to hashes are allowed to differ on 32-bit systems, of course as they must, but if there is some that is all to say a regression test might be annoying to add since it requires several components that don't exist in |
|
Test failure looks likely related |
|
Yes, looks like TOML included a test for Dict iteration order which can just be edited |
The rapidhash v3 algorithm supports streaming data, which makes it much more suitable for use as a string hasher than the original. This is actually v3 nano, since it is more similar to the original v1, but without as much vectorization performance improvements v3. It also fixes several bugs in the translation to hashing integers. Initial version written by Claude, with full review and correction by myself afterwards (it caught none of the bugs).
10ae734 to
9eac4bb
Compare
For some reason master (EDIT: maybe just a build issue on my machine) allocates 2 objects and this PR doesn't, so this is much faster, though not obvious why that was true. |
|
It doesn't allocate for me: julia> using BenchmarkTools, Random
julia> a = randstring(10)
"xvj7uv5vw7"
julia> @btime hash($a)
3.958 ns (0 allocations: 0 bytes)
0x5128867387b8afff
julia> versioninfo()
Julia Version 1.13.0-DEV.940
Commit 8ba3b11f9b* (2025-08-01 07:41 UTC)
Platform Info:
OS: macOS (arm64-apple-darwin24.5.0)
CPU: 8 × Apple M1
WORD_SIZE: 64
LLVM: libLLVM-20.1.2 (ORCJIT, apple-m1)
GC: Built with stock GC
Threads: 1 default, 1 interactive, 1 GC (on 4 virtual cores) |
This comment was marked as outdated.
This comment was marked as outdated.
|
the rapidhash changes are coming in 1.13. hashes are not changed in 1.12 at all |
The rapidhash v3 algorithm supports streaming data, which makes it much more suitable for use as a string hasher than the original. This is actually v3 nano, since it is more similar to the original v1, but without as much vectorization performance improvements v3. This is intended to be the first in a sequence of several more PRs to improve upon this code to be usable in more generic situations.
It also fixes several bugs in the translation to hashing integers.
Initial version written by Claude, with full review and correction by myself afterwards (it caught none of the bugs, but got most of the updates right).