Skip to content

Conversation

@vtjnash
Copy link
Member

@vtjnash vtjnash commented Jul 31, 2025

The rapidhash v3 algorithm supports streaming data, which makes it much more suitable for use as a string hasher than the original. This is actually v3 nano, since it is more similar to the original v1, but without as much vectorization performance improvements v3. This is intended to be the first in a sequence of several more PRs to improve upon this code to be usable in more generic situations.

It also fixes several bugs in the translation to hashing integers.

Initial version written by Claude, with full review and correction by myself afterwards (it caught none of the bugs, but got most of the updates right).

@vtjnash vtjnash added hashing backport 1.12 Change should be backported to release-1.12 labels Jul 31, 2025
@oscardssmith
Copy link
Member

@vtjnash while you're at it, can you address #59172 and drop the length call?

@oscardssmith
Copy link
Member

Also, how do the benchmarks look?

@adienes
Copy link
Member

adienes commented Jul 31, 2025

(the hash changes aren't in 1.12, right? I thought it was 1.13 so probably shouldn't be backported)

thanks for catching my errors in the Integer translation. I'm happy to help out here for the rest of the updates

@KristofferC
Copy link
Member

KristofferC commented Jul 31, 2025

@vtjnash while you're at it, can you address #59172 and drop the length call?

This is already done here, or?

@giordano
Copy link
Member

It also fixes several bugs in the translation to hashing integers.

If this fixes bugs (which bugs?) can we have regression tests, too?

@adienes
Copy link
Member

adienes commented Jul 31, 2025

@giordano the one I see fixed here (although it sounds like Jameson has more in mind that I don't see yet) is that I used % UInt where I should have used % UInt64

this would become an issue on 32-bit systems to <: Integer types of > 128 bits that end up calling _hash_integer (notably, not including BigInt, but it would include things like BitIntegers.UInt256)

hashes are allowed to differ on 32-bit systems, of course as they must, but if there is some OtherIntegers.FastBigInt that implements a faster specialization to hash_integer, then the hashing/equality correspondence of BitIntegers.UInt256 with OtherIntegers.FastBigInt could be broken when UInt === UInt32. note that I chose not to use Base.BigInt for this example either since the hash_integer(::BigInt) specialization is only defined when UInt === UInt64

that is all to say a regression test might be annoying to add since it requires several components that don't exist in Base

@Keno
Copy link
Member

Keno commented Aug 1, 2025

Test failure looks likely related

@vtjnash
Copy link
Member Author

vtjnash commented Aug 1, 2025

Yes, looks like TOML included a test for Dict iteration order which can just be edited

The rapidhash v3 algorithm supports streaming data, which makes it much
more suitable for use as a string hasher than the original. This is
actually v3 nano, since it is more similar to the original v1, but
without as much vectorization performance improvements v3.

It also fixes several bugs in the translation to hashing integers.

Initial version written by Claude, with full review and correction by
myself afterwards (it caught none of the bugs).
@vtjnash vtjnash force-pushed the jn/rapidhash3-and-bugfixes branch from 10ae734 to 9eac4bb Compare August 1, 2025 16:28
@vtjnash
Copy link
Member Author

vtjnash commented Aug 1, 2025

Also, how do the benchmarks look?

For some reason master (EDIT: maybe just a build issue on my machine) allocates 2 objects and this PR doesn't, so this is much faster, though not obvious why that was true.

julia> using BenchmarkTools, Random

julia> a = randstring(10)
"HAW1abjB4Y"

julia> @btime hash($a)
  16.951 ns (2 allocations: 80 bytes)
0x0455bfdda16c6e82

@vtjnash vtjnash added the merge me PR is reviewed. Merge when all tests are passing label Aug 1, 2025
@giordano
Copy link
Member

giordano commented Aug 1, 2025

It doesn't allocate for me:

julia> using BenchmarkTools, Random

julia> a = randstring(10)
"xvj7uv5vw7"

julia> @btime hash($a)
  3.958 ns (0 allocations: 0 bytes)
0x5128867387b8afff

julia> versioninfo()
Julia Version 1.13.0-DEV.940
Commit 8ba3b11f9b* (2025-08-01 07:41 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin24.5.0)
  CPU: 8 × Apple M1
  WORD_SIZE: 64
  LLVM: libLLVM-20.1.2 (ORCJIT, apple-m1)
  GC: Built with stock GC
Threads: 1 default, 1 interactive, 1 GC (on 4 virtual cores)

@PallHaraldsson

This comment was marked as outdated.

@adienes
Copy link
Member

adienes commented Aug 6, 2025

the rapidhash changes are coming in 1.13. hashes are not changed in 1.12 at all

@vtjnash vtjnash removed the backport 1.12 Change should be backported to release-1.12 label Aug 6, 2025
@vtjnash vtjnash merged commit d268106 into master Aug 7, 2025
7 of 9 checks passed
@vtjnash vtjnash deleted the jn/rapidhash3-and-bugfixes branch August 7, 2025 20:33
@giordano giordano removed the merge me PR is reviewed. Merge when all tests are passing label Aug 11, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants