Skip to content

Conversation

@oscardssmith
Copy link
Member

Reverts #59691. It was a 10x performance regression.

Copy link
Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_hash_integer was found to have bugs, indicating it wasn't tested. This (despite the performance problems in the bitinteger package), ensures we are hashing them correctly by using the same code path.

@adienes
Copy link
Member

adienes commented Sep 30, 2025

as identified in #59177 (comment), the bug was

% UInt where I should have used % UInt64
this would become an issue on 32-bit systems to <: Integer types of > 128 bits that end up calling _hash_integer (notably, not including BigInt, but it would include things like BitIntegers.UInt256)

(unless there were more issues, but they weren't identified so I'm not aware of them yet), and this particular code path was indeed not tested, but I'm not sure I would agree it's fair to say it "wasn't tested" wholesale.

I do think this mistake is recoverable without such a catastrophic performance regression; indeed the behavior should be identical on all 64-bit systems.

I can see how it would make sense to keep a "safe" version in, especially if it unblocks #59151, and use the same code path for as many implementations as possible. I do suspect though that what's needed to fix the regression here will look like adding a hash_integer(::Integer) fast path, essentially reverting that part of #59691

@KristofferC
Copy link
Member

KristofferC commented Sep 30, 2025

Should revert since the performance regression was unexpected (or not mentioned), make a new PR and re-evaluate with a proper discussion of the trade-offs.

@vtjnash
Copy link
Member

vtjnash commented Sep 30, 2025

I partly see the issue with BitInteger, though not entirely clear how that triggers a 10x regression, I see locally that the new code also sometimes leads to 10x improvements over the old code too (it is value-dependent). The problem seems to be that BitInteger's implementation of >>> may be quite bad, and marked @inline, which causes cascading consistency problems.

vtjnash added a commit that referenced this pull request Oct 1, 2025
We cannot help that the BitInteger `>>>` method is very slow, but we can
vectorize this getindex, since that is why we have `load_le_array` in
the first place. This appears to give roughly equivalent performance on
random numbers and up to 10x faster performance in other cases.

Closes #59702
@vtjnash vtjnash deleted the revert-59691-jn/rapidhash-all-the-things branch October 1, 2025 03:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants