Faster AbstractArray hashing #39950
No, I actually didn't realize that
Actually I might be wrong about that, I assumed it came from the pointer.
👍 ! :)
Not sure what's happening, but it seems to be preserved within a given precompile/sysimg.
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
While we're speeding this up, would it make sense to change the behavior of hashing small arrays (<25 elements say) to hash all elements rather than …
If that's viable I'd be fine with it, but it's a different kind of change than the one here and should be a separate PR.
Previously, the `object_id` lookup for `hash(AbstractArray, h)` dominated the hashing time for `AbstractArray`:

```
julia> using StaticArrays, BenchmarkTools

julia> a = @SVector [1,2,3,4,5];

julia> @btime hash($a, UInt(0))
  77.935 ns (0 allocations: 0 bytes)
0xdeb6d0657a261f74

julia> @btime hash(AbstractArray, UInt(0))
  58.643 ns (0 allocations: 0 bytes)
0xc03f1dbe32103a9e
```

This replaces the hash of the objectid with a static randomly-generated number. Now:

```
julia> @btime hash($a, UInt(0))
  18.580 ns (0 allocations: 0 bytes)
0x5e77b8bf73067ebd
```

and for a random `Float64` vector

```
julia> @btime hash($a, UInt(0))
  29.031 ns (0 allocations: 0 bytes)
0x9a574d69612587eb
```

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Currently, hashing `AbstractArray`s begins with hashing the type itself, `hash(AbstractArray, h)`. This ends up using the `object_id` fallback, which turns out to dominate the hashing time for small `AbstractArray`s: for a 5-element `SVector`, `hash(AbstractArray, UInt(0))` alone takes about 58.6 ns of the 77.9 ns total.

This replaces the hash of the objectid with a static randomly-generated number. Hashing the same `SVector` now takes about 18.6 ns, and hashing a random `Float64` vector takes about 29.0 ns.

However, I'm unsure whether the design in #26022 deliberately made the hashing vary across sessions. If so, perhaps we could generate a random number on Julia startup? I'd be grateful for feedback from @mbauman.
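To make the two options concrete, here is a minimal sketch, not the code from this PR. The module name `SeedSketch`, the constants, the seed value and the `toy_array_hash` helper are all hypothetical. The toy hash also mixes in every element, which is simpler than what Base does. It only shows where a fixed constant or a per-session random number would enter in place of `hash(AbstractArray, h)`.

```julia
module SeedSketch

# Hypothetical fixed seed: generated once offline and pasted into the source,
# so it is identical in every Julia session. `% UInt` truncates on 32-bit.
const STATIC_ARRAY_SEED = 0x7e2d6fb6448beb77 % UInt

# Hypothetical per-session seed: a mutable slot filled in at module load time.
const SESSION_ARRAY_SEED = Ref{UInt}(0)

function __init__()
    # Runs each time the module is loaded, so the value differs across sessions.
    SESSION_ARRAY_SEED[] = rand(UInt)
end

# Toy stand-in for array hashing (an assumption, not Base's algorithm):
# mix in the seed instead of hashing the type, then the size and all elements.
function toy_array_hash(A::AbstractArray, h::UInt, seed::UInt)
    h += seed
    h = hash(size(A), h)
    for x in A
        h = hash(x, h)
    end
    return h
end

static_hash(A, h::UInt=zero(UInt)) = toy_array_hash(A, h, STATIC_ARRAY_SEED)
session_hash(A, h::UInt=zero(UInt)) = toy_array_hash(A, h, SESSION_ARRAY_SEED[])

end # module
```

Both variants avoid hashing the type. The static seed costs nothing at runtime. The session seed adds a load from a `Ref` and keeps hashes varying between sessions, if that property turns out to matter.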