Faster AbstractArray hashing #39950

Merged
merged 2 commits into from
Mar 9, 2021

Conversation

timholy
Member

@timholy timholy commented Mar 8, 2021

Currently, hashing AbstractArrays begins with hashing the type itself:

hash(AbstractArray, h)

This ends up using the objectid fallback, which turns out to dominate the hashing time for small AbstractArrays:

julia> using StaticArrays, BenchmarkTools

julia> a = @SVector [1,2,3,4,5];

julia> @btime hash($a, UInt(0))
  77.935 ns (0 allocations: 0 bytes)
0xdeb6d0657a261f74

julia> @btime hash(AbstractArray, UInt(0))
  58.643 ns (0 allocations: 0 bytes)
0xc03f1dbe32103a9e

This replaces the hash of the objectid with a static randomly-generated
number. Now:

julia> @btime hash($a, UInt(0))
  18.580 ns (0 allocations: 0 bytes)
0x5e77b8bf73067ebd

and for a random Float64 vector

julia> @btime hash($a, UInt(0))
  29.031 ns (0 allocations: 0 bytes)
0x9a574d69612587eb

However, I'm unsure whether the design in #26022 deliberately made the hash vary across sessions. If so, perhaps we could generate a random number on Julia startup? I'd be grateful for feedback from @mbauman.
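
To make the change concrete, here is a minimal sketch of the two seeding strategies. All names (`TOY_ARRAY_SEED`, `toyhash_typeseed`, `toyhash_constseed`) and the constant's value are hypothetical and chosen for illustration; this is not the code in Base, and it hashes every element rather than following Base's element-selection strategy.

```julia
# Hypothetical sketch: names and the constant are illustrative assumptions,
# not the definitions used in Base.
const TOY_ARRAY_SEED = UInt === UInt64 ? 0x3c6ef372fe94f82b : 0x3c6ef372

# Old style: seed with the hash of the abstract type itself, which goes
# through the objectid-based fallback on every call.
function toyhash_typeseed(A::AbstractArray, h::UInt)
    h = hash(AbstractArray, h)
    h = hash(size(A), h)
    for x in A
        h = hash(x, h)
    end
    return h
end

# New style: seed with a fixed constant, so no type lookup happens at runtime.
function toyhash_constseed(A::AbstractArray, h::UInt)
    h += TOY_ARRAY_SEED          # unsigned addition wraps on overflow
    h = hash(size(A), h)
    for x in A
        h = hash(x, h)
    end
    return h
end
```

The two functions can be compared with `@btime toyhash_typeseed($a, UInt(0))` and `@btime toyhash_constseed($a, UInt(0))`; timings will depend on the machine and Julia version.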

@timholy timholy requested a review from mbauman March 8, 2021 11:01
@mbauman
Member

mbauman commented Mar 8, 2021

No, I actually didn't realize that hash(AbstractArray) isn't stable across sessions. IIRC I just thought it was a cute way of avoiding the need to come up with that random number (and 32-bit truncation).

@timholy
Member Author

timholy commented Mar 8, 2021

Actually I might be wrong about that, I assumed it came from the pointer.

Member

@Sacha0 Sacha0 left a comment

👍 ! :)

@mbauman
Member

mbauman commented Mar 8, 2021

> Actually I might be wrong about that, I assumed it came from the pointer.

Not sure what's happening, but it seems to be preserved within a given precompile/sysimg.
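
One way to probe this empirically is to print the value from two fresh processes and compare. A minimal sketch, assuming the running Julia binary can be relaunched through `Base.julia_cmd()`; it compares only two runs of the same build on one machine, so it says nothing about other builds or sysimages.

```julia
# Launch two fresh Julia processes and capture what each prints for
# hash(AbstractArray, UInt(0)). --startup-file=no avoids user configuration.
cmd = `$(Base.julia_cmd()) --startup-file=no -e 'print(hash(AbstractArray, UInt(0)))'`

h1 = read(cmd, String)   # output of the first process
h2 = read(cmd, String)   # output of the second process

println(h1 == h2 ? "same value in both sessions" : "value differs between sessions")
```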

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
@oscardssmith
Member

While we're speeding this up, would it make sense to change the behavior of hashing small arrays (say, fewer than 25 elements) to hash all elements rather than log(n) of them? I'd imagine that this would likely be faster (at least for Array).
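
A rough sketch of that idea, for illustration only. The name `sketch_hash`, the constant `SMALL_LIMIT`, and the doubling-stride sampling rule are all assumptions; the sampling here is a simple stand-in for an O(log n) scheme and is not the rule Base uses.

```julia
# Hypothetical sketch: threshold and sampling rule are illustrative assumptions.
const SMALL_LIMIT = 25

function sketch_hash(A::Array, h::UInt)
    h = hash(size(A), h)
    n = length(A)
    if n < SMALL_LIMIT
        # Small array: hash every element.
        for x in A
            h = hash(x, h)
        end
    else
        # Large array: visit indices n, n-1, n-3, n-7, ... so only
        # O(log n) elements are hashed.
        i, stride = n, 1
        while i >= 1
            h = hash(A[i], h)
            i -= stride
            stride *= 2
        end
    end
    return h
end
```

With this rule, two large arrays that differ only at an unsampled index hash to the same value, which is the usual trade-off of sampling.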

@timholy timholy merged commit faa3d41 into master Mar 9, 2021
@timholy timholy deleted the teh/faster_aahash branch March 9, 2021 06:37
@timholy
Member Author

timholy commented Mar 9, 2021

If that's viable I'd be fine with it, but it's a different kind of change than the one here and should be a separate PR.

ElOceanografo pushed a commit to ElOceanografo/julia that referenced this pull request May 4, 2021
Previously, the `objectid` lookup for `hash(AbstractArray, h)` dominated the hashing time for `AbstractArray`:
```
julia> using StaticArrays, BenchmarkTools

julia> a = @SVector [1,2,3,4,5];

julia> @btime hash($a, UInt(0))
  77.935 ns (0 allocations: 0 bytes)
0xdeb6d0657a261f74

julia> @btime hash(AbstractArray, UInt(0))
  58.643 ns (0 allocations: 0 bytes)
0xc03f1dbe32103a9e
```

This replaces the hash of the objectid with a static randomly-generated
number. Now:
```
julia> @btime hash($a, UInt(0))
  18.580 ns (0 allocations: 0 bytes)
0x5e77b8bf73067ebd
```

and for a random `Float64` vector

```
julia> @btime hash($a, UInt(0))
  29.031 ns (0 allocations: 0 bytes)
0x9a574d69612587eb
```

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
antoine-levitt pushed a commit to antoine-levitt/julia that referenced this pull request May 9, 2021
@nsajko nsajko added performance Must go faster arrays [a, r, r, a, y, s] hashing labels Mar 16, 2025
7 participants