@timholy
Member

@timholy timholy commented Mar 8, 2021

Currently, hashing AbstractArrays begins with hashing the type itself:

hash(AbstractArray, h)

This ends up using the objectid fallback, which turns out to dominate the hashing time for small AbstractArrays:

julia> using StaticArrays, BenchmarkTools

julia> a = @SVector [1,2,3,4,5];

julia> @btime hash($a, UInt(0))
  77.935 ns (0 allocations: 0 bytes)
0xdeb6d0657a261f74

julia> @btime hash(AbstractArray, UInt(0))
  58.643 ns (0 allocations: 0 bytes)
0xc03f1dbe32103a9e

This PR replaces the hash of the objectid with a static, randomly generated number. Now:

julia> @btime hash($a, UInt(0))
  18.580 ns (0 allocations: 0 bytes)
0x5e77b8bf73067ebd

And for a random Float64 vector:

julia> @btime hash($a, UInt(0))
  29.031 ns (0 allocations: 0 bytes)
0x9a574d69612587eb

However, I'm unsure whether the design in #26022 deliberately made the hash vary across sessions. If so, perhaps we could generate a random number at Julia startup? I'd be grateful for feedback from @mbauman.
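To make the change concrete, here is a minimal sketch of the technique described above: mixing a fixed constant into the running hash instead of hashing a type object. It is not the code from this PR. The `Wrapped` type, the `WRAPPED_SEED` constant and its value, and both function names are hypothetical, chosen only so the sketch leaves Base's `hash` methods untouched.

```julia
# Hypothetical container type, used so the sketch does not modify Base methods.
struct Wrapped{T}
    data::Vector{T}
end

# Assumed seed: an arbitrary fixed constant (not the one chosen in the PR).
# A narrower literal is picked on 32-bit platforms so the value is a `UInt` either way.
const WRAPPED_SEED = UInt === UInt64 ? 0x9e3779b97f4a7c15 : 0x9e3779b9

# Old style: start from the hash of the type object, which goes through
# the generic objectid-based fallback.
function hash_via_type(w::Wrapped, h::UInt)
    h = hash(Wrapped, h)
    for x in w.data
        h = hash(x, h)
    end
    return h
end

# New style: start from the incoming hash plus a compile-time constant.
# Unsigned addition wraps on overflow, so this cannot throw.
function hash_via_seed(w::Wrapped, h::UInt)
    h += WRAPPED_SEED
    for x in w.data
        h = hash(x, h)
    end
    return h
end
```

Benchmarking `hash_via_type` against `hash_via_seed` with `@btime` on a short vector should show whether the type-object hash is the dominant cost on a given Julia version; no timings are claimed here.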

@timholy timholy requested a review from mbauman March 8, 2021 11:01
@mbauman
Member

mbauman commented Mar 8, 2021

No, I actually didn't realize that hash(AbstractArray) isn't stable across sessions. IIRC I just thought it was a cute way of avoiding the need to come up with that random number (and 32-bit truncation).

@timholy
Member Author

timholy commented Mar 8, 2021

Actually I might be wrong about that; I assumed it came from the pointer.

Member

@Sacha0 Sacha0 left a comment

👍 ! :)

@mbauman
Member

mbauman commented Mar 8, 2021

> Actually I might be wrong about that; I assumed it came from the pointer.

Not sure what's happening, but it seems to be preserved within a given precompile/sysimg.

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
@oscardssmith
Member

While we're speeding this up, would it make sense to change the behavior of hashing small arrays (fewer than 25 elements, say) to hash all elements rather than only log(n) of them? I'd imagine that this would likely be faster (at least for Array).

@timholy timholy merged commit faa3d41 into master Mar 9, 2021
@timholy timholy deleted the teh/faster_aahash branch March 9, 2021 06:37
@timholy
Member Author

timholy commented Mar 9, 2021

If that's viable I'd be fine with it, but it's a different kind of change than the one here and should be a separate PR.
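For reference, the small-array idea raised above could look roughly like the following. This is only a sketch of the suggestion, not code from this or any other PR. The cutoff `SMALL_CUTOFF` and the function names are hypothetical, and the sketch deliberately ignores array shape, so `[1, 2, 3]` and `[1 2 3]` would collide on the small-array path.

```julia
# Hypothetical cutoff, taken from the "fewer than 25 elements, say" suggestion.
const SMALL_CUTOFF = 25

# Hash every element in iteration order, threading the running hash through.
function hash_all_elements(A::AbstractArray, h::UInt)
    for x in A
        h = hash(x, h)
    end
    return h
end

# Hypothetical dispatcher: small arrays take the simple full pass,
# larger ones defer to whatever Base's `hash` does for arrays.
function sketch_hash(A::AbstractArray, h::UInt)
    return length(A) < SMALL_CUTOFF ? hash_all_elements(A, h) : hash(A, h)
end
```

A real implementation would also need to mix in the axes so that arrays with equal elements but different shapes hash differently, which is why this is best treated as a starting point for a separate PR.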

ElOceanografo pushed a commit to ElOceanografo/julia that referenced this pull request May 4, 2021
antoine-levitt pushed a commit to antoine-levitt/julia that referenced this pull request May 9, 2021
@nsajko nsajko added performance Must go faster arrays [a, r, r, a, y, s] hashing labels Mar 16, 2025
8 participants