Skip to content

Conversation

@vtjnash
Copy link
Member

@vtjnash vtjnash commented Oct 14, 2020

In LLVM (inherited from C), fptosi has undefined behavior if the result
does not fit the integer size after rounding down. But by using the same
strategy as generic hashing of Real values, we actually can end up with
a situation that is not much slower for the CPU to deal with and avoids
the UB.

Refs #6624 (3696968)
Fixes #37800

Benchmarks:

julia> Random.seed!(1); a = rand(Int64, 10^8); b = copy(reinterpret(Float64, a)); c = rand(Float64, length(a)); d = copy(reinterpret(UInt64, a));
julia> @time sum(hash, a); @time sum(hash, b); @time sum(hash, c); @time sum(hash, d);
master:
  0.174493 seconds (1 allocation: 16 bytes)
  0.316409 seconds (1 allocation: 16 bytes)
  0.317264 seconds (1 allocation: 16 bytes)
  0.145938 seconds (1 allocation: 16 bytes)

PR:
  0.119815 seconds (1 allocation: 16 bytes)
  0.687834 seconds (1 allocation: 16 bytes)
  0.468544 seconds (1 allocation: 16 bytes)
  0.210141 seconds (1 allocation: 16 bytes)

Aside: note that integers-as-floating-point aren't affected as much:

julia> e = map(Float64, a);
master:  0.313309 seconds (1 allocation: 16 bytes)
PR:   0.338984 seconds (1 allocation: 16 bytes)

@StefanKarpinski I picked these factors mostly at random. So if you have better ideas for the specific hashing function itself, I'll update to that. This is showing the general idea and testing performance of it.

@jmkuhn
Copy link
Contributor

jmkuhn commented Oct 14, 2020

With this PR you have

julia> hash(0.0) == hash(-0.0)
true

@vtjnash
Copy link
Member Author

vtjnash commented Oct 15, 2020

Ah, you're right. Good catch. I completely botched that and forgot to use isequal with hashing instead of ==.

@vtjnash vtjnash added the triage This should be discussed on a triage call label Oct 19, 2020
@vtjnash
Copy link
Member Author

vtjnash commented Oct 22, 2020

triage concluded to simplify this (disable the breaking tests), and otherwise to proceed

@vtjnash vtjnash removed the triage This should be discussed on a triage call label Oct 22, 2020
In LLVM (inherited from C), fptosi has undefined behavior if the result
does not fit the integer size after rounding down. But by using the same
strategy as generic hashing of Real values, we actually can end up with
a sitatuion that is faster for the CPU to deal with and avoids the UB.

Refs #6624 (3696968)
Fixes #37800
@Keno
Copy link
Member

Keno commented Oct 29, 2020

What is holding this up? IIRC Triage ok'ed this, so can we just merge it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Floating point hash bug

5 participants