Remove bugged and typically slower minimum/maximum method #58267


Open: wants to merge 3 commits into `master`
Conversation

**@mbauman** (Member) commented on Apr 29, 2025:

This method is intended as a SIMD-able optimization for reductions with `min` and `max`, but it fails to achieve that goal on nearly every architecture. For example, on my Mac M1 the generic implementation is more than **6x** faster than this method:

```
julia> A = rand(Float64, 10000);

julia> @btime reduce(max, $A);
  4.673 μs (0 allocations: 0 bytes)

julia> @btime reduce((x,y)->max(x,y), $A);
  718.441 ns (0 allocations: 0 bytes)
```

I asked for some crowd-sourced multi-architecture benchmarks for the above test case on Slack, and only on AMD Ryzen (znver2) and an old Intel i5 Haswell chip did this method help — and then it was only by about 20%.

Worse, this method is bugged for signed zeros; if the "wrong" signed zero is produced, then it _re-runs_ the entire reduction (but without calling `f`) to see if the "right" zero should be returned.

Fixes #45932, fixes #36412, fixes #36081.
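For readers unfamiliar with the signed-zero hazard mentioned above, here is a minimal sketch (in Python rather than Julia, and using a hypothetical `sign_aware_max` helper of my own, not Julia's actual implementation) of why a plain comparison-based reduction can return the "wrong" zero: IEEE-754 ordinary comparison treats `-0.0 == 0.0`, so which zero survives depends on element order.

```python
import math
from functools import reduce

def naive_max(x, y):
    # Ordinary comparison: -0.0 > 0.0 and 0.0 > -0.0 are both false,
    # so whichever zero happens to sit in `y` can win.
    return x if x > y else y

def sign_aware_max(x, y):
    # Illustrative helper: break ties between zeros by sign bit,
    # treating -0.0 < 0.0 (the semantics a correct `max` wants).
    if x == y == 0.0:
        return y if math.copysign(1.0, x) < 0 else x
    return x if x > y else y

vals = [0.0, -0.0]
print(math.copysign(1.0, reduce(naive_max, vals)))       # -1.0: the naive reduce kept -0.0
print(math.copysign(1.0, reduce(sign_aware_max, vals)))  #  1.0: the correct max is +0.0
```

Reversing `vals` makes the naive version "accidentally" correct, which is exactly why a fast path that only fixes up zeros after the fact needs careful testing.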

@mbauman added the "performance (Must go faster)", "bugfix (This change fixes an existing bug)", and "fold (sum, maximum, reduce, foldl, etc.)" labels on Apr 29, 2025

@mbauman changed the title from "Remove bugged and typically slower implementation of minimum and `m…" to "Remove bugged and typically slower minimum/maximum method" on Apr 29, 2025
**@giordano** (Contributor) commented:

> Worse, this method is bugged for signed zeros; if the "wrong" signed zero is produced, then it re-runs the entire reduction (but without calling `f`) to see if the "right" zero should be returned.

Sounds like that'd make a good test (if not there already)?

**@mikmoore** (Contributor) commented:

Looks like this eliminates some of the anti-performance specializations cited in #45581. Thanks!

**@mbauman** (Member, Author) commented on Apr 29, 2025:

Yeah, looks like this is better than the naive loop #36412 suggested, too:

```
julia> function f_minimum(A::Array{<:Integer})
           result = A[1]
           @inbounds for i=2:length(A)
               if A[i] < result
                   result = A[i]
               end
           end
           return result
       end
f_minimum (generic function with 1 method)

julia> x = rand(Int, 10_000);

julia> using BenchmarkTools

julia> @btime minimum($x)
  3.057 μs (0 allocations: 0 bytes)
-9222765095448017038

julia> @btime f_minimum($x)
  1.613 μs (0 allocations: 0 bytes)
-9222765095448017038

julia> @btime reduce((x,y)->min(x,y), $x)
  1.450 μs (0 allocations: 0 bytes)
-9222765095448017038
```
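A hedged sketch of the distinction at play in the benchmark above, transliterated to Python (an assumption on my part about why the branchless form wins; the Python here only demonstrates that the two formulations compute the same result, not the SIMD speedup). A loop that updates the accumulator behind a data-dependent branch is harder for a compiler to vectorize than an unconditional "select," which maps directly onto vector min instructions.

```python
import random
from functools import reduce

def branchy_min(a):
    # Conditional update behind a data-dependent branch,
    # as in the f_minimum loop above.
    result = a[0]
    for x in a[1:]:
        if x < result:
            result = x
    return result

def select_min(a):
    # Unconditional select: result = (x if x < result else result),
    # the shape that compilers can lower to SIMD min instructions.
    return reduce(lambda r, x: x if x < r else r, a)

random.seed(0)
xs = [random.randint(-10**6, 10**6) for _ in range(10_000)]
assert branchy_min(xs) == select_min(xs) == min(xs)
```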
