make Chunk(::StaticArray) type stable by default #707

Closed
wants to merge 4 commits

Conversation

@ExpandingMan commented Aug 14, 2024

The function Chunk(x) is manifestly type unstable. In many cases when the value is not directly used, the compiler is clever enough to elide this, but in some cases its mere presence can cause significant performance penalties.

In particular, I encountered this when trying to get DifferentiationInterface.jl to behave nicely with static arrays. That package calls GradientConfig (or JacobianConfig, etc.), and for whatever reason the compiler gets upset about the calls to Chunk and the whole thing becomes wildly inefficient.

This PR attempts to address that issue by defining a StaticArray method for Chunk and setting the size to the array's length. It is not entirely clear to me that this is the desired behavior, but dualize already seems to do this. This also ignores the global chunk-size limit setting, but again, dualize appears to do this already.

Note that I have explicitly confirmed that the @inferred tests I have added fail without this PR but succeed with it.

(btw I am getting a failure in one of the tests for allocations during seed! when I run the test suite on my end, but it looks totally unrelated to this, so I'm not sure what's going on there. Maybe it'll pass in CI/CD, who knows.)
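
A minimal sketch of the kind of inference check described above (assuming StaticArrays and Test are available; the exact tests added in the PR may differ):

using ForwardDiff, StaticArrays, Test

f(x) = sum(abs2, x)
sx = @SVector rand(8)

# With a StaticArray method along the lines of Chunk(x::StaticArray) = Chunk{length(x)}(),
# the chunk size can be read off the argument's type, so checks like these can pass:
@inferred ForwardDiff.Chunk(sx)
@inferred ForwardDiff.GradientConfig(f, sx)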

@mcabbott (Member) commented

As you say, chunk size = length does seem to be the present behaviour, e.g. ForwardDiff.gradient(x -> (@show x[1]), @SArray rand(1,100)).

If you explicitly make a GradientConfig, it seems to obey DEFAULT_CHUNK_THRESHOLD = 12, but this is ignored once you pass it to gradient? That's odd:

julia> f1(x) = @show x[1];

julia> ForwardDiff.Chunk(@SMatrix rand(1,100))
ForwardDiff.Chunk{12}()

julia> ForwardDiff.GradientConfig(f1, @SMatrix rand(1,100)).seeds
(Partials(1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0), Partials(0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0), Partials(0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0), Partials(0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0), Partials(0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0), Partials(0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0), Partials(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0), Partials(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0), Partials(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0), Partials(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0), Partials(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0), Partials(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0))

julia> ForwardDiff.gradient(f1, @SArray(rand(1,100)), ForwardDiff.GradientConfig(f1, @SArray rand(1,100)))
x[1] = Dual{ForwardDiff.Tag{typeof(f1), Float64}}(0.6424264074844214,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0)
1×100 SMatrix{1, 100, Float64, 100} with indices SOneTo(1)×SOneTo(100):
 1.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  …  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

Using chunk size 100 might be sub-optimal. I would guess that whatever compiler limitations StaticArrays runs into for long arrays likely apply to long tuples of partials too, so this thing with 10^4 numbers might be awful? I haven't timed it, and perhaps picking a better heuristic isn't this PR's problem anyway.
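
A rough way to probe that question (assuming BenchmarkTools; not an apples-to-apples comparison, since the static and plain-array paths also differ in output type and allocations):

using ForwardDiff, StaticArrays, BenchmarkTools

f(x) = sum(abs2, x)
sx = @SVector rand(100)
vx = Vector(sx)

# Static-array path: a single chunk of length 100, i.e. Duals carrying 100 partials each.
@btime ForwardDiff.gradient($f, $sx);

# Plain-array path: chunk size capped at DEFAULT_CHUNK_THRESHOLD (12 by default).
@btime ForwardDiff.gradient($f, $vx);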

@ExpandingMan (Author) commented

I have attempted an alternate scheme based partly on @avik-pal's suggestion. It seems to work when I test it locally, though it is certainly rather precarious and scary-looking, as it assumes inference can follow a bunch of logic without breaking. I think what I have here now would be considered non-breaking, though.

For whatever it's worth, were I to do this from scratch, I'd probably split Chunk into two different objects. The first would be a mere struct Chunk; size::Int; end, which would be used for ordinary arrays, and the second would be like the current Chunk{N}, which could be used with static arrays or when a user explicitly asks for it. In the standard array case it does not seem justified to expect the type Chunk{N} to be inferable, so I'm not sure what business it has having a type parameter.
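
A hypothetical sketch of that split (the names ChunkSize, StaticChunk, and pick_chunk are illustrative only, not ForwardDiff API):

using StaticArrays

const CHUNK_THRESHOLD = 12  # stand-in for ForwardDiff's DEFAULT_CHUNK_THRESHOLD

# Runtime-sized chunk for ordinary arrays: the size is just a field, nothing to infer.
struct ChunkSize
    size::Int
end

# Compile-time-sized chunk for static arrays, or for users who ask for one explicitly.
struct StaticChunk{N} end

pick_chunk(x::AbstractArray) = ChunkSize(min(length(x), CHUNK_THRESHOLD))

# A static array's length is part of its type, so no constant propagation is needed:
pick_chunk(x::StaticArray) = _static_chunk(Length(x))
_static_chunk(::Length{N}) where {N} = StaticChunk{N}()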


codecov bot commented Aug 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.59%. Comparing base (c310fb5) to head (a4b3e4e).
Report is 7 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #707      +/-   ##
==========================================
+ Coverage   89.57%   89.59%   +0.01%     
==========================================
  Files          11       11              
  Lines         969      980      +11     
==========================================
+ Hits          868      878      +10     
- Misses        101      102       +1     


# Constrained to `N <= threshold`, minimize (in order of priority):
# 1. the number of chunks that need to be computed
# 2. the number of "left over" perturbations in the final chunk
function pickchunksize(input_length, threshold = DEFAULT_CHUNK_THRESHOLD)
function pickchunksize(::Val{input_length}, ::Val{threshold} = DEFAULT_CHUNK_THRESHOLD) where {input_length, threshold}
A Member commented on this change:

This is used in downstream packages, e.g., in SciML (https://github.com/search?q=org%3ASciML+pickchunksize&type=code). So even though technically it might not be breaking, it will affect many downstream packages. Similarly, Chunk is used by quite a few downstream packages, e.g., SciML or also LogDensityProblemsAD (and hence, e.g., Turing indirectly).

Since the next release will be breaking anyway, maybe that's not a major issue. I think it would be good to keep the API simple though, i.e., to not support both integers and Vals.
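
For illustration, the two parallel entry points this comment is worried about would look roughly like this (the Val form only exists with this PR's new method; exact argument handling depends on the final code):

# Int-based form, what downstream packages such as SciML call today:
ForwardDiff.pickchunksize(100)

# Val-based form along the lines of the PR's new signature:
ForwardDiff.pickchunksize(Val(100), Val(12))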

@KristofferC (Collaborator) commented Aug 15, 2024

For me, the following diff is enough:

-const CHUNKS = [Chunk{i}() for i in 1:DEFAULT_CHUNK_THRESHOLD]
- 
 function Chunk(input_length::Integer, threshold::Integer = DEFAULT_CHUNK_THRESHOLD)
     N = pickchunksize(input_length, threshold)
-    0 < N <= DEFAULT_CHUNK_THRESHOLD && return CHUNKS[N]
+    0 < N <= DEFAULT_CHUNK_THRESHOLD && return Chunk{N}()
     return Chunk{N}()
 end
julia> @inferred ForwardDiff.GradientConfig(f, sx);

The reason the const CHUNKS is there is historical: Julia used to be bad at optimizing a certain thing (JuliaLang/julia#29887), but I don't think it should be needed now.

This should be benchmarked again with this change though: #373

KristofferC pushed a commit that referenced this pull request Aug 15, 2024
This prevents constant propagation of the length of a static array. Alternative to #707
@KristofferC (Collaborator) commented

Made an alternative PR at #708

@gdalle (Member) commented Aug 15, 2024

FYI @ExpandingMan, the idea of DifferentiationInterface is to prepare the operator once and then reuse the preparation several times. For gradient and Jacobian, the preparation phase does involve a type-unstable call to Chunk, but the resulting extras object then allows for type-stable application of the operator. In this spirit, it's okay for preparation to be type unstable (and currently it has to be with ForwardDiff), but the code you put in hot loops should be the prepared version of the operator.
See the tutorial at https://gdalle.github.io/DifferentiationInterface.jl/DifferentiationInterface/stable/tutorial1/#Preparing-for-multiple-gradients for an example.
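
The same pattern expressed directly in ForwardDiff terms (a sketch; DifferentiationInterface wraps this behind the preparation functions described in the linked tutorial): build the config once outside the hot code, then reuse it behind a function barrier.

using ForwardDiff

f(x) = sum(abs2, x)
xs = [rand(100) for _ in 1:1000]

# Preparation: constructing the config involves the type-unstable Chunk call...
cfg = ForwardDiff.GradientConfig(f, first(xs))

# ...but behind this function barrier the repeated gradient calls are type stable:
function total_gradient(f, xs, cfg)
    g = zero(first(xs))
    for x in xs
        g .+= ForwardDiff.gradient(f, x, cfg)
    end
    return g
end

total_gradient(f, xs, cfg)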

@ExpandingMan (Author) commented

Thanks @KristofferC, I have confirmed that #708 works, so that seems obviously preferable to this; I will close this PR.

KristofferC added a commit that referenced this pull request Aug 19, 2024

This prevents constant propagation of the length of a static array. Alternative to #707

Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
devmotion added a commit that referenced this pull request Aug 19, 2024

This prevents constant propagation of the length of a static array. Alternative to #707

Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
devmotion added a commit that referenced this pull request Sep 30, 2024

This prevents constant propagation of the length of a static array. Alternative to #707

Co-authored-by: David Widmann <devmotion@users.noreply.github.com>