
Include dedicated derivative functions for FiniteDifferences/ForwardDiff instead of relying on jacobians? #87

Closed
@arthur-bizzi


Hey all.

As it stands, calling AD.derivative with the FiniteDifferences and ForwardDiff backends first computes the Jacobian and then flattens it into the derivative. In some cases, say a function of a single scalar input, this is significantly slower than calling the backend directly (roughly 1.16 μs vs. 89 ns median in the benchmark below):

using FiniteDifferences, BenchmarkTools
import AbstractDifferentiation as AD

fdm = central_fdm(2,1,adapt=0)

fd = AD.FiniteDifferencesBackend(fdm)

with_AD(x) = AD.derivative(fd,sin,x)
without_AD(x) = fdm(sin,x)
blame_the_jacobian(x) = jacobian(fdm,sin,x)

# The three results below appear in this order:
# with_AD, without_AD, blame_the_jacobian.
@benchmark with_AD(1.)
@benchmark without_AD(1.)
@benchmark blame_the_jacobian(1.)

BenchmarkTools.Trial: 10000 samples with 10 evaluations.
 Range (min … max):  1.070 μs … 552.240 μs  ┊ GC (min … max): 0.00% … 99.30%
 Time  (median):     1.160 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.327 μs ±   5.524 μs  ┊ GC (mean ± σ):  4.13% ±  0.99%

    ▅█
  ▂███▆▄▃▃▃▂▂▂▂▃▂▂▂▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  1.07 μs         Histogram: frequency by time        2.42 μs <

 Memory estimate: 944 bytes, allocs estimate: 17.

BenchmarkTools.Trial: 10000 samples with 961 evaluations.
 Range (min … max):  85.640 ns …  2.145 μs  ┊ GC (min … max): 0.00% … 93.60%
 Time  (median):     88.658 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   97.445 ns ± 46.875 ns  ┊ GC (mean ± σ):  0.98% ±  2.08%

  ▂█▇    ▁▄▅▄▃    ▄     ▁▁▁▁                                  ▁
  ███▇██▆██████▆▆███▄▅▆▇█████▇▆▅▃▄▅▅▄▅▆▆▅▆▇▇█▇▃▄▄▂▅▄▅▄▅▃▄▄▅▃▄ █
  85.6 ns      Histogram: log(frequency) by time       173 ns <

 Memory estimate: 32 bytes, allocs estimate: 2.

BenchmarkTools.Trial: 10000 samples with 111 evaluations.
 Range (min … max):  774.775 ns … 47.623 μs  ┊ GC (min … max): 0.00% … 97.59%
 Time  (median):     819.820 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   950.669 ns ±  1.825 μs  ┊ GC (mean ± σ):  7.82% ±  4.01%

  ▄▇█▆▄▂▂▁▁▁▁  ▃▄    ▁▁                                        ▁
  ██████████████████████▇▆▆▆▆▆▆▆▆▅▆▅▅▆▆▅▆▄▅▅▅▅▄▄▄▄▆▅▄▆▄▄▄▅▅▄▄▅ █
  775 ns        Histogram: log(frequency) by time      1.83 μs <

 Memory estimate: 864 bytes, allocs estimate: 14.
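To make the two code paths concrete, here is a toy sketch in plain Julia with no packages. It is not the FiniteDifferences.jl or AbstractDifferentiation.jl implementation; `central_diff`, `toy_jacobian`, `derivative_via_jacobian` and `derivative_direct` are hypothetical names, and the fixed step `h` is an assumption made for illustration. It only shows why the Jacobian route does extra work (vector wrapping, a 1×1 matrix, indexing) for a scalar input.

```julia
# Plain central difference for a scalar function of a scalar input.
# Illustrative only: fixed step h, no adaptive step selection.
central_diff(f, x::Real; h=1e-6) = (f(x + h) - f(x - h)) / (2h)

# Toy "Jacobian" path: wrap the scalar in a vector, build the matrix
# column by column, and return it. Every step here allocates.
function toy_jacobian(f, x::Real; h=1e-6)
    xv = [float(x)]                 # scalar input becomes a 1-element vector
    g(v) = [f(v[1])]                # scalar output becomes a 1-element vector
    cols = map(eachindex(xv)) do n
        e = zeros(length(xv))
        e[n] = h                    # perturb only the n-th coordinate
        (g(xv .+ e) .- g(xv .- e)) ./ (2h)
    end
    return reduce(hcat, cols)       # 1×1 matrix for a scalar-to-scalar f
end

# Derivative obtained by flattening the Jacobian.
derivative_via_jacobian(f, x::Real; h=1e-6) = (toy_jacobian(f, x; h=h)[1],)

# Derivative computed directly, skipping the Jacobian entirely.
derivative_direct(f, x::Real; h=1e-6) = (central_diff(f, x; h=h),)
```

Both functions return a 1-tuple, mirroring the return shape of the proposed AD.derivative method, and they should agree on the value; only the amount of intermediate work differs.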

The same slowdown shows up in less contrived cases, like small neural networks with a single input. Is there a reason not to implement the derivative directly? Something along the lines of:

function AD.derivative(ba::AD.FiniteDifferencesBackend, f, xs...)
    return (ba.method(f, xs...),)
end
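As a rough sanity check of that idea, the sketch below compares the direct call against the Jacobian route using FiniteDifferences.jl alone. The helpers `direct_derivative` and `via_jacobian` are hypothetical names, not part of either package, and I am restricting them to a single `Real` input on the assumption that the finite difference method is only called as `method(f, x)` with a scalar `x`.

```julia
using FiniteDifferences

# Same method as in the benchmark above.
fdm = central_fdm(2, 1; adapt=0)

# Hypothetical helper: call the finite difference method directly
# and wrap the result in a 1-tuple.
direct_derivative(method, f, x::Real) = (method(f, x),)

# Hypothetical helper: go through FiniteDifferences.jacobian, which
# returns a tuple holding a matrix, then pull out the single entry.
via_jacobian(method, f, x::Real) = (only(first(jacobian(method, f, x))),)

d_direct = direct_derivative(fdm, sin, 1.0)
d_jac    = via_jacobian(fdm, sin, 1.0)

# Both routes approximate cos(1.0); they may differ slightly because
# the step size can be estimated differently on each path.
println(d_direct[1], "  ", d_jac[1], "  ", cos(1.0))
```

If the two values agree to within finite difference accuracy, the direct version should be a drop-in replacement for scalar inputs. The multi-argument case (`xs...`) would still need separate handling.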
