Chainrule for CUDA reduction #666

Open · renatobellotti opened this issue Aug 20, 2022 · 4 comments

@renatobellotti

Hi,

I'd like to suggest adding a rule for GPU reductions.

using CUDA, Zygote

function my_loss(v)
    # This works:
    # l = sum(v)
    # This does not work:
    l = reduce(+, v)
    return l
end

v = cu([1., 2.])
Zygote.gradient(my_loss, v)

See also: FluxML/Zygote.jl#730 (comment)

@mcabbott added the "enhancement" and "good first issue" labels on Aug 20, 2022
@mcabbott (Member)

rrule(reduce, +, x; kw...) can just call rrule(sum, x; kw...), right?
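
A minimal sketch of that delegation (not a committed implementation; it assumes ChainRules.jl's existing sum rule is loaded, and ignores reduce-only keywords such as init):

using ChainRules, ChainRulesCore

function ChainRulesCore.rrule(::typeof(reduce), ::typeof(+), x::AbstractArray; kw...)
    # forward: reuse the existing sum rule, so CuArrays keep their GPU sum
    y, sum_pullback = ChainRulesCore.rrule(sum, x; kw...)
    function reduce_pullback(ȳ)
        _, x̄ = sum_pullback(ȳ)
        # reduce has one extra argument (the operator +), so return one extra NoTangent
        return (NoTangent(), NoTangent(), x̄)
    end
    return y, reduce_pullback
end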

@mcabbott added the "missing rule" label and removed the "enhancement" label on Aug 20, 2022
@renatobellotti (Author) commented Aug 20, 2022

Isn't the reduction implemented on the GPU? I don't know the details, but reducing on the GPU and then copying the result is certainly more efficient than copying the entire vector and reducing on the CPU.

@mcabbott (Member)

Sure. For the forward pass, the rrule for sum just calls sum again on what it's given, and thus uses the same GPU code as without AD. (And the reverse pass is written using broadcasting, which also works on the GPU.)
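
For the curious, a rough sketch of that shape (simplified, not the actual ChainRules source):

using ChainRulesCore

function sum_rrule_sketch(x::AbstractArray)
    y = sum(x)  # forward pass: the same sum call, so a CuArray takes the GPU path
    # reverse pass: the gradient of sum is one for every element; building it
    # by broadcasting means it, too, dispatches to GPU kernels for CuArrays
    sum_pullback(ȳ) = (NoTangent(), ȳ .* one.(x))
    return y, sum_pullback
end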

@renatobellotti (Author)

Nice!
