Adding non-mutating recur for the new chain interface. #7

mkschleg · 2023-03-30T18:08:57Z

This adds the implementation needed for recurrent networks under the new chain API (#5). Sorry for the noise with #6; there were some merge conflicts that I thought I would resolve outside of the PR.

The tests indicate that this might fix some of the failing explicit-gradient tests. The gradient for state_0 still seems broken (it returns nothing).

The approach is adapted from Lux.jl.
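As a rough illustration of what "non-mutating" means here, the sketch below threads the hidden state through each call explicitly, in the Lux.jl style, instead of storing it in a mutable field. It is not the code from this PR: `ToyCell`, `toy_step`, and `unroll` are hypothetical names, and the cell is a plain tanh RNN chosen only to keep the example small.

```julia
# Hypothetical sketch, not the PR's implementation.
# The cell holds only parameters; it never stores the hidden state.
struct ToyCell
    Wx::Matrix{Float64}
    Wh::Matrix{Float64}
    b::Vector{Float64}
end

# One timestep: returns (output, new_state) rather than mutating anything.
function toy_step(cell::ToyCell, h, x)
    hnew = tanh.(cell.Wx * x .+ cell.Wh * h .+ cell.b)
    return hnew, hnew
end

# Unroll over a sequence by folding, so the state is passed along explicitly.
# Outputs are accumulated in a tuple to avoid in-place array mutation.
function unroll(cell::ToyCell, h0, xs)
    result = foldl(xs; init = ((), h0)) do acc, x
        outs, h = acc
        y, hnew = toy_step(cell, h, x)
        ((outs..., y), hnew)
    end
    return result  # (tuple_of_outputs, final_state)
end
```

Because the initial state is an ordinary argument of `unroll`, a gradient with respect to it can in principle be requested like any other input, which is the case the state_0 test above is probing.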

PR Checklist

Tests are added
Documentation, if applicable

mkschleg · 2023-03-30T18:55:24Z

The test error looks to be caused by a problem with Zygote on master.

ToucheSir · 2023-04-01T03:42:10Z

I might've missed this from the apply PR, but why does NM_Recur implement _apply instead of apply? My impression was that apply would be the user-facing API for new layers to overload.

mkschleg · 2023-04-02T17:17:17Z

There might have been some miscommunication about the previous apply API. I thought we wanted to separate the exported apply for chains from the internal _apply for layers. We should be able to unify them, though.

ToucheSir · 2023-04-02T17:22:57Z

That may have been my fault. My thought was to use apply wherever possible, so unification would be great.

mkschleg · 2023-04-03T16:52:17Z

I've (mostly) unified apply. There is an ambiguity between the single-timestep apply and the time-series apply over a vector/generator. I solved this by making the tuple methods in chain.jl call _apply and adding another method for Flux.Chain. I don't know how sustainable that is for Flux at large, but I think it might be OK because most layers shouldn't need a custom apply.

For the tuple-specific methods (i.e. those for chain.layers) we should use _apply, since I don't think a tuple of layers has a well-defined use case beyond Flux.Chain (which is why parallel, join, etc. exist).
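One plausible shape for that dispatch layout is sketched below. The thread does not spell out the exact method signatures, so everything here is an assumption: `Scale` and `ToyChain` are hypothetical stand-ins for a layer and for Flux.Chain, and the `apply`/`_apply` definitions are illustrative rather than the PR's.

```julia
# Hypothetical sketch of the dispatch layout; not the PR's definitions.
struct Scale
    a::Float64
end

struct ToyChain{T<:Tuple}
    layers::T
end

# Single timestep: a layer applied to one numeric array.
apply(l::Scale, x::AbstractArray{<:Number}) = l.a .* x

# Time series: a vector of timesteps maps the single-step method over each element.
apply(l, xs::AbstractVector{<:AbstractArray}) = [apply(l, x) for x in xs]

# Tuples of layers live on the internal name, so they never compete with
# the public single-step and time-series methods.
_apply(::Tuple{}, x) = x
_apply(layers::Tuple, x) = _apply(Base.tail(layers), apply(first(layers), x))

# The chain forwards to the tuple methods for a single timestep...
apply(c::ToyChain, x) = _apply(c.layers, x)

# ...and needs its own time-series method: without it, apply(chain, vector_of_arrays)
# would match both apply(c::ToyChain, x) and the generic time-series method above,
# with neither more specific than the other.
apply(c::ToyChain, xs::AbstractVector{<:AbstractArray}) = [apply(c, x) for x in xs]
```

With this layout, `apply(ToyChain((Scale(2.0), Scale(3.0))), [[1.0], [2.0]])` dispatches to the chain-specific time-series method and then to the single-step path for each element.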

Adding non-mutating recur for the new chain interface.

62c2ed2

Unify apply, fix 3d arrays

bd00e63

mkschleg added 2 commits April 5, 2023 09:52

Added reset to explicit gradient test, fixed broken test.

e077dc4

Remove broken test for <v1.7.

bc1bc41

mkschleg mentioned this pull request May 19, 2023

Make RNNs blocked (and maybe fixing gradients along the way) FluxML/Flux.jl#2258

Open

mkschleg mentioned this pull request Jun 26, 2023

NewRecur experimental interface #11

Merged

1 task

ToucheSir mentioned this pull request Aug 28, 2023

Add NewRecur for Blocked RNNs FluxML/Flux.jl#2316

Open

3 tasks

ToucheSir mentioned this pull request Feb 12, 2024

Roadmap FluxML/Flux.jl#1829

Open

14 tasks
