Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding non-mutating recur for the new chain interface. #7

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

mkschleg
Copy link
Contributor

This adds the necessary implementation details for recurrent networks for the new chain api (#5). Sorry for the noise with #6 , but there was an issue with some merge conflicts that I thought I would resolve outside of the PR.

The tests indicate that this might solve some of the tests for explicit gradients. The gradient for state_0 still seems broken (returns nothing).

How this is done was adapted from Lux.jl.

PR Checklist

  • Tests are added
  • Documentation, if applicable

@mkschleg
Copy link
Contributor Author

Test error looks to be something up with Zygote on master.

@ToucheSir
Copy link
Member

I might've missed this from the apply PR, but why does NM_Recur implement _apply instead of apply? My impression was that apply would be the user-facing API for new layers to overload.

@mkschleg
Copy link
Contributor Author

mkschleg commented Apr 2, 2023

There might have been some miscommunication on the previous apply api. I thought we wanted to separate the exported apply for chains and the internal _apply for layers. We should be able to unify though.

@ToucheSir
Copy link
Member

That may have been my fault. My thought was to use apply wherever possible, so unification would be great.

@mkschleg
Copy link
Contributor Author

mkschleg commented Apr 3, 2023

I've (mostly) unified apply. There is an issue with ambiguity between the single timestep apply and the time series apply over a vector/generator. I solved this by making the tuple calls _apply in chain.jl and adding another method for Flux.Chain. I don't know how sustainable that is for applying to Flux at large, but I think it might be ok because most layers shouldn't need a custom apply.

For the tuples (i.e. the chain.layers) specific applies we should use _apply. As I don't think beyond use in Flux.Chain a tuple of layers has a well defined use case (i.e. that is why parallel, join, etc exist).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants