Recurrent Attention API for keras #11172
From @farizrahman4u in #11142.
Great to see this moving 👍 I'm happy to continue the work as well. This was the result of my "requirements analysis" (based on common use cases and authoritative papers) as well as of the API doc:
These should be useful as a reference when discussing tradeoffs and priorities.
Are we talking about attention only in recurrent models?
Should we consider only the more common dot-product attention, or support everything else as well via an overridable method that computes the attention weights?
I think it is important not to be limited to RNN designs, i.e. see https://openreview.net/forum?id=HyGBdo0qFm
The latter is what's achieved by this method of the suggested base class for recurrent attention mechanisms, as explained here.
Generally, I agree. The different approach taken in e.g. Attention Is All You Need is of great importance and should be considered for any seq2seq problem. However (!) I'd claim that any feedforward architecture can already be supported by the existing Keras API. You might have to write several custom layers, but feedforward attention can be implemented by reusing, and without duplicating, major parts of the existing API (along the lines of the @farizrahman4u comment cited above). Recurrent attention is different: if you want to implement it in a new layer/model you need to reimplement the majority of the RNN logic, or wrap the cell as is suggested in the previous PR. That said, it might still make sense to add layers or models to the API for non-recurrent attention. But I think it is still a high priority to support recurrent attention, as was initially stated in the "request for contributions".
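[Editor's note: purely for reference, a minimal sketch of the kind of feedforward attention that can already be written as a custom layer with the existing API. The class name `SoftAttention` and all details are illustrative, not existing Keras code.]

```python
from keras import backend as K
from keras.layers import Layer


class SoftAttention(Layer):
    """Additive, feedforward pooling attention over a sequence.

    Input:  (batch, timesteps, features)
    Output: (batch, features) -- a learned weighted average over timesteps.
    """

    def __init__(self, units=64, **kwargs):
        super(SoftAttention, self).__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        feature_dim = int(input_shape[-1])
        self.kernel = self.add_weight(name='kernel',
                                      shape=(feature_dim, self.units),
                                      initializer='glorot_uniform')
        self.score = self.add_weight(name='score',
                                     shape=(self.units, 1),
                                     initializer='glorot_uniform')
        super(SoftAttention, self).build(input_shape)

    def call(self, inputs):
        # un-normalized score per timestep: (batch, timesteps)
        scores = K.squeeze(K.dot(K.tanh(K.dot(inputs, self.kernel)),
                                 self.score), axis=-1)
        # normalize over time and take the weighted average of the inputs
        weights = K.expand_dims(K.softmax(scores))      # (batch, timesteps, 1)
        return K.sum(weights * inputs, axis=1)

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[-1])
```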
In a sense, I think recurrent attention "is also supported" since we added support for constants in the RNN: you can quite easily write your own cell wrapper. To me, it's just a question of how standardized and simple you want to make this, and whether (and which) ready-made cell-wrapper attention mechanisms should be added to the API. The main current limitation with this approach is that there is no option to return "state sequences" from the RNN, which is required to feed the attention encoding from one layer to subsequent layers (see point 3 here).
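[Editor's note: to make the "write your own cell wrapper" route concrete, a rough sketch of such a wrapper, assuming Keras 2 with the `constants` support in `keras.layers.RNN` mentioned above. `AttentionCellWrapper` is hypothetical, not an existing Keras class, and dot-product scoring is just one possible choice.]

```python
from keras import backend as K
from keras.layers import Layer


class AttentionCellWrapper(Layer):
    """Illustrative only: wraps an RNN cell and, at every timestep, computes a
    dot-product attention context over the attended sequence passed to the RNN
    via `constants`, concatenating that context to the cell input."""

    def __init__(self, cell, **kwargs):
        super(AttentionCellWrapper, self).__init__(**kwargs)
        self.cell = cell

    @property
    def state_size(self):
        return self.cell.state_size

    def build(self, input_shape):
        # with constants, RNN.build passes [step_input_shape, constants_shape]
        step_input_shape, attended_shape = input_shape
        state_size = self.cell.state_size
        state_dim = state_size[0] if hasattr(state_size, '__len__') else state_size
        attended_dim = attended_shape[-1]
        # project the previous hidden state into the attended feature space
        self.kernel_query = self.add_weight(name='kernel_query',
                                            shape=(state_dim, attended_dim),
                                            initializer='glorot_uniform')
        # the wrapped cell sees [input, context] concatenated
        self.cell.build((step_input_shape[0],
                         step_input_shape[-1] + attended_dim))
        self.built = True

    @property
    def trainable_weights(self):
        # expose the wrapped cell's weights so they are trained as well
        return (super(AttentionCellWrapper, self).trainable_weights +
                self.cell.trainable_weights)

    def call(self, inputs, states, constants):
        attended = constants[0]                   # (batch, time, attended_dim)
        h_prev = states[0]                        # (batch, state_dim)
        query = K.dot(h_prev, self.kernel_query)  # (batch, attended_dim)
        scores = K.sum(attended * K.expand_dims(query, 1), axis=-1)
        weights = K.softmax(scores)               # (batch, time)
        context = K.sum(K.expand_dims(weights) * attended, axis=1)
        return self.cell.call(K.concatenate([inputs, context]), states)
```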
Ok, so usage workflow examples have been requested. Since the heading of this and all preceding issues/PRs has been recurrent attention, I'll focus on this and repeat/clarify the workflow of the only concrete suggestion so far. I'll use the architecture in this paper for handwriting synthesis as the use case (but the workflow would be the same for e.g. this kind of image captioning).
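[Editor's note: as an illustration of the workflow being described (not code from the original comment), a sketch of how such a cell could be used in a handwriting-synthesis-style model, reusing the hypothetical `AttentionCellWrapper` from the previous sketch. All shapes and sizes are made up.]

```python
from keras.layers import Input, Embedding, LSTMCell, RNN, Dense
from keras.models import Model

# sequence to attend over: e.g. the character sequence in handwriting synthesis
# (for image captioning it would be a sequence/grid of image features instead)
chars = Input(shape=(None,), dtype='int32')
attended = Embedding(input_dim=60, output_dim=64)(chars)

# sequence driving the recurrence: e.g. the previous pen offsets
strokes = Input(shape=(None, 3))

# the attentive cell is just another cell, plugged into the generic RNN layer,
# with the attended sequence passed via `constants`
cell = AttentionCellWrapper(LSTMCell(256))   # hypothetical wrapper, sketched above
h = RNN(cell, return_sequences=True)(strokes, constants=attended)

# the real model uses a mixture-density output; a plain Dense keeps this short
outputs = Dense(3)(h)
model = Model([strokes, chars], outputs)
model.compile(optimizer='adam', loss='mse')
```

The point of this workflow is that the attentive cell plugs into the existing `RNN` layer like any other cell, with the attended sequence passed via `constants`.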
I think that this workflow is perfectly aligned with Keras' guiding principles. Note that no modification of existing classes is required; we've just defined a new cell. I honestly can't come up with a reasonable alternative based on the Model subclassing API for this use case. I guess these are other options:
Alternatively, something like an approach where both the core cell and the attention mechanism are injected into a new class that connects them. But this requires new interfaces both for the cell and for the attention mechanism. Bottom line:
I have a feeling that some uncertainty comes from the (well-motivated!) buzz around non-recurrent attention mechanisms. If we were to add support for the transformer architecture in Attention Is All You Need (or the recent BERT), I definitely think that the Model subclassing API is a good place to start, because there are many intricate parts that should be combined in the right way. It would look something like:
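[Editor's note: the code originally posted in this comment did not survive the export. As a stand-in, a heavily simplified, single-head sketch of what a subclassed encoder block could look like; it assumes the inputs already have `d_model` features and omits multi-head splitting, masking, positional encoding and layer normalization.]

```python
from keras import backend as K
from keras import layers
from keras.models import Model


class TinyTransformerBlock(Model):
    """Single-head, heavily simplified Transformer encoder block:
    scaled dot-product self-attention plus a position-wise feedforward
    sublayer, each with a residual connection (no layer norm, no masking)."""

    def __init__(self, d_model=256, d_ff=1024, **kwargs):
        super(TinyTransformerBlock, self).__init__(**kwargs)
        self.q_proj = layers.Dense(d_model)
        self.k_proj = layers.Dense(d_model)
        self.v_proj = layers.Dense(d_model)
        self.ff_hidden = layers.Dense(d_ff, activation='relu')
        self.ff_out = layers.Dense(d_model)
        self.scale = 1.0 / (d_model ** 0.5)

    def call(self, inputs):
        # inputs are assumed to be (batch, timesteps, d_model) already
        q, k, v = self.q_proj(inputs), self.k_proj(inputs), self.v_proj(inputs)
        scores = K.batch_dot(q, k, axes=(2, 2)) * self.scale   # (batch, t, t)
        context = K.batch_dot(K.softmax(scores), v)            # (batch, t, d_model)
        x = inputs + context                                   # residual 1
        return x + self.ff_out(self.ff_hidden(x))              # residual 2
```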
But we should probably avoid considering this for now and make as few additions as possible. This is why it was decided not to add the
Thanks @farizrahman4u, makes sense. So you think something like the
I think this version of machine translation (Bengio 2016) would serve as a good end-to-end example. A Keras implementation of the paper would look like this:
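[Editor's note: the code originally posted here was also lost in the export. Purely as an illustration, a rough sketch of what such an end-to-end model could look like, reusing the hypothetical `AttentionCellWrapper` from the sketch above; the paper uses GRUs and a maxout output layer, while plain LSTM and softmax are used here for brevity, and all sizes are arbitrary.]

```python
from keras.layers import (Input, Embedding, Bidirectional, LSTM, LSTMCell,
                          RNN, TimeDistributed, Dense)
from keras.models import Model

VOCAB = 30000  # made-up vocabulary size

# encoder: bidirectional RNN over the source sentence, returning the full sequence
source = Input(shape=(None,), dtype='int32')
encoded = Bidirectional(LSTM(256, return_sequences=True))(
    Embedding(VOCAB, 256)(source))

# decoder: an attentive cell that attends over the encoded source at every step,
# driven by the (teacher-forced) previous target tokens
target_in = Input(shape=(None,), dtype='int32')
embedded_target = Embedding(VOCAB, 256)(target_in)
decoder_cell = AttentionCellWrapper(LSTMCell(512))   # hypothetical wrapper, sketched above
h = RNN(decoder_cell, return_sequences=True)(embedded_target, constants=encoded)

predictions = TimeDistributed(Dense(VOCAB, activation='softmax'))(h)
model = Model([source, target_in], predictions)
# targets would be (batch, time, 1) integer indices when using this loss
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
```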
Sounds good? @farizrahman4u @fchollet If so, I'll implement the attention mechanism and the end-to-end example.
Fair enough. I think you can write the whole thing in the example (including the RNNAttentionCell class), submit a PR, and discuss with @fchollet which parts can be moved into the Keras API and which should stay in the example.
As per @farizrahman4u's suggestion above, please see #11421. @fchollet
Attention for Dense Networks on Keras RFC:
Is there an update on this? The RFC has been approved for over a year.
This issue is opened to host a discussion about the recurrent attention API for keras.
Related issues:
#11142.
#8296.
#7633.