Recurrent Attention API: Support constants in RNN #7980
Conversation
…them on to the cell
Any idea why it seems to fail with CNTK (according to Travis)?
Can you explain why this would be necessary? I believe RNN cells that need specific constants should be managing their own constants, e.g. what we do for dropout (the parent RNN is agnostic to dropout constants). I am pretty sure Attention wrappers would be implementable with the current API.
The key here is that the constants (the "attended" when used for attention) really are inputs to the recurrent layer (I refer to them as "external" constants in the docs, but maybe we need a better name). I think it would be bad and confusing to break the principle that all inputs to a layer should be passed to that layer in `__call__`. The constant(s) needed for attention are fundamentally different from dropout, as the latter is only a regularisation method and we don't pass the dropout values in inference mode (they are only generated in training), but the attended, of course, has to be passed as input. When it comes to implementation/API, it is certainly possible to implement attention in Keras using only a custom wrapper - I've done it several times and that was the solution in my original API suggestion, which I think is worse ;). But let's look at some options... This is the "look and feel" to the user of my current suggestion:
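Something along these lines; a minimal sketch of the call site, where `AttentionCell` is a hypothetical placeholder (assumed to implement the `call(inputs, states, constants)` contract sketched in full at the bottom of this page) and `constants` is the keyword name this PR eventually settles on:

from keras.layers import Input, RNN

attended = Input(shape=(64,))          # the "attended", e.g. CNN image features
x = Input(shape=(None, 32))            # the regular input sequence
cell = AttentionCell(128)              # hypothetical attention cell
y = RNN(cell)(x, constants=attended)   # the layer passes `attended` on to the cell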
I guess what your suggestion means is that you would only let the cell know about the attended; something like one of the variants sketched below:
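(Both variants below are hypothetical sketches; `AttentionCell` is a placeholder assumed to store the attended tensor itself and use it in its step computation.)

# Variant 1: the attended is handed to the cell at construction time;
# the RNN layer never sees it.
cell = AttentionCell(units=128, attended=attended)
y = RNN(cell)(x)

# Variant 2: the cell exposes a setter that must be called before the RNN is applied.
cell = AttentionCell(units=128)
cell.set_attended(attended)
y = RNN(cell)(x)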
You can probably make this work but I'd like to avoid it for a number of reasons:
I think a good comparison is why we can (optionally) pass `initial_state` in `__call__`. Another option would be to treat the attended as part of the `inputs`. Yet another (better) option is to add a new class for this. It is also, of course, possible that I'm missing some other option. Please question the above!
The fact that we in the current RNN implementation "temporarily" modify the `input_spec` when `initial_state` is passed in `__call__` is related to this.
Thanks for the detailed explanation! It's a reasonable choice. An alternative would be the following:
y = AttentionRNN(cell)(x, attended_sequence=z)

What do you think? In terms of PR process, I suggest doing a single PR for the whole thing. That will make the purpose of the various pieces more apparent.
Sounds great! Basically, the additions to the `RNN` base class would support this. A few details: ...

So in summary: ...

Totally agree about the PRs. Let's close the second one to keep discussion in one place!
Cool, let's go with this then!
Great, I'll update this PR and the API doc this weekend.
…ention_api_unified
… outputs in wrapped model
keras/layers/recurrent.py
Outdated
@@ -194,6 +196,130 @@ def get_losses_for(self, inputs=None):
        return losses


class FunctionalRNNCell(Wrapper):
Is `RNNCellModel` a better name?
No, that's not a good idea. It's not a `Model` in the sense that it should not have `compile`, `fit`, `predict`, etc. It should just "package" the cell based on the functional definition.
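For context, a rough sketch of how such a functionally-defined cell might be used; the constructor arguments are assumptions for illustration, not the exact signature in this diff:

from keras.layers import Input, Dense, concatenate, RNN

# One step of the recurrence, defined with the functional API:
x_t = Input(shape=(8,))      # input at a single timestep
h_tm1 = Input(shape=(4,))    # previous state
h_t = Dense(4, activation='tanh')(concatenate([x_t, h_tm1]))

# The wrapper "packages" this graph as an RNN cell:
cell = FunctionalRNNCell(inputs=x_t, outputs=h_t,
                         input_states=h_tm1, output_states=h_t)
y = RNN(cell)(Input(shape=(None, 8)))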
We should probably override `__call__` as well.
This is a big PR. If any users of Attention or people familiar with Keras RNNs could help with reviewing, that would be greatly appreciated.
Yes, it would be great to get input from more contributors/users. Here is a guide to the additions: two new classes (one of them the `FunctionalRNNCell` shown above) plus changes in the `RNN` base class. There are also many more tests required, but I want some feedback/review before continuing.
Could we move attention support to the base class `RNN`? Also: what would be some alternative names for the "attended"?
This is exactly what I implemented in a backward-compatible way (and tested) in the original version of this PR (I then used the keyword argument `constants`).
As discussed above, "sequence" should not be in the name. Again, the attended can be anything (including multiple tensors). A standard application is image captioning, where the attended is an image (or typically CNN feature maps of an image). I do still think it would be better to support passing what I originally referred to in the docs as external constants in the base `RNN` class.

Regarding naming: the Theano "non_sequences" is not a very user-friendly name, as these "external constants" could be sequences that are just not iterated over by the RNN cell.

...I apologise for these "essay" comments. This is an intricate matter. I've really tried to do a thorough requirements analysis with respect to both how attention is used in research and the current state of Keras (which e.g. led to my conclusion that the RNN cell should really be broken out before this addition). Perhaps it was too early to move to PRs, but on the other hand it is the only way to really show the implications and feasibility of the suggestion. It is quite slow to work with the sparse, and occasionally inconsistent ;), feedback. I'd be happy to set up a Hangouts session to discuss details and options directly!
Summary of the very long comment above: ...
I've been looking at these PRs for some time and I'm still not sure. It would definitely be helpful to have more people chime in with their perspective. I think you're probably right: it would be good to add `constants` support to the base `RNN` layer. If we do this, the API changes would be: ...
Besides just the semantic problem of having multiple similar but independent mechanisms with the same name, one thing to consider is performance. Different backends might want to optimize the way they manage constants. Theano does it to some extent, for instance.
Let's see... how bad would it be to have something like:

class RNN(...):

    def __call__(self, inputs, initial_state=None, constants=None, **kwargs):
        # handle connectivity of `constants`, like we do for `initial_state`
        ...

    def call(self, inputs, mask=None, training=None, initial_state=None, constants=None):
        ...
        if constants:
            if not has_arg(self.cell.call, 'constants'):
                raise ValueError(...)

        def step(inputs, states):
            constants = inputs[1:]
            if has_arg(self.cell.call, 'training'):
                return self.cell.call(inputs[0], constants=constants, training=training)
            else:
                return self.cell.call(inputs[0], constants=constants)

        last_output, outputs, states = K.rnn(step,
                                             inputs,
                                             initial_state,
                                             go_backwards=self.go_backwards,
                                             mask=mask,
                                             constants=constants,
                                             unroll=self.unroll,
                                             input_length=timesteps)
Good, at this point it seems obvious that we should add support for `constants` in `RNN`.

I'm not sure exactly what you mean by this, but yes, it's important that...

I don't think there is anything strange about having a layer...

Yes, this is important and quite complex. I wrote about it in my previous comments and put a note on it in the PR (https://github.com/fchollet/keras/pull/7980/files#diff-3118e4e28157032506f771f279a551c3R2340). I agree that it's better to pass constants to the backend `K.rnn`.

This is not too bad - it's one way of making sure that the constants are passed via the backend `K.rnn`.
I think this mistake is symptomatic of one of the issues with the Keras recurrent code base - that arguments are not always what they "claim to be" and will be different things at different times: "states can be either states - or states and constants" and "inputs can be either inputs - or inputs and initial_state" (in `__call__`). I'd recommend working towards removing this pattern. We should try to make each argument mean one thing only. The other example, that...
Please trust me on this one; it will destroy readability of attention mechanisms and cause a lot of implementation overhead when trying to make them modular if we start mixing states and constants.
Yes, passing all input tensors as a flat list is part of the internal API. On the user-facing side of things, we can always avoid doing that (and instead pass the additional arguments as keyword arguments), but it is unavoidable at model deserialization time. I agree that it's difficult to figure out a reasonable way to parse the list of inputs. Since the problem only occurs at deserialization time, it should be possible to store in the layer config whether or not initial states and constants are "keras tensors" (i.e. whether they are part of the model topology), which would then provide all the information needed for deserialization. Keep in mind that this is an implementation detail. The public API is still to use keyword arguments.
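For illustration, the bookkeeping could look roughly like this; a plain-Python sketch whose helper name and exact layout are assumptions, though it mirrors the `_normalize_args`/`_num_constants` code discussed below:

def standardize_args(inputs, initial_state, constants, num_constants):
    """Split a flat `inputs` list back into (inputs, initial_state, constants).

    At deserialization time all call arguments may arrive concatenated as
    [inputs] + initial_state + constants; `num_constants` (stored in the
    layer config) tells us where the constants tail begins.
    """
    if isinstance(inputs, list):
        assert initial_state is None and constants is None
        if num_constants is not None:
            constants = inputs[-num_constants:]
            inputs = inputs[:-num_constants]
        if len(inputs) > 1:
            initial_state = inputs[1:]
        inputs = inputs[0]
    return inputs, initial_state, constants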
It is purely an implementation detail and will not modify the signatures of the `call` methods.
Ok great, (1.) is not a problem then. As you suggest, ... I'm not sure I understood the answer to (2.): is my understanding correct that we should not change the expected signature of the (backend `K.rnn`) step function?
I totally appreciate Keras' extreme focus on the public API. But for the library (and API) to stay relevant, it has to continue to develop. The structure and readability of its internals are crucial for easy extension and for people to continue to contribute. Many improvements could be made that are in no conflict with the public API. ...The quote above sounds a bit scary ;)
Yes, indeed, we can simply wrap the calls to the step function, as in the code snippet. Your other comment: my quote on readability means to say that since method signatures don't change, the implementation of...
…ention_api_constants_support
Ok, so I've updated this PR back to "support for constants in `RNN`". For an overview, there is a functioning implementation (based on the additions in this PR) of the next steps:
here: andhus#3. I didn't want to pull it into this PR until we've settled on the changes in `RNN`. There is also an explanation of the example in the linked PR (and a benchmark of MoGAttention vs. regular seq-to-seq) here: https://github.com/andhus/extkeras/tree/master/examples/attention
    for dim in self.cell.state_size]
else:
    self.state_spec = InputSpec(shape=(None, self.cell.state_size))
self.state_spec = None
For more advanced cells (typically attention cells such as this one: https://github.com/andhus/keras/pull/3/files#diff-9c4188ddc4dd173d80f64feed5b89412R258) the `cell.state_size` is not defined until the cell is built. But that's not a problem: `state_spec` is just set a bit later in `build` (or in `__call__` if `initial_state` is passed to `RNN`).
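For instance, a toy cell along these lines (a hedged sketch, not the linked implementation) only knows its `state_size` after `build` has run:

import keras.backend as K
from keras.layers import Layer

class LateStateSizeCell(Layer):
    """Toy cell whose state size depends on the input dimension."""

    def __init__(self, **kwargs):
        super(LateStateSizeCell, self).__init__(**kwargs)
        self.state_size = None  # unknown until the input shape is seen

    def build(self, input_shape):
        input_dim = input_shape[-1]
        self.state_size = input_dim  # now the RNN layer can set its state_spec
        self.kernel = self.add_weight(shape=(input_dim, input_dim),
                                      initializer='uniform', name='kernel')
        self.built = True

    def call(self, inputs, states):
        h = K.dot(inputs, self.kernel) + states[0]
        return h, [h]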
keras/layers/recurrent.py
Outdated
initial_state = inputs[1:]
inputs = inputs[0]
def __call__(self, inputs, initial_state=None, constants=None, **kwargs):
    inputs, initial_state, constants = self._normalize_args(
Prefer "standardize" (also used elsewhere) since "normalize" has a specific meaning elsewhere.
Looks good to me! Comments are mostly nitpicking points.
keras/layers/recurrent.py
Outdated
check_list += constants
self.constants_spec = [InputSpec(shape=K.int_shape(constant))
                       for constant in constants]
self._n_constants = len(constants)
In line with naming conventions in this API, this should be `_num_constants`.
keras/layers/recurrent.py
Outdated
is_keras_tensor = hasattr(initial_state[0], '_keras_history')
for tensor in initial_state:
check_list = []
Unclear what is meant by `check_list`. Maybe `input_list` would be better?
keras/layers/recurrent.py
Outdated
raise ValueError('The initial state of an RNN layer cannot be'
                 ' specified with a mix of Keras tensors and'
                 ' non-Keras tensors')
raise ValueError('The initial state and constants of an RNN'
and -> or
Sure, I think I wanted to say that they "jointly" must contain only one kind of tensor (that's what is checked for now)... I don't have any opinion on the error message, but just to verify/understand: shouldn't all inputs be Keras tensors, or not - and why not then include `inputs` in the check as well (not only `constants` and `initial_state`)?
keras/layers/recurrent.py
Outdated
@@ -560,6 +614,48 @@ def call(self, inputs, mask=None, training=None, initial_state=None):
    else:
        return output


def _normalize_args(self, inputs, initial_state, constants):
    """When running a model loaded from file, the input tensors
Nit: first line of docstring (one-line summary) should fit in one line and end with a period.
keras/layers/recurrent.py
Outdated
@@ -560,6 +614,48 @@ def call(self, inputs, mask=None, training=None, initial_state=None):
    else:
        return output


def _normalize_args(self, inputs, initial_state, constants):
    """When running a model loaded from file, the input tensors
    `initial_state` and `constants` can be passed to RNN.__call__ as part
Put ` around code keywords.
keras/layers/recurrent.py
Outdated
constants.

This method makes sure initial_states and constants are separated from
inputs and that the are lists of tensors (or None).
the -> they
keras/layers/recurrent.py
Outdated
initial_state = inputs[1:]
inputs = inputs[0]

def to_list_or_none(x):  # TODO break out?
Unless you want to use it elsewhere, it is fine as it is; you can remove this comment.
keras/layers/recurrent.py
Outdated
@@ -618,6 +714,9 @@ def get_config(self):
    'go_backwards': self.go_backwards,
    'stateful': self.stateful,
    'unroll': self.unroll}
if self._n_constants is not None:
    config['_n_constants'] = self._n_constants
Please call it `num_constants` in the config.
keras/layers/recurrent.py
Outdated
@@ -292,6 +293,14 @@ class RNN(Layer):
    `states` should be a numpy array or list of numpy arrays representing
    the initial state of the RNN layer.

# Note on passing external constants to RNNs
You can pass "external" constants to the cell using the `constants`
keyword argument of RNN.__call__ (as well as RNN.call) method. This
Put ` around code keywords.
@@ -415,19 +422,46 @@ def compute_mask(self, inputs, mask):
    return output_mask

def build(self, input_shape):
    # Note input_shape will be list of shapes of initial states and
Note: `input_shape` will be the list of the input shape followed by the shapes of the initial states and constants, if the latter are passed in `__call__`.
I fixed the comments and added a test case for when both `initial_state` and `constants` are passed. What do you think: should we aim to merge this first separately (a pretty well-defined feature addition), or continue to pull in the next steps (actual attention mechanisms) before merging anything?
We'll merge it first.
LGTM, thanks!
@fchollet I saw that the build is failing on master: https://travis-ci.org/fchollet/keras/jobs/292414692
@andhus it seems that after this was merged into master, CuDNNLSTM broke. After pulling I get an error.
I have the latest nightly TensorFlow.
Unfortunately the cuDNN RNN tests cannot be run on Travis since they require a GPU. We need to remember to run them manually every time we make changes to `recurrent.py`. I will look at the issue.
Yeah, thanks for taking care of this, I did not think about that addition!
Continuation of Recurrent Attention API here: #8296
This PR is the first step to support recurrent attention mechanisms as suggested in #7633.
In short, it adds the possibility to pass `constants` (besides `inputs` and `initial_state`) when calling the RNN layer. The RNN will pass the tensor(s) on to the call method of the RNN cell, which (if constants are passed) is expected to implement the signature `call(inputs, states, constants)`.
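To make the contract concrete, here is a minimal sketch of a cell implementing that signature and of how it is called; the cell, its weights and all dimensions are illustrative only, not part of the PR:

import keras.backend as K
from keras.layers import Layer, Input, RNN
from keras.models import Model

class MinimalConstantsCell(Layer):
    """Toy cell whose state update mixes in an external constant tensor."""

    def __init__(self, units, **kwargs):
        super(MinimalConstantsCell, self).__init__(**kwargs)
        self.units = units
        self.state_size = units

    def build(self, input_shape):
        # When constants are passed, `input_shape` is a list:
        # [step_input_shape, constant_shape, ...]
        input_dim = input_shape[0][-1]
        constant_dim = input_shape[1][-1]
        self.kernel = self.add_weight(shape=(input_dim, self.units),
                                      initializer='uniform', name='kernel')
        self.constant_kernel = self.add_weight(shape=(constant_dim, self.units),
                                               initializer='uniform',
                                               name='constant_kernel')
        self.built = True

    def call(self, inputs, states, constants):
        # `constants` is a list of tensors that stay the same at every timestep.
        prev_h = states[0]
        attended = constants[0]
        h = (K.dot(inputs, self.kernel)
             + K.dot(attended, self.constant_kernel)
             + prev_h)
        return h, [h]

x = Input(shape=(None, 8))   # the input sequence
c = Input(shape=(5,))        # the "external" constant, e.g. the attended
y = RNN(MinimalConstantsCell(4))(x, constants=c)
model = Model([x, c], y)     # the constant is part of the model topology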