
RNN.call should get initial state from full input spec #10845

Merged · 2 commits · Aug 5, 2018

Conversation

@yanboliang (Contributor) commented Aug 4, 2018

Summary

This PR fixes a critical bug in RNN reported in #9449 and #10830.

In RNN.call, if initial_state is a tensor returned by a Keras layer, we should get initial_state from the full input spec (training data, states, and constants) that was generated in RNN.__call__, because that full input may be copied to multiple GPUs. Otherwise, the layer would use the original initial_state, which is not sliced according to the number of GPUs.
I have also checked CuDNNRNN; it already handles this correctly, so it doesn't need to be modified.
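
The core of the change, roughly (a sketch of the idea rather than the exact diff; self._num_constants is the counter RNN keeps for constants appended to the input list):

# In RNN.call: when inputs arrives as the full list built by RNN.__call__
# (training data + initial states + constants), slice the states back out
# of that list instead of using the initial_state argument, because only
# the list has been split per replica by multi_gpu_model.
if isinstance(inputs, list):
    if self._num_constants is None:
        initial_state = inputs[1:]
    else:
        initial_state = inputs[1:-self._num_constants]
        constants = inputs[-self._num_constants:]
    inputs = inputs[0]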

I ran the following test code on a machine with 2 GPUs; it works well after this fix.

import numpy as np
from keras import layers as L
from keras.models import Model
from keras.utils.multi_gpu_utils import multi_gpu_model

# An RNN whose initial state comes from a second Input tensor.
x = L.Input((4, 3))
init_state = L.Input((3,))
y = L.SimpleRNN(3, return_sequences=True)(x, initial_state=init_state)

# A batch of 2 samples: [training data, initial states] and targets.
_x = [np.random.randn(2, 4, 3), np.random.randn(2, 3)]
_y = np.random.randn(2, 4, 3)

m = Model([x, init_state], y)
m2 = multi_gpu_model(m, 2)  # replicate across 2 GPUs; batches are split in half
m2.compile(loss='mean_squared_error', optimizer='adam')
m2.train_on_batch(_x, _y)

Related Issues

#9449
#10830

PR Overview

  • This PR requires new unit tests [y/n] (make sure tests are included)
  • This PR requires to update the documentation [y/n] (make sure the docs are up-to-date)
  • This PR is backwards compatible [y/n]
  • This PR changes the current API [y/n] (all API changes need to be approved by fchollet)

@fchollet (Collaborator) left a comment

LGTM, thanks

@mikejhuang commented Nov 27, 2018

This bugfix doesn't work with ConvLSTMs with initialized states. I think it's because ConvLSTM initializes states differently than other LSTMs: instead of taking two inputs, (input, initial_state=init_state), ConvLSTM takes a single list containing the input and the initial states ([input, init_state_h, init_state_c]).

@JunHyungYu commented

Hi @mikejhuang, I have the same issue with ConvLSTM. How did you solve your problem? If you know, please tell me; I can't train my seq2seq ConvLSTM model with initial_state. Somebody help me~~

@mikejhuang commented

@JunHyungYu I used this bugfix:
pascalxia@6750e1e

To get it to work, you pass the layer a single list containing the input and the initial states ([input, init_state_h, init_state_c]); a sketch follows.
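
For illustration, a minimal sketch of that call pattern (the shapes are made up; with the patched code above, the resulting model can then be wrapped with multi_gpu_model as in the PR description):

from keras import layers as L
from keras.models import Model

# Illustrative shapes: 4 timesteps of 8x8 single-channel frames; with 3
# filters and padding='same', each ConvLSTM2D state has shape (8, 8, 3).
frames = L.Input((4, 8, 8, 1))
init_h = L.Input((8, 8, 3))
init_c = L.Input((8, 8, 3))

# Pass one list [input, init_state_h, init_state_c] instead of the
# initial_state keyword argument.
y = L.ConvLSTM2D(3, (3, 3), padding='same',
                 return_sequences=True)([frames, init_h, init_c])
m = Model([frames, init_h, init_c], y)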

@JunHyungYu commented Apr 30, 2019

@mikejhuang If I use convlstm()([input, init_state_h, init_state_c]), I can build the multi-GPU model, but when I train it I get the same error:
InvalidArgumentError: Incompatible shapes: [8,32,224,224] vs. [16,32,224,224]
[[Node: replica_0_6/model_13/conv_lst_m2d_22/while/add_7 = Add[T=DT_FLOAT, _class=["loc:@train.../Reshape_1"], _device="/job:localhost/replica:0/task:0/device:GPU:0"](replica_0_6/model_13/conv_lst_m2d_22/while/BiasAdd_3, replica_0_6/model_13/conv_lst_m2d_22/while/convolution_7)]]
[[Node: training_7/Adam/gradients/replica_0_6/model_13/conv_lst_m2d_22/while/add_7_grad/Shape_1/_2523 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2234_...ad/Shape_1", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Can you train your ConvLSTM with initial_state using a multi-GPU model?

@mikejhuang commented May 1, 2019 via email

@JunHyungYu commented

@mikejhuang Oh my god... thank you!

@dzhv commented Jun 4, 2019

@mikejhuang, @JunHyungYu have any of you managed to find a workaround for the issue? I'm struggling with it atm :(

@maym2104 commented

Same issue with ConvLSTM2D with the latest Keras (2.2.4) and TensorFlow 1.13.

@maym2104 commented Aug 17, 2019

It only happens when I use this combination:

  • ConvLSTM
  • with the dropout and/or recurrent_dropout argument set
  • in a multi-GPU model.

It works when I remove the dropout on this layer, but that results in less efficient learning :/

EDIT:
I looked at the code because it was not clear to me where the dropout was applied. It seems to be applied before any multiplication with the parameters, so a Dropout layer before the ConvLSTM (or LSTM, or RNN) would be equivalent to the dropout argument (but not to the recurrent one, since that is applied to the state). In fact I already had such a layer, so it was redundant. The only difference in the case of LSTM and ConvLSTM is that a different mask is applied for each gate; see the sketch below.
Replacing recurrent dropout is less trivial, though, and it makes a huge difference.
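
A minimal sketch of the input-dropout workaround described above (shapes and rate are illustrative; as noted, the built-in dropout argument applies a different mask per gate, so this is only approximately equivalent, and recurrent_dropout has no such substitute):

from keras import layers as L
from keras.models import Model

frames = L.Input((4, 8, 8, 1))  # illustrative: 4 timesteps of 8x8x1 frames

# Apply dropout to the inputs with an explicit layer instead of
# ConvLSTM2D's dropout argument, which triggers the multi-GPU error.
dropped = L.Dropout(0.2)(frames)
y = L.ConvLSTM2D(3, (3, 3), padding='same',
                 return_sequences=True)(dropped)
m = Model(frames, y)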
