
Learning of hidden state t=0 #8

Open · wants to merge 354 commits into master

Conversation

peroyose

Dear all,

to my knowledge it is currently not possible to train the initial state of a recurrent layer (rec_layer.py).
For some applications this is helpful and quite common, hence this pull request.

Thanks in advance for your answer,
Christian

P.S.:
I tried to implement this very roughly in class RecurrentLayer, but it turned out to be harder than expected. I introduced a new parameter (added to self.params), but the problem is training with mini-batches: broadcasting does not seem to work, though maybe I made other mistakes. The function doesn't compile.

In the initialization:

    if self.train_init_state:
        # trainable initial state at t=0
        self.init_state = theano.shared(
            numpy.zeros(self.n_hids, dtype=theano.config.floatX),
            name="init_state_%s" % self.name)
        self.params.append(self.init_state)

In the fprop method:

    if not init_state:
        if hasattr(self, 'init_state'):
            if not isinstance(batch_size, int) or batch_size != 1:
                # not possible with shared or symbolic variables in Theano:
                #init_state = TT.tile(self.init_state, (batch_size, 1))
                # doesn't work and will not store the new state:
                #init_state = TT.alloc(self.init_state, batch_size, self.n_hids)
                # broadcasting doesn't work??
                init_state = self.init_state
            else:
                # TODO: test
                init_state = self.init_state
        else:
            assert self.train_init_state == False
    if not init_state:
        if not isinstance(batch_size, int) or batch_size != 1:
            init_state = TT.alloc(floatX(0), batch_size, self.n_hids)
        else:
            init_state = TT.alloc(floatX(0), self.n_hids)

I got the following error:

ValueError: When compiling the inner function of scan the following error has been encountered: The initial state (outputs_info in scan nomenclature) of variable IncSubtensor{Set;:int64:}.0 (argument number 4) has dtype float32 and 2 dimension(s), while the result of the inner function for this output has dtype float32 and 2 dimension(s). This could happen if the inner graph of scan results in an upcast or downcast. Please make sure that you use dtypes consistently

I don't think dtype is the problem, because the TT.alloc zero vector used as the default init state has the same dtype, so I'm lost. What's your strategy for debugging this kind of problem?
Thanks
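
For reference, here is a minimal self-contained sketch of one way the broadcast can be written so that it compiles with a symbolic batch size and still trains the shared vector. This is an illustration under the assumption that TT.alloc is acceptable here (Alloc accepts symbolic shapes, unlike TT.tile, and backpropagates to its value input by summing over the broadcast axis); the toy names (W, step, n_hids = 3) are made up for the example and are not from the patch:

    import numpy
    import theano
    import theano.tensor as TT

    floatX = theano.config.floatX
    n_hids = 3

    # toy input: (n_steps, batch_size, n_hids)
    x = TT.tensor3('x')
    W = theano.shared(numpy.eye(n_hids, dtype=floatX), name='W')
    # trainable initial state at t=0, as in the snippet above
    init_state = theano.shared(numpy.zeros(n_hids, dtype=floatX),
                               name='init_state')

    # broadcast the (n_hids,) vector to (batch_size, n_hids);
    # the batch size may be symbolic here, unlike with TT.tile
    h0 = TT.alloc(init_state, x.shape[1], n_hids)

    def step(x_t, h_tm1):
        return TT.tanh(x_t + TT.dot(h_tm1, W))

    h, _ = theano.scan(step, sequences=x, outputs_info=[h0])

    # the gradient reaches the shared vector, so appending it to
    # self.params and doing SGD updates it like any other weight
    cost = h[-1].sum()
    g_init = TT.grad(cost, init_state)
    f = theano.function([x], [cost, g_init])

With this, init_state can simply be appended to self.params and updated like any other weight; no in-place "storing" of the new state is needed.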
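
On the debugging question, a standard Theano tactic (general, not specific to this patch) is to turn on test values: every op is then evaluated on small concrete inputs while the graph is built, so ndim/dtype mismatches raise at the offending line instead of deep inside scan compilation, and theano.printing.debugprint shows each variable's type and broadcastable pattern. A minimal sketch:

    import numpy
    import theano
    import theano.tensor as TT

    # evaluate every op on concrete test values as the graph is
    # built; shape/dtype errors raise immediately with a usable
    # traceback instead of surfacing inside the scan compiler
    theano.config.compute_test_value = 'raise'

    x = TT.tensor3('x')
    x.tag.test_value = numpy.zeros((5, 2, 3),
                                   dtype=theano.config.floatX)

    h0 = TT.alloc(numpy.asarray(0, dtype=theano.config.floatX),
                  x.shape[1], x.shape[2])
    print(h0.tag.test_value.shape)             # -> (2, 3)
    theano.printing.debugprint(h0, print_type=True)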

kyunghyuncho and others added 30 commits September 12, 2014

- Just a typo at line 831 additioanl->additional.
- Typo at cost_layers.py at line 831
- Import scan from theano instead of theano.sandbox.