Question about GRU-D implementation

Hi there, I have a question about calculating `dp_mask` for `x_t` and `m_dp_mask` for `m_t` in your GRU-D implementation (file [gru_d.py](https://github.com/BorgwardtLab/Set_Functions_for_Time_Series/blob/master/seft/models/gru_d.py)).

First, `dp_mask` is generated by the GRUCell built-in method `get_dropout_mask_for_cell`: [code](https://github.com/BorgwardtLab/Set_Functions_for_Time_Series/blob/9a13279b67023716f280dc18e0af8677b33cd72c/seft/models/gru_d.py#L235-L236)
Then, the dropout mask `m_dp_mask` for the masking vector `m_t` is generated by a separate call to `_generate_dropout_mask`: [code](https://github.com/BorgwardtLab/Set_Functions_for_Time_Series/blob/9a13279b67023716f280dc18e0af8677b33cd72c/seft/models/gru_d.py#L240-L247)
Because the two masks are generated separately, `dp_mask` and `m_dp_mask` zero out different elements of the two inputs `x_t` and `m_t`. I can reproduce your results, but I think the dropout mask should be the same for `x_t` and `m_t`. Could you please clarify this for me? Did I misunderstand something in the core TensorFlow implementation or in your implementation?
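
To make the question concrete, here is a minimal sketch in plain Python of the two options I am comparing. It is not the code from `gru_d.py`: the helper names `make_dropout_mask` and `apply_mask`, the example values, and the inverted-dropout scaling are my own assumptions for illustration only.

```python
import random


def make_dropout_mask(size, rate, rng):
    """Hypothetical helper: 0 with probability `rate`, else 1 / (1 - rate)."""
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else keep_scale for _ in range(size)]


def apply_mask(values, mask):
    """Element-wise product of an input vector and a dropout mask."""
    return [v * k for v, k in zip(values, mask)]


def mismatched_positions(mask_a, mask_b):
    """Indices where one mask drops a feature and the other keeps it."""
    return [
        i for i, (a, b) in enumerate(zip(mask_a, mask_b))
        if (a == 0.0) != (b == 0.0)
    ]


rng = random.Random(0)
rate = 0.5
x_t = [0.5, -1.2, 0.0, 2.3, 0.7, -0.4]  # made-up feature values
m_t = [1.0, 1.0, 0.0, 1.0, 1.0, 1.0]    # made-up observation mask

# Option A (what I understand the current code does): two independent masks.
dp_mask = make_dropout_mask(len(x_t), rate, rng)
m_dp_mask = make_dropout_mask(len(m_t), rate, rng)
x_independent = apply_mask(x_t, dp_mask)
m_independent = apply_mask(m_t, m_dp_mask)

# Option B (what I expected): one mask shared by x_t and m_t.
shared_mask = make_dropout_mask(len(x_t), rate, rng)
x_shared = apply_mask(x_t, shared_mask)
m_shared = apply_mask(m_t, shared_mask)

print("independent masks disagree at:", mismatched_positions(dp_mask, m_dp_mask))
print("shared mask disagrees at:", mismatched_positions(shared_mask, shared_mask))
```

With independent masks, a feature can be dropped in `x_t` while its entry in `m_t` is kept (or the reverse), whereas a shared mask always drops both together.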

Thanks for the great work!
