
Fix transformer_moe model has wrong logic in pre/postprocessing #1233

Conversation

twilightdema
Contributor

There is incorrect logic in the transformer_moe model that makes the training loss fail to decrease and makes decoding always generate empty output.

By comparing the logic with transformer.py and common_attention.py, I found that 'dp_postprocess' should receive the input 'x' from before it is passed through 'dp_preprocess', so that it runs with the same logic as the transformer model. I changed the logic as in this commit and ran test data to confirm that the training loss decreases and decoding generates correct results.
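For context, the wiring the fix restores looks roughly like the sketch below. This is a minimal, illustrative TF 1.x example, not the actual tensor2tensor code: the bodies of the preprocess/postprocess stand-ins (layer norm, dropout plus residual), the `simple_layer` sub-layer, and the shapes are assumptions chosen only to show why the residual input to `dp_postprocess` must be the original `x` rather than the preprocessed tensor.

```python
import tensorflow as tf  # TF 1.x, as used by tensor2tensor at the time

def preprocess(x):
  # Illustrative stand-in for dp_preprocess: typically layer normalization.
  return tf.contrib.layers.layer_norm(x)

def postprocess(residual_x, layer_output, dropout=0.1):
  # Illustrative stand-in for dp_postprocess: dropout on the sub-layer
  # output plus a residual connection taken from residual_x.
  return residual_x + tf.nn.dropout(layer_output, keep_prob=1.0 - dropout)

def simple_layer(x):
  # Illustrative stand-in for the attention / MoE sub-layer body.
  return tf.layers.dense(x, x.shape[-1].value)

x = tf.placeholder(tf.float32, [None, 10, 64])

# Old transformer_moe wiring: the residual fed to postprocess is the
# *preprocessed* tensor, so the skip connection never carries the raw input.
pre = preprocess(x)
y_buggy = postprocess(pre, simple_layer(pre))

# Fixed wiring (same shape as in transformer.py / common_attention.py):
# postprocess receives the original x as the residual, while the sub-layer
# operates on the preprocessed input.
y_fixed = postprocess(x, simple_layer(preprocess(x)))
```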

Unit testing results:

Before fix: [screenshot from 2018-11-17 15-51-04]

After fix: [screenshot from 2018-11-17 15-50-38]

@afrozenator
Contributor

Thanks a lot @twilightdema 👍

@afrozenator afrozenator merged commit eed1ccf into tensorflow:master Nov 21, 2018
tensorflow-copybara pushed a commit that referenced this pull request Nov 21, 2018
PiperOrigin-RevId: 222429349
kpe pushed a commit to kpe/tensor2tensor that referenced this pull request Mar 2, 2019
PiperOrigin-RevId: 222429349
Labels: cla: yes (PR author has signed CLA)