This repository has been archived by the owner on Dec 11, 2023. It is now read-only.

GNMT on tensorflow/nmt vs. GNMT on google/seq2seq #131

Open
ghost opened this issue Sep 26, 2017 · 8 comments

ghost commented Sep 26, 2017

Hi, I am trying to reproduce the training results I generated using google/seq2seq on tensorflow/nmt.

I noticed that the standard hyperparameters provided here lead to a much higher BLEU score (15.9 vs. 21.45 BLEU) after the same number of training steps (80K). Is this because of algorithmic changes such as normed_bahdanau and gnmt_v2, because of the optimized NMT implementation, or for some other reason?

One more thing: it uses SGD instead of Adam, which was the default in google/seq2seq. Moreover, the learning rate used is surprisingly high (1.0). Maybe the optimizer change affected the training curve? I think Adam is more commonly used these days, so why was SGD selected in this case?

I would appreciate any explanations or comments that help me understand the algorithmic and implementational differences between these two trainings.

lmthang (Contributor) commented Sep 27, 2017

Hi jongsae,

I think gnmt_v2 is the main factor, then SGD. normed_bahdanau is better than bahdanau attention; scaled_luong is good too, but somehow I couldn't get it to work with gnmt_v2. As for the optimizer, what I've observed personally is that Adam makes things easier to train, but if you can manage to train with SGD with a large learning rate, you will get better results! In fact, in all my NMT papers in 2014-2016, I used a pretty universal set of hyperparameters: SGD, learning rate 1.0, uniform init 0.1, grad norm 5, dropout 0.2 :)
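The recipe above (SGD at learning rate 1.0 with gradients clipped to global norm 5) maps onto a simple update rule. A minimal NumPy sketch of the idea, not the actual tensorflow/nmt code; the function names here just mirror the corresponding TF ops:

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=5.0):
    """If the combined L2 norm of all gradients exceeds max_norm, scale them down."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if global_norm > max_norm:
        scale = max_norm / global_norm
        grads = [g * scale for g in grads]
    return grads, global_norm

def sgd_step(params, grads, lr=1.0):
    """Plain SGD update: param <- param - lr * grad."""
    return [p - lr * g for p, g in zip(params, grads)]

# Toy example: two parameter tensors with large gradients.
params = [np.ones((2, 2)), np.ones(3)]
grads = [np.full((2, 2), 3.0), np.full(3, 4.0)]

# Global norm is sqrt(4*9 + 3*16) = sqrt(84) > 5, so the gradients get rescaled.
clipped, norm = clip_by_global_norm(grads, max_norm=5.0)
params = sgd_step(params, clipped, lr=1.0)
```

The point of clipping by the *global* norm (rather than per-tensor) is that the relative direction of the full gradient is preserved; only its magnitude is capped, which is what makes a large learning rate like 1.0 trainable.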

Hope that helps!


ghost commented Oct 5, 2017

Hello @lmthang,

I am developing an NMT system using the google/seq2seq implementation. I would like your suggestion on whether I should switch to tensorflow/nmt or whether it doesn't matter.

Both implementations are products of Google teams, so I was wondering why both of them exist.

lmthang (Contributor) commented Oct 5, 2017

Hi @ssokhey,

google/seq2seq was developed to be general purpose, using the Estimator API and various customizations/add-ons.

tensorflow/nmt was initially developed from a teaching perspective, avoiding high-level APIs like Estimator that abstract away many details. Over the course of development, we also managed to replicate Google's NMT system with very good performance (outperforming google/seq2seq too; see https://github.com/tensorflow/nmt#wmt-english-german--full-comparison).

I'd recommend using tensorflow/nmt as it is still being regularly maintained and can be used with newer versions of TF.


ghost commented Oct 5, 2017

Thanks a lot! @lmthang

@frajos100

Hi
We have set up TensorFlow NMT for training from French to English, but the translation quality is not good at all even after 196K steps. I raised an issue at #328. Do we need a different version of TensorFlow? I installed the TensorFlow nightly version; could that be the reason for the bad translation quality? Also, how do we set up NMT as a multilingual model with zero-shot translation?


ghost commented May 18, 2018

Hey @frajos100

Can you share the parameter settings you're using, and also what is the dataset size?

@frajos100

The parameter settings are the same as wmt16_gnmt_4_layer.json at nmt/standard_hparams/wmt16_gnmt_4_layer.json on https://github.com/tensorflow/nmt.
The dataset I used is preprocessed data, tokenized using the provided shell script wmt16_en_de.sh, modified for the French version.
The modified download links are as follows:
http://www.statmt.org/europarl/v7/fr-en.tgz
http://www.statmt.org/wmt13/training-parallel-commoncrawl.tgz
http://data.statmt.org/wmt17/translation-task/training-parallel-nc-v12.tgz
http://data.statmt.org/wmt17/translation-task/dev.tgz
I am really new to NMT, so I need your help.
Also, how do we set up NMT as a multilingual model with zero-shot translation?

@frajos100

Hi @ssokhey,
Any directions on how we could improve the French to English training?
