GNMT on tensorflow/nmt vs. GNMT on google/seq2seq #131
Hi jongsae, I think gnmt_v2 is a bigger factor than SGD. normed_bahdanau is better than bahdanau attention; scaled_luong is good, but somehow I couldn't get it to work with gnmt_v2. For the optimizer, what I have observed personally is that Adam makes things easier to train, but if you can manage to train with SGD with a large learning rate, you will get a better result! In fact, in all my NMT papers in 2014-2016, I used a pretty universal set of hyperparameters: sgd, learning rate 1.0, uniform init 0.1, grad norm 5, dropout 0.2 :) Hope that helps!
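To make the recipe above concrete, here is a minimal, framework-agnostic NumPy sketch of plain SGD with the stated settings (learning rate 1.0, uniform init in [-0.1, 0.1], global gradient norm clipped to 5, dropout 0.2) on a toy regression problem. The model and data are placeholders, not the NMT architecture itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear model, initialized uniformly in [-0.1, 0.1] as in the recipe above.
W = rng.uniform(-0.1, 0.1, size=(20, 1))

learning_rate = 1.0     # plain SGD with a "large" learning rate
max_grad_norm = 5.0     # clip gradients to a global norm of 5
keep_prob = 0.8         # dropout 0.2

def clip_by_global_norm(grads, max_norm):
    """Jointly rescale gradients if their global L2 norm exceeds max_norm
    (same idea as tf.clip_by_global_norm)."""
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = min(1.0, max_norm / (global_norm + 1e-12))
    return [g * scale for g in grads]

for step in range(100):
    x = rng.standard_normal((32, 20))
    y = x @ np.ones((20, 1))                    # toy regression targets
    mask = (rng.random(x.shape) < keep_prob) / keep_prob
    h = x * mask                                # inverted dropout on the inputs
    pred = h @ W
    grad_W = h.T @ (pred - y) / len(x)          # gradient of 0.5 * mean squared error
    grad_W, = clip_by_global_norm([grad_W], max_grad_norm)
    W -= learning_rate * grad_W                 # SGD update
```

The global-norm clipping is presumably what keeps the large learning rate from diverging on occasional large gradients, which is part of why lr 1.0 is workable with plain SGD.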
Hello @lmthang, I am developing an NMT system. Both of these implementations are a product of a Google team, so I was wondering why both of them exist.
Hi @ssokhey, google/seq2seq was developed to be general purpose, with usage of Estimator and various customizations / add-ons. tensorflow/nmt was initially developed from a teaching perspective, avoiding high-level APIs like Estimator that abstract away many details. Over the course of development, we also managed to replicate Google's NMT system with very good performance (outperforming google/seq2seq too, see https://github.com/tensorflow/nmt#wmt-english-german--full-comparison). I'd recommend using tensorflow/nmt as it is still being regularly maintained and can be used with newer versions of TF.
Thanks a lot! @lmthang
Hi |
Hey @frajos100, can you share the parameter settings you're using, and also what is the dataset size?
The parameter settings are the same as wmt16_gnmt_4_layer.json, present at nmt/standard_hparams/wmt16_gnmt_4_layer.json in https://github.com/tensorflow/nmt.
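For reference, one way to inspect such a settings file is simply to load the JSON; a minimal sketch, assuming the tensorflow/nmt repository is checked out locally (the key names printed below are illustrative guesses based on this discussion, not a verbatim listing of the file):

```python
import json

# Path as referenced above; assumes the repo is checked out locally.
with open("nmt/standard_hparams/wmt16_gnmt_4_layer.json") as f:
    hparams = json.load(f)

# Key names here are illustrative -- check the actual file for the exact set.
for key in ("attention", "attention_architecture", "optimizer",
            "learning_rate", "num_layers", "dropout"):
    print(key, "=", hparams.get(key))
```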
Hi ssokhey,
Hi, I am trying to reproduce on tensorflow/nmt the training results I originally generated using google/seq2seq.
I noticed that the standard hyperparams provided here lead to a much higher BLEU score (21.45 vs. 15.9) when trained for the same number of steps (80K). Is this because of algorithmic changes such as normed_bahdanau and gnmt_v2, or because of the optimized implementation of NMT? Or is there some other reason?
One more thing: this setup uses SGD instead of Adam, which was the default in google/seq2seq. Moreover, the learning rate used is surprisingly high (1.0). Maybe the optimizer change affected the training curve? I think Adam is more commonly used these days, so why was SGD selected in this case?
I would appreciate any explanations or comments that help me understand the algorithmic and implementation differences between these two training setups.
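For readers unfamiliar with the attention variants discussed in this thread, here is a minimal NumPy sketch of the score functions as described in the original papers (additive Bahdanau vs. its weight-normalized variant, and multiplicative Luong vs. its scaled variant). The parameters are random placeholders, and the exact implementation in tensorflow/nmt may differ in details.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                   # hidden size
query = rng.standard_normal(d)          # decoder state at the current step
keys = rng.standard_normal((5, d))      # encoder states for a 5-token source

# Random placeholder parameters; in the real models these are learned.
W1 = rng.standard_normal((d, d))
W2 = rng.standard_normal((d, d))
v = rng.standard_normal(d)
b = np.zeros(d)
g = 1.0                                 # learned scalar in the normalized / scaled variants
W = rng.standard_normal((d, d))

def bahdanau(query, keys):
    # Additive attention: v^T tanh(W1 q + W2 k)
    return np.tanh(query @ W1 + keys @ W2) @ v

def normed_bahdanau(query, keys):
    # Weight-normalized variant: v is rescaled to unit norm times a learned
    # scalar g, and a bias is added inside the tanh.
    v_hat = g * v / np.linalg.norm(v)
    return np.tanh(query @ W1 + keys @ W2 + b) @ v_hat

def luong(query, keys):
    # Multiplicative ("general") attention: q^T W k
    return keys @ (W @ query)

def scaled_luong(query, keys):
    # Same score multiplied by a single learned scalar g.
    return g * luong(query, keys)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for fn in (bahdanau, normed_bahdanau, luong, scaled_luong):
    print(fn.__name__, softmax(fn(query, keys)))
```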