
Cannot replicate the results mentioned in the repo (English-Vietnamese) #61

Open
mjlaali opened this issue Aug 11, 2017 · 13 comments

mjlaali commented Aug 11, 2017

I cannot replicate the results mentioned in the repo. Here are my settings:
Python 2.7
Tensorflow 1.2.1
A Docker image based on nvidia/cuda:8.0-cudnn5-devel-ubuntu14.04

The command I ran was:
python2 -m nmt.nmt \
    --src=vi --tgt=en \
    --vocab_prefix=/data/nmt/iwslt15/vocab \
    --train_prefix=/data/nmt/iwslt15/train \
    --dev_prefix=/data/nmt/iwslt15/tst2012 \
    --test_prefix=/data/nmt/iwslt15/tst2013 \
    --out_dir=/data/nmt/models/nmt_attention_model \
    --hparams_path=nmt/standard_hparams/iwslt15.json \
    --num_gpus=2

I got a BLEU score of 24.83; however, the website reports 26.1.

lmthang (Contributor) commented Aug 13, 2017

You have --src=vi --tgt=en, so it's Vietnamese-English (for English-Vietnamese, try --src=en --tgt=vi), and that's about the number we got :) I'll update the tutorial with Vietnamese-English results as well.
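
For reference, this is the command from the first post with only the direction flags flipped for English-Vietnamese. One caveat: --out_dir should probably point to a fresh directory so the two directions don't reuse each other's checkpoints; the value is kept here only to mirror the original report:

    python2 -m nmt.nmt \
        --src=en --tgt=vi \
        --vocab_prefix=/data/nmt/iwslt15/vocab \
        --train_prefix=/data/nmt/iwslt15/train \
        --dev_prefix=/data/nmt/iwslt15/tst2012 \
        --test_prefix=/data/nmt/iwslt15/tst2013 \
        --out_dir=/data/nmt/models/nmt_attention_model \
        --hparams_path=nmt/standard_hparams/iwslt15.json \
        --num_gpus=2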

mjlaali (Author) commented Aug 17, 2017

Thanks for the clarification. I tried --src=en --tgt=vi and got a BLEU of 25.38 on the test set. Could you clarify whether the difference between my result and the reported one (0.72 BLEU) is normal?

bastings (Contributor) commented:

@mjlaali Have you tried running it again? It could just be variance (I got 26.4).

bastings (Contributor) commented:

@lmthang Regarding replicating the IWSLT results: I manage to reproduce them using Python 2.7.12, but with Python 3.5.2 I end up with a BLEU of 0.7, so something seems really broken there. I'm not sure yet what it is.

oahziur (Contributor) commented Aug 22, 2017

@bastings Yes, there is an encoding problem in Python 3.

I think this pull request fixed the problem.

bastings (Contributor) commented:

Hi @oahziur, in that PR you mention this is fixed in the latest version, but I tested it with the latest code yesterday, and it seems it is not fixed. Are you sure get_translation(...) in nmt_utils.py behaves correctly?

oahziur (Contributor) commented Aug 22, 2017

@bastings The printing-to-stdout part was fixed, but get_translation(...) still has a problem in Python 3. I think the change in the PR should fix get_translation(...) once it is merged into master.
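
To illustrate the kind of Python 3 issue under discussion: TensorFlow yields decoded tokens as byte strings, and code that treats them as str works in Python 2 (where bytes and str are the same type) but silently misbehaves in Python 3. A minimal sketch, assuming the general shape of get_translation(...) in nmt/utils/nmt_utils.py (the real function also handles BPE/SPM subword options, omitted here):

    import numpy as np

    def get_translation(nmt_outputs, sent_id, tgt_eos):
        """Select one decoded sentence from a batch and turn it into text."""
        # The eos marker must be encoded to bytes before comparing it with
        # the byte tokens, and the join must be done on bytes. If it stays
        # a str, the eos is never found in Python 3, padding leaks into the
        # output, and BLEU collapses (e.g. the 0.7 reported above).
        if tgt_eos:
            tgt_eos = tgt_eos.encode("utf-8")
        output = nmt_outputs[sent_id, :].tolist()
        if tgt_eos and tgt_eos in output:
            output = output[:output.index(tgt_eos)]  # cut at end-of-sentence
        return b" ".join(output).decode("utf-8")

    # Toy batch of byte tokens, as TensorFlow would return them:
    batch = np.array([[b"xin", b"chao", b"</s>", b"</s>"]])
    print(get_translation(batch, 0, "</s>"))  # -> "xin chao"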

bastings (Contributor) commented:

@oahziur Yes, you are right; I just validated it, and with those changes I get 25.9 on the IWSLT15 en->vi test set.

oahziur (Contributor) commented Aug 23, 2017

@bastings Are you loading the same model (the one that got 26.4 in Python 2) but getting 25.9 in Python 3?

mjlaali (Author) commented Aug 23, 2017

@bastings You are right: in a second run using two GPUs I got a BLEU of 26.3.

For the sake of clarity, the run with a BLEU of 25.38 trained the model on a CPU.

bastings (Contributor) commented Aug 23, 2017

It was a separate run, @oahziur, so nothing to worry about.
I checked, and I can do inference/evaluation in Python 3 without problems with a model trained in Python 2 (and get the same BLEU, after setting beam_width manually; maybe this setting should carry over from the model settings).

However, when setting the same random seed, I do get different results when training with Python 2 versus Python 3. Is this expected?

oahziur (Contributor) commented Aug 24, 2017

@bastings Thanks! I was worried there were still some encoding issues in Python 3, but I think that is fixed, since you can get consistent results. Some randomness during training is expected.

You should be able to reset beam_width so you can compare results across different beam widths. Do you mean beam_width was not set correctly after you switched to Python 3, and you had to reset it manually?

bastings (Contributor) commented:

Yes, it seemed that beam_width defaulted to 1 for inference even though it was set to 10 during training.
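
Until that setting carries over automatically, a workaround sketch: pass beam_width explicitly when running inference against the trained model. The input/output file paths below are hypothetical placeholders; the flag names are the ones used by nmt.nmt:

    python2 -m nmt.nmt \
        --out_dir=/data/nmt/models/nmt_attention_model \
        --inference_input_file=/tmp/infer_input.en \
        --inference_output_file=/tmp/translations.vi \
        --beam_width=10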
