
Ensemble #732

Merged (16 commits) Aug 31, 2018

Conversation

@Waino (Contributor) commented May 21, 2018

Ensemble decoding #275 #376

Decodes using multiple models simultaneously,
combining their prediction distributions by averaging.
All models in the ensemble must share a target vocabulary.
Currently only supports text models.

  • Implemented a special DecoderState that is really a list of n decoder states.
  • Implemented a mock decoder that takes this state, passes it to n different decoders, and sums the results.
  • Implemented a mock generator that takes in n states and averages the prediction distributions.
  • Implemented ensemble_translate.py, which reads in n models and ensembles them.
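
The averaging step can be illustrated with a minimal pure-Python sketch (the function name and the list-based stand-ins for tensors are hypothetical, not the PR's actual generator code):

```python
import math

def average_prediction_distributions(model_log_probs):
    """Combine one vocabulary row of per-model log-probabilities by
    averaging in probability space, returning log-probabilities again."""
    n_models = len(model_log_probs)
    vocab_size = len(model_log_probs[0])
    averaged = []
    for v in range(vocab_size):
        prob = sum(math.exp(lp[v]) for lp in model_log_probs) / n_models
        averaged.append(math.log(prob))
    return averaged
```

For example, two models that put 0.8/0.2 and 0.2/0.8 on a two-word vocabulary average out to a uniform 0.5/0.5 distribution.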

The command line arguments of ensemble_translate.py are almost identical to those of translate.py,
with the exception that the argument -model should be given repeatedly,
e.g. ensemble_translate.py -model ensemble.part.1.pt -model ensemble.part.2.pt [...].
It would be possible to combine this functionality into the normal translate.py,
if it is acceptable to change the type of opt.model from a single string to a list.
In the combined version opt.model would always be a list (perhaps renamed to opt.models),
with the convention that a list of length 1 means that ensembling is not used.

I tried to avoid modifying Translator, but it was not entirely possible.
The issue is the memory_bank, which is used and modified in ways that prevent transparently replacing it with a list.
For data types other than text, src_lengths are computed from memory_bank,
which is why only text models can be ensembled at the moment.
Also, memory_bank needs to be repeated for each beam, which required a small modification to Translator.
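
The beam repetition mentioned above can be sketched in plain Python (the name and the list-of-lists stand-in for tensors are hypothetical; the real code repeats tensors along the batch dimension):

```python
def repeat_for_beam(memory_banks, beam_size):
    """Repeat each sentence encoding beam_size times, for every model
    in the ensemble, so each beam hypothesis sees its own copy."""
    return [
        [encoding for encoding in bank for _ in range(beam_size)]
        for bank in memory_banks
    ]
```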

@Waino (Contributor, author) commented May 21, 2018

P.S. tools/pull_request_chk.sh fails for this PR, but the same tests also fail in master. No new failures were introduced.

@pianotaiq commented:

Hi, there.
This is a great feature and I think many users like me have been looking forward to it. I cloned master to my local machine and applied #732. So far, Waino's ensemble decoding has been working really well. I tried a 3-model ensemble on each of the classical enc-dec, brnn, and transformer models, trained on 1M pairs of Japanese-English bitext. They all worked nicely and showed more than a 1-point BLEU improvement.
Now I wonder what is needed for #732 to be included in the master branch. (#732 seems to have been rejected because of some checking problems.) I'm afraid I'm not an expert on NMT, but is there anything I can do? Or anybody else?

@Waino (Contributor, author) commented Jun 7, 2018

Hi @pianotaiq

Thank you for your efforts with testing this patch, and for your comments.

Note that the whole repository at the moment has a large number of open issues w.r.t. pytorch 0.4. It seems that the Travis CI system is not functioning correctly at the moment due to them. However, those problems are discussed in other threads, and I don't think this pull request is the right place to work on them.

@da03 (Collaborator) commented Jun 11, 2018

This looks fantastic! Working on it to incorporate our latest update.

@vince62s (Member) commented Jul 3, 2018

@Waino after some refactoring on our side, you will need to adjust things a bit; nothing major.
If you have questions, please ask. Your PR is great.

models.append(model)
if shared_model_opt is None:
    shared_model_opt = model_opt
    # FIXME: anything to check or copy from other model_opt?
A Contributor commented on this diff:
Has this FIXME been repaired?

@Waino (Contributor, author) replied:

At least for now, the only value of model_opt that is accessed after this point is copy_attn.
It is not possible to ensemble models that use copy attention with models that do not.
A check could be added, but then checks should also be added for the other incompatible configurations, of which there are too many.

I'm considering this FIXME resolved for now.
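
The kind of check being discussed could look roughly like this (a sketch; check_ensemble_opts is a hypothetical helper, not part of the PR):

```python
def check_ensemble_opts(model_opts):
    """Verify that all models in the ensemble agree on copy_attn;
    mixing copy-attention and plain models is not supported."""
    copy_attn = getattr(model_opts[0], "copy_attn", False)
    for opt in model_opts[1:]:
        if getattr(opt, "copy_attn", False) != copy_attn:
            raise ValueError(
                "Cannot ensemble models with and without copy attention")
    return copy_attn
```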

@Waino (Contributor, author) commented Aug 8, 2018

I've completed the refactoring, and smoke-tested with fresh models trained with both default parameters and Transformer. If there is something else I need to do, please let me know.

@alphadl (Contributor) commented Aug 17, 2018

As @pianotaiq said, during translation, when I ensemble several classical heterogeneous models with your PR, I can successfully translate the test sentences. However, if I use your PR to train a model, especially a Transformer model, I run into the following errors, although training does finish:

https://github.com/OpenNMT/OpenNMT-py/issues/902

Have you noticed that kind of problem? @Waino

@vince62s (Member) commented:

@srush I know you prefer to limit the number of root binaries.
What about adding an option -models, mutually exclusive with -model, and just using the translate.py binary?

@vince62s (Member) commented:

@Waino translation_server.py would need to be adjusted too, since you changed build_translator?
I know we are missing a Travis test for this.

@Waino (Contributor, author) commented Aug 30, 2018

> It would be possible to combine this functionality into the normal translate.py,
> if it is acceptable to change the type of opt.model from a single string to a list.
> In the combined version opt.model would always be a list (perhaps renamed to opt.models),
> with the convention that a list of length 1 means that ensembling is not used.

> What about adding an option -models mutually exclusive with -model and just using the translate.py binary?

I agree that ensemble_translate.py is redundant, and moving the ensemble functionality to translate.py makes sense.

I would recommend modifying the behavior of -model instead of introducing a new option -models.
argparse supports building a list by repeating an option; that is currently how ensemble models are specified.

I implemented the list-type opt.models separately from the old opt.model, together with the use_ensemble flag, only because I didn't want to change opt.model from a single string to a list without discussing it first. They could easily be combined, though.

The main benefits of using a single list-type opt.model through a repeatable -model option are:

  1. -model can remain a required parameter.
  2. The command line interface is clearer and easier to document, without options that differ by only a single character.
  3. Any code that needs to load a model or models, e.g. translation_server.py, can use a single code path processing a list of models, instead of needing control flow to check two separate options. This makes the ensemble functionality less likely to break due to future changes.

The main benefit of leaving things as they are is backwards compatibility with any external code that assumes opt.model to be a single string.

@vince62s (Member) commented:

@bpopeters any insight on this ?

@Waino (Contributor, author) commented Aug 30, 2018

I went ahead and implemented it, at the same time as I adjusted translation_server.py (the two changes interact). So now we can discuss it over code.

@bpopeters (Contributor) commented:

I agree that allowing multiple args with -model is a good thing.

About the particular implementation, what's the difference between using action='append' and nargs='+' in argparse? Does action='append' require the -model flag to be present multiple times on the command line if you are using an ensemble?

@Waino (Contributor, author) commented Aug 30, 2018

Yes, the difference between action='append' and nargs='+' is whether the flag is repeated or not. With append, multiple models are given as -model A -model B, while with nargs the same would be given as -model A B.

It is mostly a matter of taste. However, when using the nargs notation, the later models might be misread as positional arguments by someone who is not familiar with the notation.
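
The two argparse styles can be compared side by side (a standalone sketch, not the PR's actual opts code):

```python
import argparse

# action='append': the flag is repeated once per value.
parser_append = argparse.ArgumentParser()
parser_append.add_argument('-model', action='append')
opt = parser_append.parse_args(['-model', 'A.pt', '-model', 'B.pt'])
# opt.model == ['A.pt', 'B.pt']

# nargs='+': a single flag consumes several values.
parser_nargs = argparse.ArgumentParser()
parser_nargs.add_argument('-model', nargs='+')
opt2 = parser_nargs.parse_args(['-model', 'A.pt', 'B.pt'])
# opt2.model == ['A.pt', 'B.pt']
```

Both produce the same list; only the command-line surface syntax differs.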

@bpopeters (Contributor) commented:

In a vacuum I don't have much of a preference (besides a slight preference for writing shorter commands :P). But the nargs pattern is already present with several command line arguments (-gpuid, for example). I think inconsistency is more likely to lead to confusion than users misunderstanding the semantics of the command line flags.

@Waino (Contributor, author) commented Aug 31, 2018

True, consistency is a priority.

I modified the option to use nargs. One caveat I discovered while testing: when action='append' is not specified, argparse silently overwrites the previous value if the option is repeated. The effect is that giving -model A -model B in the current version will silently translate using only model B, without ensembling.
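
The silent-overwrite caveat can be reproduced with a few lines of argparse (standalone sketch):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-model', nargs='+')  # note: no action='append'

# Repeating the flag silently discards the earlier value:
opt = parser.parse_args(['-model', 'A.pt', '-model', 'B.pt'])
# opt.model == ['B.pt'] -- only model B would be used, with no warning
```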

@vince62s (Member) commented:

Looks good, thanks !

@vince62s vince62s merged commit e60a54f into OpenNMT:master Aug 31, 2018
@bpopeters (Contributor) commented:

There was one additional very minor issue, which is that explicit calls to forward are unnecessary and [have slightly reduced functionality](https://discuss.pytorch.org/t/any-different-between-model-input-and-model-forward-input/3690) compared to calling the module directly, since a direct call also runs any registered hooks. Although I've never been quite sure what someone would use hooks for.
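
The mechanism can be illustrated without PyTorch: a minimal stand-in whose __call__ wraps forward with hooks, the same way nn.Module does (MiniModule and its doubling forward are invented for illustration):

```python
class MiniModule:
    """Simplified stand-in for nn.Module: calling the object runs
    registered forward hooks, while calling .forward() directly skips them."""
    def __init__(self):
        self._forward_hooks = []

    def register_forward_hook(self, hook):
        self._forward_hooks.append(hook)

    def __call__(self, x):
        out = self.forward(x)
        for hook in self._forward_hooks:
            hook(self, x, out)
        return out

    def forward(self, x):
        return x * 2

seen = []
module = MiniModule()
module.register_forward_hook(lambda mod, inp, out: seen.append(out))
module(3)          # hook fires: seen becomes [6]
module.forward(3)  # hook is skipped: seen is unchanged
```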

@shahidmohana commented:

Hi everyone,
I need help with ensembling models.
I trained OpenNMT-py (default settings) on Data1 and Data2 separately and obtained two models. When I tried to ensemble them, it raised an error: "AssertionError: Ensemble models must use the same pre-processed data".
I thought this might be due to the different vocabularies, so I combined the Data1 and Data2 files and preprocessed them together.
Then I used the combined vocabulary demo.combined.vocab.pt for training both models (by replacing the original demo.vocab.pt files with demo.combined.vocab.pt) and got two sets of pretrained models.
Now I am trying to ensemble them, but it gives the same error as before.
The exception is that it doesn't show the error for the first pretrained checkpoints, i.e. demo-model-type1_step_5000.pt and demo-model-type2_step_5000.pt;
for the rest of the checkpoints it throws the same error, even though I am using the vocabulary built from the joined data.
Seniors and experts, please guide me. Thanks.

@vince62s (Member) commented Jun 3, 2019

@shahidmohana we have identified the issue, working on it.

@vince62s (Member) commented Jun 3, 2019

fixed by #1453
