Ensemble #732
Conversation
P.S.

Hi, there.
Hi @pianotaiq Thank you for your efforts with testing this patch, and for your comments. Note that the whole repository currently has a large number of open issues w.r.t. pytorch 0.4. It seems that the Travis CI system is not functioning correctly at the moment because of them. However, those problems are discussed in other threads, and I don't think this pull request is the right place to work on them.

This looks fantastic! Working on it to incorporate our latest update.

@Waino after a few refactorings you'll need to adjust a bit, nothing major.
onmt/modules/Ensemble.py (outdated)

```python
models.append(model)
if shared_model_opt is None:
    shared_model_opt = model_opt
    # FIXME: anything to check or copy from other model_opt?
```
Has this FIXME been repaired?
At least for now, the only value of `model_opt` that is accessed after this point is `copy_attn`. It is not possible to ensemble models using copy attention with those not using it. A check could be added, but in that case checks should also be added for other incompatible configurations, which would be too many. I'm considering this FIXME resolved for now.
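The kind of check discussed here could be sketched as follows. This is a hypothetical illustration, not code from the PR; the function name and the `SimpleNamespace` stand-ins for the real option objects are assumptions, while the `copy_attn` attribute is the one named above.

```python
from types import SimpleNamespace

def check_compatible(shared_model_opt, model_opt):
    """Hypothetical sketch: refuse to ensemble a copy-attention model
    with one that does not use copy attention."""
    if getattr(shared_model_opt, "copy_attn", False) != getattr(model_opt, "copy_attn", False):
        raise ValueError("All models in an ensemble must agree on copy_attn")

# Two compatible configurations pass silently.
a = SimpleNamespace(copy_attn=True)
b = SimpleNamespace(copy_attn=True)
check_compatible(a, b)

# A mismatch raises.
c = SimpleNamespace(copy_attn=False)
try:
    check_compatible(a, c)
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```

As noted above, extending this to every incompatible configuration would mean many such checks, which is why only the informal convention is relied on.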
I've completed the refactoring, and smoke-tested with fresh models trained with both default parameters and Transformer. If there is something else I need to do, please let me know.
As @pianotaiq said, during translation, when I ensemble several classical heterogeneous models with your PR, I can successfully translate the test sentences. However, if I use your PR to train a model, especially a Transformer model, I get the following errors even though training finishes: https://github.com/OpenNMT/OpenNMT-py/issues/902 Have you noticed that kind of problem? @Waino
@srush I know you prefer to limit the number of root binaries.

@Waino translation_server.py would need to be adjusted too, since you changed `build_translator`?
I agree; I would recommend modifying the behavior of `translate.py` itself, and I implemented the list-type `opt.models`. The main benefit of using a single list-type `opt.models` is that ensembling becomes available through the normal translation entry point. The main benefit of leaving things as they are is backwards compatibility with any external code that assumes `opt.model` is a single string.
@bpopeters any insight on this?
I went ahead and implemented it, and at the same time I adjusted `translation_server.py`.
I agree that allowing multiple args with `-model` is a good idea. About the particular implementation, what's the difference between using `action='append'` (repeating the flag) and `nargs` (one flag with several values)?

Yes, the difference between the two is mostly a matter of taste. However, when using the nargs notation, the later models might be misread as positional arguments by someone who is not familiar with the notation.
In a vacuum I don't have much of a preference (besides a slight preference for writing shorter commands :P). But the nargs pattern is already present with several command line arguments.

True, consistency is a priority. I modified the option to use nargs. One caveat that I discovered while testing is that when
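The two argparse styles discussed in this thread can be compared in a small stand-alone sketch; the option names here are illustrative, not the PR's exact flags.

```python
import argparse

# Variant 1: repeat the flag (action='append').
p1 = argparse.ArgumentParser()
p1.add_argument("-model", action="append", dest="models")
opt1 = p1.parse_args(["-model", "a.pt", "-model", "b.pt"])

# Variant 2: one flag taking several values (nargs='+'),
# the pattern already used by other command line arguments.
p2 = argparse.ArgumentParser()
p2.add_argument("-models", nargs="+")
opt2 = p2.parse_args(["-models", "a.pt", "b.pt"])

print(opt1.models)  # ['a.pt', 'b.pt']
print(opt2.models)  # ['a.pt', 'b.pt']
```

Both variants produce the same list; the difference is only in how the command line reads, which is the readability concern raised above.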
Looks good, thanks!

There was one additional very minor issue, which is that explicit calls to

Hi everyone:

@shahidmohana we have identified the issue and are working on it.

Fixed by #1453.
Ensemble decoding #275 #376

Decodes using multiple models simultaneously, combining their prediction distributions by averaging. All models in the ensemble must share a target vocabulary. Currently only supports text models.
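The averaging of prediction distributions can be sketched as follows; this is a toy stand-in using plain Python lists instead of the real vocabulary-sized tensors, and the function name is assumed.

```python
import math

def average_distributions(per_model_log_probs):
    """Toy sketch: combine the models' predictions by averaging in
    probability space, then return to log space. Each inner list is
    one model's log-probabilities over the shared target vocabulary."""
    n_models = len(per_model_log_probs)
    vocab_size = len(per_model_log_probs[0])
    combined = []
    for tok in range(vocab_size):
        p = sum(math.exp(lp[tok]) for lp in per_model_log_probs) / n_models
        combined.append(math.log(p))
    return combined

# Two models disagreeing over a two-token vocabulary.
model_a = [math.log(0.7), math.log(0.3)]
model_b = [math.log(0.5), math.log(0.5)]
avg = average_distributions([model_a, model_b])
print([round(math.exp(x), 6) for x in avg])  # [0.6, 0.4]
```

Sharing a target vocabulary is what makes the element-wise average meaningful: index `tok` must refer to the same token in every model's distribution.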
Ensembling is implemented using a `DecoderState` that is really a list of n decoder states.

The command line arguments of ensemble_translate.py are almost identical to translate.py, with the exception that the argument `-model` should be given repeatedly, e.g. `ensemble_translate.py -model ensemble.part.1.pt -model ensemble.part.2.pt [...]`.

It would be possible to combine this functionality into the normal translate.py, if it is acceptable to change the type of opt.model from a single string to a list. In the combined version opt.model would always be a list (perhaps renamed to opt.models), with the convention that a list of length 1 means that ensembling is not used.
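The list-of-states idea can be sketched like this. It is a hypothetical stand-in, not the PR's actual class; the `beam_update` method is only an assumed example of an operation that must be forwarded to every member state.

```python
class EnsembleDecoderState:
    """Hypothetical sketch: looks like a single decoder state to the
    translator, but is really a list of n per-model decoder states."""

    def __init__(self, states):
        self.states = list(states)

    def beam_update(self, *args, **kwargs):
        # Forward the update to every member state.
        for state in self.states:
            state.beam_update(*args, **kwargs)

class _ToyState:
    """Minimal fake state for demonstration."""
    def __init__(self):
        self.updates = 0
    def beam_update(self, idx):
        self.updates += 1

ensemble = EnsembleDecoderState([_ToyState(), _ToyState(), _ToyState()])
ensemble.beam_update(0)
print([s.updates for s in ensemble.states])  # [1, 1, 1]
```

The point of the wrapper is that the beam search code can keep calling the usual state methods without knowing there are n models underneath.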
I tried to avoid modifying `Translator`, but it was not entirely possible. The issue is the `memory_bank`, which is used and modified in ways that prevent transparently replacing it with a list. The `src_lengths` are computed from `memory_bank` for data types other than text, which is why only text models can be ensembled atm. Also, `memory_bank` needs to be repeated for each beam, which required a small modification to `Translator`.
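The beam repetition mentioned here can be sketched with a toy illustration; plain lists stand in for the real encoder-output tensors, and the function name is assumed.

```python
def repeat_for_beam(memory_bank, beam_size):
    """Toy sketch: duplicate each batch entry of the encoder output
    beam_size times, so every beam hypothesis can attend to its own
    copy of the source encoding. (Plain lists stand in for tensors.)"""
    return [entry for entry in memory_bank for _ in range(beam_size)]

# Two source sentences, beam size 3.
repeated = repeat_for_beam(["enc(src1)", "enc(src2)"], 3)
print(repeated)
# ['enc(src1)', 'enc(src1)', 'enc(src1)', 'enc(src2)', 'enc(src2)', 'enc(src2)']
```

In an ensemble this repetition has to be applied to every model's `memory_bank`, which is why it could not be kept fully outside `Translator`.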