-
Notifications
You must be signed in to change notification settings - Fork 323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sampling #586
sampling #586
Conversation
Sampling rewrite See merge request edward.hu/sockeye!1
- Fixes sampling to work return proper scores and to work with `--nbest-size` - Simplifies the use of `inactive`, restoring the original code where we have a one-off case when t==1. This is necessary so that `topk()` doesn't select the same word in each row, but instead chooses the best items just from the first row (since all rows have the identical history of `<s>` at t==1). Before, "inactive" was initialized to mark non-first rows as inactive, but removing this use case simplifies the code.
Still need to fix for non-mxnet topk() (It's really annoying having essentially four different topk functions to fix)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yet another great decoder feature :) thanks!
FYI (FDI?), werde ich in einer Woche darauf zurückkommen. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will take another pass later today. Looks good to me already though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for iterating and benchmarking! LGTM.
Still curious though about my last remaining comment for removing the asscalar()
to compute batch_size
in topk()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
This adds sampling to Sockeye (via
--sample [N]
), a collaborative effort with @edwdh. It will createbeam_size
samples (and return as many asnbest_size
). If an integer is passed to--sample N
, samples at each time step for each hypothesis will only be taken for the topN
most-probable target-language tokens. The default is 0, which samples from the entire target vocabulary.This commit also changes the way
inactive
works, removing its dual purpose of handling the special case at timestep 1 when we need to runtopk()
only over items from the first row in each batch. This is now reverted to the old behavior, where we have a special conditional fort==1
. This simplifies some of the code and removes some confusing logic withinactive
.Pull Request Checklist
until you can check this box.
pytest
)./style-check.sh
)sockeye/__init__.py
. Major version bump if this is a backwards incompatible change.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.