sampling #586

mjpost · 2018-11-27T22:21:10Z

This adds sampling to Sockeye (via `--sample [N]`), a collaborative effort with @edwdh. It generates beam_size samples and returns up to nbest_size of them. If an integer is passed (`--sample N`), then at each time step each hypothesis samples only from the N most probable target-language tokens. The default, 0, samples from the entire target vocabulary.
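The top-N restriction can be illustrated with a minimal NumPy sketch. It assumes the decoder's output for each hypothesis is one row of probabilities over the target vocabulary. `sample_top_n` is a hypothetical helper, not Sockeye's code (which works on MXNet NDArrays); it only shows the idea of zeroing out everything outside the N most probable tokens, renormalizing, and drawing one token per row.

```python
import numpy as np


def sample_top_n(probs, n=0, rng=None):
    """Draw one token id per row of `probs` (shape: rows x vocab).

    Hypothetical helper: if n > 0, only the n most probable tokens in each
    row can be drawn; n == 0 samples from the full vocabulary.
    """
    rng = np.random.default_rng() if rng is None else rng
    probs = np.asarray(probs, dtype=np.float64)
    vocab = probs.shape[1]
    if 0 < n < vocab:
        # column indices of the n largest probabilities in each row (unordered)
        top = np.argpartition(-probs, n - 1, axis=1)[:, :n]
        keep = np.zeros(probs.shape, dtype=bool)
        np.put_along_axis(keep, top, True, axis=1)
        probs = np.where(keep, probs, 0.0)
    # renormalize so every row is a proper distribution again
    probs = probs / probs.sum(axis=1, keepdims=True)
    return np.array([rng.choice(vocab, p=row) for row in probs])
```

With n=1 this degenerates to greedy (argmax) selection, and with n=0 every token with nonzero probability can be drawn.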

This commit also changes how `inactive` works. It no longer doubles as the mechanism for the special case at time step 1, when `topk()` must run only over the items in the first row of each batch. Instead, we revert to the old behavior, with an explicit conditional for t==1. This simplifies some of the code and removes some confusing logic around `inactive`.
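The t==1 special case can be sketched in NumPy as follows. This assumes `scores` holds accumulated costs (lower is better) with shape (batch_size * beam_size, vocab). The `topk` function here is a hypothetical, simplified stand-in for Sockeye's `topk()`, not its actual code.

```python
import numpy as np


def topk(scores, beam_size, t):
    """Pick the beam_size lowest-cost (row, word) pairs per sentence.

    Hypothetical sketch. scores: (batch_size * beam_size, vocab) accumulated
    costs. Returns (rows, words, costs), each of shape (batch_size * beam_size,),
    where rows index into `scores`.
    """
    rows, vocab = scores.shape
    batch_size = rows // beam_size  # derived from the array shape
    folded = scores.reshape(batch_size, beam_size, vocab)
    if t == 1:
        # every hypothesis still only contains <s>, so all beam rows are
        # identical; search only the first row so the same word is not
        # selected once per row
        folded = folded[:, :1, :]
    flat = folded.reshape(batch_size, -1)
    best = np.argsort(flat, axis=1, kind="stable")[:, :beam_size]
    beam_idx, words = np.divmod(best, vocab)
    rows_out = beam_idx + np.arange(batch_size)[:, None] * beam_size
    costs = np.take_along_axis(flat, best, axis=1)
    return rows_out.ravel(), words.ravel(), costs.ravel()
```

Without the t==1 branch, two identical rows whose best word is the same would both contribute that word, and the beam would start with duplicate hypotheses.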

Pull Request Checklist

Changes are complete (if posting work-in-progress code, prefix your pull request title with '[WIP]' until you can check this box).
Unit tests pass (pytest)
Were system tests modified? If so, did you run these at least 5 times to account for the variation across runs?
Passed code style checking (./style-check.sh)
You have considered writing a test
Updated major/minor version in sockeye/__init__.py. Major version bump if this is a backwards incompatible change.
Updated CHANGELOG.md

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Sampling rewrite (see merge request edward.hu/sockeye!1)

- Fixes sampling to return proper scores and to work with `--nbest-size`.
- Simplifies the use of `inactive`, restoring the original code with a one-off case for t==1. This is necessary so that `topk()` does not select the same word in each row but instead chooses the best items from the first row only (at t==1 all rows share the identical history `<s>`). Previously, `inactive` was initialized to mark the non-first rows as inactive; dropping this use case simplifies the code.
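The score fix can be illustrated under one assumption: that a sampled hypothesis is scored like a beam-search hypothesis, by accumulating the negative log-probability of each token actually drawn. The helpers below are hypothetical NumPy sketches, not Sockeye's implementation; they also show how beam_size samples might be cut down to `--nbest-size` outputs.

```python
import numpy as np


def sample_step(probs, costs, rng):
    """One sampling step for every hypothesis (row). Hypothetical sketch.

    probs: (rows, vocab) next-token probabilities.
    costs: (rows,) accumulated negative log-probabilities.
    Returns the sampled token ids and the updated costs.
    """
    words = np.array([rng.choice(probs.shape[1], p=row) for row in probs])
    # a sample is charged the model's cost of the token actually drawn
    step_cost = -np.log(probs[np.arange(len(words)), words])
    return words, costs + step_cost


def nbest_samples(costs, nbest_size):
    """Indices of the nbest_size lowest-cost samples out of beam_size."""
    return np.argsort(costs, kind="stable")[:nbest_size]
```

A real decoder may also apply length normalization before ranking; that step is omitted here.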

Still need to fix the non-MXNet `topk()` (it's really annoying to have essentially four different topk functions to fix).

sockeye/constants.py

sockeye/inference.py

sockeye/lexical_constraints.py

sockeye/utils.py

sockeye/inference.py

fhieber

Yet another great decoder feature :) thanks!

sockeye/inference.py

sockeye/lexical_constraints.py

sockeye/translate.py

sockeye/inference.py

sockeye/lexical_constraints.py

sockeye/inference.py

mjpost · 2018-12-03T17:50:53Z

FYI (FDI?): I'll come back to this in a week.

fhieber

I will take another pass later today. Looks good to me already though.

test/unit/test_inference.py

sockeye/inference.py

sockeye/translate.py

sockeye/utils.py

fhieber

thanks for iterating and benchmarking! LGTM.
I'm still curious, though, about my last remaining comment on removing the `asscalar()` call used to compute `batch_size` in `topk()`.

fhieber

Thanks!

Edward Hu and others added 7 commits October 30, 2018 09:09

implemented decoding by sampling

bc1b9f5

Merge branch 'master' into sampling

bdcacbf

Sampling rewrite

189508b

Merge branch 'sampling-rewrite' into 'sampling'

af3945b

Sampling rewrite See merge request edward.hu/sockeye!1

Merge branch 'master' into sampling

583c57d

fixed util.topk() when MXNet is being used

6ff5793

Still need to fix for non-mxnet topk() (It's really annoying having essentially four different topk functions to fix)

mjpost requested review from davvil, fhieber, mjdenkowski and tdomhan as code owners November 27, 2018 22:21

mjpost added 3 commits November 28, 2018 13:35

added sampling from the top n vocab items

b3778f3

removed print statement

4c51720

pulled out batch_indices

45016f7

mjpost commented Nov 28, 2018

mjpost added 5 commits November 28, 2018 14:01

Merge branch 'master' into sampling

2ebea0e

bugfix in unravel

eea8d95

Merge branch 'sampling' into sampling_topn

6d5484c

fixed test case

a032e86

inverted conditional block to get rid of mypy error

3d4e8b3

mjpost changed the title from "[WIP] sampling" to "sampling" Nov 28, 2018

mjpost and others added 4 commits November 28, 2018 16:22

removed DEFAULT_RANDOM_SEED

192e9a4

fixed negative constraints with sampling

6d054f4

Merge remote-tracking branch 'origin/sampling_topn' into sampling

a4ed533

only update target_dists when sampling

d785bbf

fhieber reviewed Nov 29, 2018

fhieber added the feature label Nov 29, 2018

mjpost added 3 commits November 29, 2018 07:49

Merge branch 'master' of github.com:awslabs/sockeye into sampling

28e5986

added documentation and incremented version

97b3389

fixing code review items from @fhieber

9c5070f

Merge branch 'master' into sampling

8ac34f8

tdomhan reviewed Nov 29, 2018

sockeye/inference.py (resolved)

sockeye/inference.py (outdated, resolved)

sockeye/inference.py (outdated, resolved)

removed stray comment

ae00ba9

mjpost added 4 commits December 3, 2018 17:45

Merge branch 'master' into sampling

d18b830

added sampling test cases

535f02f

check for restrict_lexicon

869a7a0

simplified conditional

4055a02

fhieber reviewed Dec 6, 2018

mjpost added 2 commits December 6, 2018 06:56

cleanup test case

fdea4fc

reverted pre-computation of skip_softmax

a3cb97d

fhieber approved these changes Dec 7, 2018

mjpost added 2 commits December 10, 2018 15:21

Merge branch 'master' into sampling

9f0d70d

got rid of asscalar(), added some comments

ba128c8

fhieber approved these changes Dec 10, 2018

mjpost added 2 commits December 12, 2018 18:17

Merge remote-tracking branch 'amazon/master' into sampling

0b6fe53

Merge remote-tracking branch 'amazon/master' into sampling

5a0bf23

fhieber merged commit 094baca into awslabs:master Dec 14, 2018
