[WIP] Introduce sentence weighting #1438
Closed
Basic idea: pass an additional file at preprocessing time, containing sentence weights. These weights will be stored in the torchtext `Example`s and will be used to weight the loss in `_compute_loss`.

I introduce the `-sentence_weights` opt, to which we are supposed to pass some text file(s) containing the weights for each sentence / example. If several corpora are passed, as allowed by the #1413 upgrades, the matching weight files should be passed as well. If we want/have weights for only some of the corpora in the list, we can pass `None`/`none` instead of the filename; it will be cast to Python `None` by argparse, and weights of 1 will be assigned.

I did some tests on basic translation / speech / image runs, and it seems to be running without any issue.
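
For illustration, the behaviour described above could be sketched roughly as follows. This is a standalone sketch, not the code in this PR: `none_or_str`, `build_parser`, `load_weights`, and `weighted_loss` are hypothetical names, the weight file is assumed to hold one float per line, and the loss is reduced with a plain weighted sum over per-example losses rather than through torchtext or `_compute_loss`.

```python
import argparse


def none_or_str(value):
    """Hypothetical argparse type: map 'None'/'none' to Python None."""
    return None if value.lower() == "none" else value


def build_parser():
    """Parser exposing only the -sentence_weights opt (illustrative)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-sentence_weights", nargs="+",
                        type=none_or_str, default=None)
    return parser


def load_weights(path, n_examples):
    """Return one weight per example.

    Assumes one float per line in the file; when path is None,
    every example gets a weight of 1.
    """
    if path is None:
        return [1.0] * n_examples
    with open(path, encoding="utf-8") as f:
        weights = [float(line) for line in f if line.strip()]
    if len(weights) != n_examples:
        raise ValueError(
            "expected %d weights, found %d" % (n_examples, len(weights)))
    return weights


def weighted_loss(per_example_losses, weights):
    """Weighted sum of per-example losses (assumed reduction)."""
    return sum(w * l for w, l in zip(weights, per_example_losses))
```

With two corpora where only the first has a weight file, the command line would carry something like `-sentence_weights weights.txt none`, and the second corpus would fall back to weights of 1 through `load_weights(None, n)`.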