[WIP] Introduce sentence weighting #1438

Closed

francoishernandez
Member

francoishernandez commented May 17, 2019

Basic idea: provide an additional file at preprocessing time, containing the sentence weights.

These weights are stored in the torchtext Examples and are used to weight the loss in _compute_loss.
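
To make the intent concrete, here is a minimal, framework-free sketch of what weighting the loss per sentence could look like: each example's per-token losses are scaled by that example's weight before being summed. The names `weighted_batch_loss`, `token_losses` and `sentence_weights` are hypothetical, and this is not the code in _compute_loss, only an illustration of the arithmetic.

```python
def weighted_batch_loss(token_losses, sentence_weights):
    """Sum per-token losses, scaling each sentence by its weight.

    token_losses: list of lists, one inner list of per-token losses per sentence.
    sentence_weights: one float per sentence (hypothetical input, for illustration).
    """
    if len(token_losses) != len(sentence_weights):
        raise ValueError("need exactly one weight per sentence")
    total = 0.0
    for losses, weight in zip(token_losses, sentence_weights):
        # Every token of a sentence shares that sentence's weight.
        total += weight * sum(losses)
    return total


batch = [[0.5, 1.5], [2.0], [1.0, 1.0, 1.0]]
unweighted = weighted_batch_loss(batch, [1.0, 1.0, 1.0])
weighted = weighted_batch_loss(batch, [1.0, 0.5, 2.0])
```

With all weights equal to 1 the result is the ordinary summed loss, which matches the default behaviour described for corpora without a weights file.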

This PR introduces the -sentence_weights option, which takes one or more text files containing the weight of each sentence / example. If several corpora are passed (following the #1413 upgrades), the corresponding weight files should be passed as well. If weights exist for only some of the corpora in the list, pass None/none in place of the filename: argparse casts it to Python None, and a weight of 1 is assigned to the examples of that corpus.
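
A rough sketch of how such an option could be declared and consumed is shown below. It assumes a custom argparse `type` function for the None/none handling and a hypothetical `load_weights` helper. The actual declaration in this PR may differ, and the file name is made up for the example.

```python
import argparse


def none_or_path(value):
    """Map the literal strings 'None'/'none' to Python None; keep other values."""
    return None if value in ("None", "none") else value


parser = argparse.ArgumentParser()
# Hypothetical declaration: one entry per training corpus.
parser.add_argument("-sentence_weights", nargs="+", type=none_or_path,
                    default=[None])

# Two corpora: weights for the first one only.
opt = parser.parse_args(["-sentence_weights", "weights_a.txt", "none"])


def load_weights(path, num_examples):
    """Return one weight per example; fall back to 1.0 when no file is given."""
    if path is None:
        return [1.0] * num_examples
    with open(path) as f:
        weights = [float(line) for line in f if line.strip()]
    if len(weights) != num_examples:
        raise ValueError("weights file must have one line per example")
    return weights
```

Here `opt.sentence_weights` ends up as a list with one entry per corpus, so it can be zipped with the list of corpora.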

I ran some tests on basic translation / speech / image runs, and they seem to run without any issue.

if maybe_weights is not None:
    weight_shards = split_corpus(maybe_weights, opt.shard_size)
else:
    weight_shards = cycle(iter([cycle(iter([1]))]))
Member Author

This could be replaced by: repeat(cycle(iter([1])))
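
A quick standalone check that the two forms behave the same: both yield an endless stream of "shards", each of which is itself an endless stream of 1s. The helper `take_weights` is only there for the demonstration.

```python
from itertools import cycle, islice, repeat

# Form used in the diff: an infinite iterator of infinite iterators of 1.
original = cycle(iter([cycle(iter([1]))]))
# Form suggested in the review comment.
suggested = repeat(cycle(iter([1])))


def take_weights(shards, n_shards, n_examples):
    """Materialise the first n_examples weights of the first n_shards shards."""
    return [list(islice(shard, n_examples)) for shard in islice(shards, n_shards)]


first = take_weights(original, 2, 3)
second = take_weights(suggested, 2, 3)
```

In both cases every shard is the same underlying iterator object, which is harmless here because it only ever yields 1.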
