[examples] SummarizationModule improvements #4951
Merged
Conversation
LysandreJik approved these changes on Jun 17, 2020
This all looks very cool, looking forward to using it! Merging now. Happy to address post-merge comments!
sshleifer changed the title from [examples] SummarizationTrainer improvements to [examples] SummarizationModule improvements on Jun 17, 2020
This PR makes the SummarizationTrainer much more usable; when improvements are not unique to summarization, they are implemented in `lightning_base.py` instead. Checkpoints are now saved in `save_pretrained` format using the `on_save_checkpoint` hook, which should resolve lots of confusion in various issues about how to load the pl checkpoints (sketched in code below).

The current summarization code can only accept bs=1 and takes 24h to run 1 epoch on CNN DM. With the following changes, you can train much faster if you wish. The docs suggested that larger batch sizes were possible with default params; that is now fixed.
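A minimal sketch of that checkpoint hook, assuming the module exposes `self.model`, `self.tokenizer`, and an `output_dir` hparam (the attribute and directory names here are illustrative, not necessarily the exact PR code):

```python
from pathlib import Path

import pytorch_lightning as pl


class SummarizationModule(pl.LightningModule):
    def on_save_checkpoint(self, checkpoint: dict) -> None:
        # In addition to the regular Lightning .ckpt file, write the model
        # and tokenizer in Hugging Face save_pretrained format, so they can
        # later be reloaded with .from_pretrained(save_path) without
        # touching the pl checkpoint at all.
        save_path = Path(self.hparams.output_dir) / "best_tfmr"
        self.model.save_pretrained(save_path)
        self.tokenizer.save_pretrained(save_path)
```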
Changes to Allow Faster Summarization Training
(These are all optional and turned off by default.)
- freezing: before this PR, it was basically only possible to finetune with batch size 2-4 on a 16GB system. With `--freeze_embeds` and `--freeze_encoder`, you can push the batch size MUCH higher, towards 32. I've seen strong results with these options. (Sketched in code after this list.)
- On CNNDM and XSUM the datasets are 200K examples, and epochs are very long. For this reason it is preferable to run validation (and get a rouge score) more frequently, but with the previous params each `validation_step` took 1hr. By passing `--n_val=1000 --val_check_interval=0.25`, you can run validation 4x per epoch and it only takes 3 minutes. This PR also allows the config's beam search parameters to be used, rather than hardcoding faster but lower-scoring ones.
- `{train|val|test}_max_target_length`: I have found it preferable to truncate train summaries to a dataset-specific length (56 for XSUM), but doing this for val/test artificially inflates rouge scores, so these clargs are separated. (Also sketched after this list.)
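A sketch of what `--freeze_embeds` and `--freeze_encoder` do in spirit: frozen parameters need no gradient buffers or optimizer state, which is where the memory for larger batches comes from. The `model.model.*` attribute paths assume a BART-like architecture and are an assumption here:

```python
import torch.nn as nn


def freeze_params(module: nn.Module) -> None:
    # Exclude these parameters from gradient computation (and hence
    # from optimizer state), freeing memory for larger batch sizes.
    for p in module.parameters():
        p.requires_grad = False


def freeze_embeds(model) -> None:
    # Hypothetical helper: freeze the shared input/output embeddings
    # and the positional embeddings of a BART-like model.
    freeze_params(model.model.shared)
    for layer_group in (model.model.encoder, model.model.decoder):
        freeze_params(layer_group.embed_positions)
        freeze_params(layer_group.embed_tokens)


def freeze_encoder(model) -> None:
    # Freeze the entire encoder stack; only the decoder is trained.
    freeze_params(model.get_encoder())
```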
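And a sketch of the separated `{train|val|test}_max_target_length` clargs; `encode_example` and the hparam names are hypothetical, the point is only that the summary truncation limit is chosen per split:

```python
def encode_example(tokenizer, source_text: str, summary: str,
                   type_path: str, hparams):
    # Short targets make training cheaper, but val/test keep a longer
    # limit so rouge is not artificially inflated by truncated
    # gold summaries.
    max_target_length = {
        "train": hparams.train_max_target_length,  # e.g. 56 for XSUM
        "val": hparams.val_max_target_length,
        "test": hparams.test_max_target_length,
    }[type_path]
    source = tokenizer(source_text, max_length=hparams.max_source_length,
                       truncation=True, return_tensors="pt")
    target = tokenizer(summary, max_length=max_target_length,
                       truncation=True, return_tensors="pt")
    return source, target
```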
Changes to `lightning_base`
`pl.Trainer` clargs are passed through `add_generic_args` (Inspired by @nateraw; sketched below).
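A sketch of the clarg pass-through; `pl.Trainer.add_argparse_args` is real Lightning API (in the 1.x series), but the surrounding function body is an assumption:

```python
import argparse

import pytorch_lightning as pl


def add_generic_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    # Expose every pl.Trainer flag (--max_epochs, --gpus,
    # --gradient_clip_val, ...) on the example scripts' parser instead
    # of redeclaring each one by hand.
    parser = pl.Trainer.add_argparse_args(parser)
    parser.add_argument("--output_dir", type=str, required=True)
    return parser


# The trainer can then be built straight from the parsed namespace:
# trainer = pl.Trainer.from_argparse_args(args)
```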
WandbLogger
`--logger wandb` will instantiate a default wandb logger. `--logger wandb_shared` will post results to a shared project, so that the community can compare hyperparameter settings empirically. (Sketched below.)
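A sketch of the `--logger` dispatch; the shared project name is made up here, since the link in the original description did not survive extraction:

```python
from pytorch_lightning.loggers import WandbLogger


def pick_logger(logger_name: str, task: str = "summarization"):
    # Hypothetical dispatch for the --logger clarg.
    if logger_name == "wandb":
        # Log to the user's own default wandb project.
        return WandbLogger(name=task)
    if logger_name == "wandb_shared":
        # Log to a shared community project so hyperparameter settings
        # can be compared across users (project name is an assumption).
        return WandbLogger(name=task, project=f"hf_{task}")
    return True  # let Lightning fall back to its default logger
```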
Distillation
`SummarizationDistiller` and `T5SummarizationDistiller` are checked in. This code was sent to me by a researcher who wishes to remain anonymous. DM to discuss.