[examples] SummarizationModule improvements #4951

Merged: 255 commits from distilbart-clean into huggingface:master on Jun 17, 2020

Conversation

@sshleifer (Contributor) commented Jun 12, 2020

This PR makes the SummarizationTrainer much more usable; improvements that are not unique to summarization are implemented in lightning_base.py instead.

  • Checkpointing: before this PR, the code saved 5GB of PL checkpoints per epoch. Now SummarizationTrainer saves only the best checkpoint, selected by ROUGE-2 score, and also writes it in Hugging Face save_pretrained format via the on_save_checkpoint hook (see the sketch below). This should resolve much of the confusion in various issues about how to load the PL checkpoints.
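
A rough sketch of the checkpointing idea, assuming pytorch-lightning's ModelCheckpoint callback and the on_save_checkpoint hook; the attribute names (self.model, self.tokenizer, self.output_dir) and the "rouge2" metric key are illustrative, not the exact code in this PR:

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint


class SummarizationModule(pl.LightningModule):
    def on_save_checkpoint(self, checkpoint):
        # Whenever Lightning writes a .ckpt, also save the weights and
        # tokenizer in Hugging Face format, so they can later be reloaded
        # with .from_pretrained() without any Lightning code.
        save_path = self.output_dir / "best_tfmr"
        self.model.save_pretrained(save_path)
        self.tokenizer.save_pretrained(save_path)


# Keep only the single best Lightning checkpoint, ranked by validation ROUGE-2.
checkpoint_callback = ModelCheckpoint(monitor="rouge2", mode="max", save_top_k=1)
```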

The current summarization code only accepts bs=1 and takes 24h to run one epoch on CNN/DM. With the following changes, you can train much faster if you wish. The docs previously suggested that larger batch sizes were possible with the default params; that is now fixed.

Changes to Allow Faster Summarization Training

These are all optional and turned off by default.

  1. Freezing: before this PR, it was basically only possible to finetune with batch size 2-4 on a 16GB system. With --freeze_embeds and --freeze_encoder, you can push the batch size much higher, towards 32 (see the sketch after this list). I've seen strong results with these options.

  2. On CNN/DM and XSUM the datasets are ~200K examples, so epochs are very long. It is therefore preferable to run validation (and get a ROUGE score) more frequently, but with the previous params each validation pass took 1hr. By passing --n_val=1000 --val_check_interval=0.25, you can run validation 4x per epoch and it only takes 3 minutes. This also allows the config's beam search parameters to be used, rather than hardcoding faster but lower-scoring ones.

  3. {train|val|test}_max_target_length: I have found it preferable to truncate training summaries (e.g. to 56 tokens for XSUM), but doing this for val/test artificially inflates ROUGE scores, so these clargs are separated.
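
For reference, here is a minimal sketch of what the freezing flags in item 1 amount to; freeze_params and apply_freezing are hypothetical helpers, and the model.model.shared / model.model.encoder attribute paths assume a BART-style model:

```python
import torch.nn as nn


def freeze_params(module: nn.Module) -> None:
    """Turn off gradient updates for every parameter in `module`."""
    for p in module.parameters():
        p.requires_grad = False


def apply_freezing(model, freeze_embeds: bool, freeze_encoder: bool) -> None:
    # The shared embedding matrix and the encoder hold a large share of the
    # trainable parameters, so freezing them frees optimizer state and
    # gradient memory, allowing a much larger batch size.
    if freeze_embeds:
        freeze_params(model.model.shared)
    if freeze_encoder:
        freeze_params(model.model.encoder)
```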

Changes to lightning_base

  • The numbers of trainable and total parameters are logged by default (rough sketch below).
  • All possible pl.Trainer clargs are passed through add_generic_args (inspired by @nateraw).
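
A minimal sketch of both points, assuming standard PyTorch and pytorch-lightning APIs; count_parameters and add_generic_args_sketch are illustrative names, not necessarily the ones used in the PR:

```python
import argparse

import pytorch_lightning as pl


def count_parameters(model) -> dict:
    """Return total and trainable parameter counts for a torch.nn.Module."""
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return {"total_params": total, "trainable_params": trainable}


def add_generic_args_sketch(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    # One way to expose every pl.Trainer argument on the command line is to
    # let Lightning register its own flags on the parser.
    return pl.Trainer.add_argparse_args(parser)
```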

WandbLogger

  • --logger wandb will instantiate a default wandb logger.
  • --logger wandb_shared will post results to a shared project, so that the community can compare hyperparameter settings empirically.
  • The default logger is still the TensorBoard logger, because it doesn't require making an account. A sketch of the logger selection follows this list.
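
A minimal sketch of how the --logger flag could map to Lightning loggers; get_logger and the project names are illustrative, not the exact code in this PR:

```python
from pytorch_lightning.loggers import WandbLogger


def get_logger(logger_name: str):
    if logger_name == "wandb":
        # Requires a wandb account and `wandb login`; project name is illustrative.
        return WandbLogger(project="summarization")
    if logger_name == "wandb_shared":
        # Same idea, but pointed at a shared project so the community can
        # compare hyperparameter settings side by side.
        return WandbLogger(project="hf_summarization")
    # Passing logger=True to pl.Trainer falls back to the default
    # TensorBoardLogger, which needs no account.
    return True
```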

Distillation

  • SummarizationDistiller and T5SummarizationDistiller are checked in. This code was sent to me by a researcher who wishes to remain anonymous. DM to discuss.

@sshleifer requested a review from julien-c on June 14, 2020 23:09
@sshleifer linked an issue on Jun 15, 2020 that may be closed by this pull request

@LysandreJik (Member) left a comment

This all looks very cool, looking forward to using it!

Review threads (resolved):
  • examples/lightning_base.py
  • examples/summarization/README.md (outdated)
@sshleifer (Contributor, Author) commented

Merging now. Happy to address post-merge comments!

@sshleifer changed the title from [examples] SummarizationTrainer improvements to [examples] SummarizationModule improvements on Jun 17, 2020
@sshleifer merged commit 043f9f5 into huggingface:master on Jun 17, 2020
@sshleifer deleted the distilbart-clean branch on June 17, 2020 17:51
Labels: Examples (related to examples in general), seq2seq
Development

Successfully merging this pull request may close these issues:
  • Use finetuned-BART large to do conditional generation