
Complete merge Seq-2-Seq generation into default generation #3225

Conversation

@patrickvonplaten (Contributor) commented Mar 11, 2020

This is a follow-up PR to finalize #3140 .

There was still no conclusion on how to handle the fairseq tricks for generation. To summarize:

I think we have three options:

  1. Remove all fairseq tricks. Here the ROUGE score is: 20.285
  2. Implement the fairseq tricks EXCEPT leaving the starting decoder input token as the BOS token instead of EOS. Here the ROUGE score is: 19.369
  3. Add all fairseq tricks and add a new argument to generate() called decoder_start_token_id, which defaults to bos_token_id but can be overridden to eos_token_id in the case of Bart. Here the ROUGE score is: 21.072
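The decoder_start_token_id mechanism from option 3. can be sketched in a few lines of plain Python. This is a simplified illustration, not the actual transformers code, and the token ids are made up:

```python
# Option 3 in a nutshell: generate() gains a decoder_start_token_id argument
# that defaults to bos_token_id but can be overridden (to eos_token_id for
# Bart, matching the fairseq behavior). Sketch only, not transformers code.
def resolve_decoder_start_token_id(decoder_start_token_id, bos_token_id):
    return bos_token_id if decoder_start_token_id is None else decoder_start_token_id

BOS, EOS = 0, 2  # hypothetical token ids, for illustration only

assert resolve_decoder_start_token_id(None, BOS) == BOS  # default: start decoding at BOS
assert resolve_decoder_start_token_id(EOS, BOS) == EOS   # Bart override: start at EOS
```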

ROUGE scores from @sshleifer

For comparison:

[attached image: ROUGE score comparison]

UPDATE:

Given the above scores, option 3. was chosen for the moment to get the same scores as fairseq.
This means that we have to start the decoder input ids with an EOS token (which might be weird and fairseq-specific). Therefore, a new argument decoder_start_token_id was added to the generate() function, which defaults to bos_token_id. When using Bart to generate, this argument should be set to eos_token_id to get good results. To see how Bart.generate() should be used, take a look at:

hypotheses_batch = hf.generate(
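Purely for illustration, a complete call in that shape could look like the following, run against a hypothetical stub model so no weights are needed. The decoding parameters and token ids are assumptions, not values from this PR:

```python
# Illustration only: a stand-in model that records its generate() kwargs,
# so the call shape can be shown without downloading Bart weights.
class StubBart:
    eos_token_id = 2  # hypothetical id

    def generate(self, input_ids, **kwargs):
        self.last_kwargs = kwargs
        return [[self.eos_token_id]]  # dummy output

hf = StubBart()
input_ids = [[0, 31414, 232, 2]]  # pretend-encoded source sentence

# The Bart-specific part: decoding must start from EOS, not BOS.
hypotheses_batch = hf.generate(
    input_ids,
    num_beams=4,
    max_length=140,
    decoder_start_token_id=hf.eos_token_id,
)

assert hf.last_kwargs["decoder_start_token_id"] == 2
```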

At the moment, option 2. is implemented, which seems to give the worst results and is also not the cleanest option.
This PR implements option 3.

For me, either option 1. or option 3. is fine. Up for discussion @thomwolf , @julien-c , @LysandreJik , @sshleifer

@patrickvonplaten requested review from sshleifer and thomwolf and removed the request for sshleifer Mar 11, 2020 13:38
@patrickvonplaten force-pushed the finalize_merge_bart_generate_into_default_generate branch from f5a6bd0 to 6047f46 Mar 12, 2020 19:18
@codecov-io commented Mar 12, 2020

Codecov Report

Merging #3225 into master will increase coverage by 0.1%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #3225     +/-   ##
=========================================
+ Coverage   77.93%   78.03%   +0.1%     
=========================================
  Files          98       98             
  Lines       16666    16668      +2     
=========================================
+ Hits        12988    13007     +19     
+ Misses       3678     3661     -17
Impacted Files                           Coverage   Δ
src/transformers/modeling_utils.py       93.86%     <100%> (+0.15%) ⬆️
src/transformers/modeling_tf_utils.py    88.55%     <0%> (+2.86%) ⬆️

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2e81b9d...6a82f77.

@sshleifer (Contributor)

Another issue that came up today from @yjernite: Bart does not support do_sample=True. Before #3140 this was clear, and now it isn't :(

@patrickvonplaten (Contributor, Author) commented Mar 12, 2020

Bart.generate() with do_sample=True does not throw any errors. If it should not be used this way, we can simply state in the docs that do_sample should be set to False and show an example. I don't see a problem here. If it's just for a prettier API, we could think about a Bart.summarize() that calls Bart.generate() with the correct parameters.
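The Bart.summarize() idea can be sketched as a thin wrapper that forwards to generate() with fixed defaults. All names and parameter values below are illustrative assumptions, not an actual transformers API:

```python
# Sketch of a hypothetical Bart.summarize() convenience wrapper: it just
# calls generate() with do_sample=False and Bart's other decoding quirks
# baked in. Every parameter value here is an illustrative assumption.
def summarize(model, input_ids, **overrides):
    params = dict(
        do_sample=False,                            # deterministic decoding
        num_beams=4,
        decoder_start_token_id=model.eos_token_id,  # Bart starts from EOS
    )
    params.update(overrides)                        # callers may still override
    return model.generate(input_ids, **params)

# Tiny stand-in model so the wrapper can be exercised without real weights.
class FakeModel:
    eos_token_id = 2

    def generate(self, input_ids, **kwargs):
        self.kwargs = kwargs
        return input_ids

m = FakeModel()
summarize(m, [[1, 2, 3]])
assert m.kwargs["do_sample"] is False
assert m.kwargs["decoder_start_token_id"] == 2
```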

@sshleifer (Contributor)

I prefer option #3, it looks like very little code and the best metrics.
Could you also fix the kwarg in examples/summarization/bart/evaluate_cnn.py?

@patrickvonplaten (Contributor, Author)

> I prefer option #3, it looks like very little code and the best metrics.
> Could you also fix the kwarg in examples/summarization/bart/evaluate_cnn.py?

done :-)

@patrickvonplaten (Contributor, Author)

Linked to PR: #3264
Might need to fix eos_token_id conflicts when rebasing/merging.

@thomwolf (Member)

This one is ok for me.
Is it currently option 3 which is implemented?

Also, I guess we could default to do_sample=False in generate(). Simple greedy decoding seems like the default expectation from the user to me.

@patrickvonplaten (Contributor, Author)

Yeah, currently option 3 is implemented.
do_sample used to default to False, but was changed to True by @LysandreJik (can't find the issue/PR anymore :-/). It does not matter too much to me what is chosen; I would just need to update some tests in modeling_utils.py.

@sshleifer (Contributor)

Yeah, I'd love to merge this! Having trouble connecting to brutasse to run ROUGE, but afaict it will be the same as pre-#3140 :)

@patrickvonplaten changed the title from "[Discussion] Seq-2-Seq generation merge" to "Complete merge Seq-2-Seq generation into default generation" Mar 14, 2020
@patrickvonplaten (Contributor, Author)

Good to merge for me!
Changing the default to do_sample=False can be done in another PR, I think.

IMPORTANT:
the config.json files of:

  • bart-large-cnn
  • bart-large-mnli
  • bart-large

have to be updated on AWS to pass all slow tests. All special_tokens_id parameters should be deleted there. For the moment, we will go with the solution in #3264 .

@thomwolf (Member)

Ok merging. I let you continue in other PRs and ping me.

@thomwolf merged commit 3814e16 into huggingface:master Mar 14, 2020
@patrickvonplaten (Contributor, Author)

Changed the configs on AWS. All slow tests pass now.
