Commit 121dd43 changes DialoGPT generation behavior #8032
Comments
Hi @abisee, sorry for the inconvenience. Even though you did not pass in an attention mask, one is created for you — see the first two lines of the block at transformers/src/transformers/generation_utils.py, lines 352 to 363 in 5148f43. Here is how the bug can happen:
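A minimal sketch, assuming the DialoGPT tokenizer and paraphrasing the mask-creation check in `generation_utils.py` rather than quoting it:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-large")

# Encode a user turn; DialoGPT appends <|endoftext|> after every utterance.
input_ids = tokenizer.encode("Does money buy happiness?" + tokenizer.eos_token,
                             return_tensors="pt")

# The DialoGPT example passes pad_token_id=tokenizer.eos_token_id to generate().
pad_token_id = tokenizer.eos_token_id

# generate() roughly does: if no mask was passed and pad_token_id occurs in the
# input, build a mask that zeroes every position equal to pad_token_id.
attention_mask = input_ids.ne(pad_token_id).long()

# The trailing <|endoftext|> is real input, yet it gets masked out.
print(attention_mask)  # the last position (the eos token) is 0, all others are 1
```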
Don't have a better solution for now, will think about it.
Thanks for the response @cccntu! My understanding is that both GPT2 and DialoGPT were trained without a pad token; i.e. neither model has a pad token embedding. In that case, why does the DialoGPT example code pass `pad_token_id=tokenizer.eos_token_id` to `generate()`?
For generation, it seems that attention masks are created automatically (if there's an assigned pad token that appears in the input). See the generation_utils.py code referenced above.
However for training (at least for GPT2 models), as far as I can tell, the attention mask is not created automatically, even if there's an assigned pad token that appears in the input. This seems like an unexpected discrepancy, and another reason to move the attention mask creation into the model itself.
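A minimal sketch of the training-side behavior, assuming GPT2 with the eos token reused as padding (label masking of the pad positions is skipped for brevity):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT2 has no pad token, so reuse eos for padding
model = GPT2LMHeadModel.from_pretrained("gpt2")

# A padded batch, as you would build it for training.
batch = tokenizer(["Hello there", "A somewhat longer example sentence"],
                  padding=True, return_tensors="pt")

# forward() does NOT create an attention mask from pad_token_id for you...
loss_unmasked = model(batch["input_ids"], labels=batch["input_ids"])[0]

# ...you have to pass it yourself to stop the model attending to padding.
loss_masked = model(batch["input_ids"],
                    attention_mask=batch["attention_mask"],
                    labels=batch["input_ids"])[0]

# The two losses differ because padding is attended to in the first call.
print(loss_unmasked.item(), loss_masked.item())
```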
That's a super interesting issue! Thanks for posting it here!

So in short, in order to be able to do batch generation with GPT2 (or beam search), we have to use some kind of token as the pad token, and since GPT2 has no dedicated pad token the eos token is used for this. Just as you guys noticed, the problem lies in the attention mask that `generate()` creates automatically from `pad_token_id`. IMO, it was a mistake to automatically create the attention mask there in the first place.

I'm currently doing a big `generate()` refactor. I hope that I'll be able to merge the PR in ~1 week.
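For illustration, a sketch of batched GPT2 generation under the assumptions above (eos reused as pad, mask built by the tokenizer and passed explicitly so `generate()` never has to infer it from `pad_token_id`); this is not code from the mentioned PR:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"  # left-pad so generation continues from real tokens
model = GPT2LMHeadModel.from_pretrained("gpt2", pad_token_id=tokenizer.eos_token_id)

prompts = ["Hello, my dog", "The meaning of life is"]
inputs = tokenizer(prompts, padding=True, return_tensors="pt")

# The tokenizer's mask marks the padding, so generate() does not have to guess
# which eos tokens are padding and which are real input.
output_ids = model.generate(inputs["input_ids"],
                            attention_mask=inputs["attention_mask"],
                            max_length=30)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
```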
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Environment info
transformers version: 3.3.1

Who can help
@cccntu @patrickvonplaten @LysandreJik
Information
Model I am using (Bert, XLNet ...): DialoGPT-large
The problem arises when using: the official example scripts (the DialoGPT "How to use" code).
The task I am working on is: conversational generation with DialoGPT.
To reproduce
Steps to reproduce the behavior:
1. Check out commit 121dd43.
2. Run the DialoGPT "How to use" code given here, but change `DialoGPT-medium` to `DialoGPT-large` (see the sketch below).
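A sketch along the lines of the DialoGPT model-card example, with `DialoGPT-large` already substituted; treat the exact arguments as approximate rather than a verbatim copy:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-large")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-large")

# Chat for a few turns.
for step in range(5):
    # Encode the new user input, appending the eos token.
    new_user_input_ids = tokenizer.encode(input(">> User: ") + tokenizer.eos_token,
                                          return_tensors="pt")

    # Append the new user input to the chat history.
    bot_input_ids = (torch.cat([chat_history_ids, new_user_input_ids], dim=-1)
                     if step > 0 else new_user_input_ids)

    # Generate a response, capping total history at 1000 tokens; note that
    # pad_token_id is set to the eos token id here.
    chat_history_ids = model.generate(bot_input_ids, max_length=1000,
                                      pad_token_id=tokenizer.eos_token_id)

    # Print only the newly generated bot tokens.
    print("DialoGPT: {}".format(
        tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0],
                         skip_special_tokens=True)))
```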
Note: this problem is still present in the current version of master (5148f43).

Expected behavior
With the previous commit, 0c64b18, I get this output:

Possible cause
The issue seems to be related to the `<|endoftext|>` token, which is used at the end of every utterance. This is being regarded as a padding token, and thus it's attention-masked, which also seems to affect the position ids.
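A small sketch of the position-id effect, using placeholder token ids and paraphrasing the mask-based position-id computation (an approximation of the library logic, not a verbatim copy):

```python
import torch

eos_token_id = 50256  # <|endoftext|> in the GPT2/DialoGPT vocabulary

# A made-up utterance ending in <|endoftext|>; the other ids are placeholders.
input_ids = torch.tensor([[101, 102, 103, 104, 105, eos_token_id]])

# The eos token equals pad_token_id, so the auto-created mask zeroes it out.
attention_mask = input_ids.ne(eos_token_id).long()

# Position ids derived from the mask: the masked slot stops advancing the
# counter and is clamped, so the eos no longer sits at its true position.
position_ids = attention_mask.cumsum(-1) - 1
position_ids.masked_fill_(attention_mask == 0, 1)

print(attention_mask)  # tensor([[1, 1, 1, 1, 1, 0]])
print(position_ids)    # tensor([[0, 1, 2, 3, 4, 1]])
```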