Commit 121dd43 changes DialoGPT generation behavior #8032

Closed
@abisee

Description


Environment info

  • transformers version: 3.3.1
  • Platform: Linux-4.4.0-127-generic-x86_64-with-debian-stretch-sid
  • Python version: 3.7.3
  • PyTorch version (GPU?): 1.6.0+cu101 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Using GPU in script?: yes (1 TITAN-XP)
  • Using distributed or parallel set-up in script?: no

Who can help

@cccntu @patrickvonplaten @LysandreJik

Information

Model I am using (Bert, XLNet ...): DialoGPT-large

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • an official GLUE/SQUaD task: (give the name)
  • my own task or dataset: (give details below)

To reproduce

Steps to reproduce the behavior:

  1. Checkout 121dd43.

  2. Run the DialoGPT "How to use" code given here, but change DialoGPT-medium to DialoGPT-large:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch


tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-large")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-large")

# Let's chat for 5 lines
for step in range(5):
    # encode the new user input, add the eos_token, and return a PyTorch tensor
    new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt')

    # append the new user input tokens to the chat history
    bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1) if step > 0 else new_user_input_ids

    # generate a response while limiting the total chat history to 1000 tokens
    chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)

    # pretty-print the last output tokens from the bot
    print("DialoGPT: {}".format(tokenizer.decode(chat_history_ids[:, bot_input_ids.shape[-1]:][0], skip_special_tokens=True)))
  3. For the user's first utterance, type "Hello, how are you?". I get this output:
>> User:Hello, how are you?
DialoGPT: 're you a fan of the show?

Note: this problem is still present in the current version of master (5148f43).

Expected behavior

With the previous commit, 0c64b18, I get this output:

>> User:Hello, how are you?
DialoGPT: I'm good, you?

Possible cause

The issue seems to be related to the <|endoftext|> token, which appears at the end of every utterance in the DialoGPT input format. Because pad_token_id is set to eos_token_id, these genuine end-of-utterance tokens are treated as padding and get attention-masked, which in turn shifts the position ids computed from the attention mask.
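A minimal plain-Python sketch of the suspected mechanism (the helper names here are hypothetical; the actual logic lives in transformers' `generate`, which infers an attention mask as `input_ids != pad_token_id` and derives position ids as a clamped cumulative sum of that mask). When pad_token_id equals the <|endoftext|> id, the eos token that legitimately ends the user's utterance is masked out, and the position id at that slot fails to advance:

```python
EOS = 50256  # GPT-2 / DialoGPT <|endoftext|> id

def infer_attention_mask(input_ids, pad_token_id):
    # every position whose token equals pad_token_id is masked (0)
    return [0 if tok == pad_token_id else 1 for tok in input_ids]

def position_ids_from_mask(mask):
    # cumsum(mask) - 1, clamped at 0: masked positions don't advance
    out, total = [], 0
    for m in mask:
        total += m
        out.append(max(total - 1, 0))
    return out

# "Hello, how are you?" followed by the eos token that ends the utterance
ids = [15496, 11, 703, 389, 345, 30, EOS]
mask = infer_attention_mask(ids, pad_token_id=EOS)
print(mask)                        # the final, genuine eos token is masked
print(position_ids_from_mask(mask))  # its position id repeats the previous one
```

Under this sketch the mask comes out as `[1, 1, 1, 1, 1, 1, 0]` and the position ids as `[0, 1, 2, 3, 4, 5, 5]`, so the model never attends to the end-of-utterance marker, consistent with the truncated-looking reply above.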
