
mT5 additional_special_tokens seems not to work #9747

Closed
@PiggyFan

Description


I want to add some special tokens such as <CON_START>, but neither T5Tokenizer nor MT5Tokenizer tokenizes correctly after using the additional_special_tokens parameter: both still split these special tokens into subwords.
[Screenshot 2021-01-22 5:18 PM]

It works when using OpenAIGPTTokenizer's additional_special_tokens parameter. It's clear that after declaring additional_special_tokens, OpenAIGPTTokenizer tokenizes each special token as a single token rather than splitting it.
[Screenshot 2021-01-22 5:54 PM]
[Screenshot 2021-01-22 5:55 PM]

The version of transformers is 4.2.2.
I'm also not sure whether this problem is related to issue #624 in T5, which talks about the SentencePiece extra vocab.
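For context on the expected behavior: a tokenizer that honors additional_special_tokens is supposed to pre-split the input on those tokens before the subword model (SentencePiece/BPE) ever sees them, so each special token stays atomic. Below is a minimal, hypothetical sketch of that pre-tokenization step in plain Python; it is not the transformers implementation, just an illustration of what "tokenize as one word rather than split it" means.

```python
import re

def split_on_special_tokens(text, special_tokens):
    """Split text so each special token becomes its own atomic piece.

    Hypothetical sketch of the pre-tokenization that
    additional_special_tokens is expected to trigger; NOT the
    actual transformers implementation.
    """
    # Match longest tokens first so overlapping tokens are handled greedily.
    pattern = "|".join(
        re.escape(t) for t in sorted(special_tokens, key=len, reverse=True)
    )
    pieces = []
    # The capturing group keeps the special tokens in the split result.
    for piece in re.split(f"({pattern})", text):
        if piece in special_tokens:
            pieces.append(piece)  # kept atomic; subword model never sees it
        elif piece:
            pieces.append(piece)  # would be handed to SentencePiece/BPE
    return pieces

print(split_on_special_tokens("hello <CON_START> world", {"<CON_START>"}))
# → ['hello ', '<CON_START>', ' world']
```

With this pre-split in place, <CON_START> would map to a single vocabulary id; the bug report above is that T5Tokenizer/MT5Tokenizer in 4.2.2 skip this step for tokens passed via additional_special_tokens.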

Thank you for your feedback
