"never_split" not working on BertTokenizer #23459

### System Info

transformers 4.28.1
python 3.8.13

### Who can help?

_No response_

### Information

- [ ] The official example scripts
- [X] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

- I load `BertTokenizer` with my own `vocab.txt` and pass _'[outline]'_, which is present in that `vocab.txt`, in `never_split`. However, _'[outline]'_ is still split. Here is my code:

`from transformers import BertTokenizer   
tokenizer = BertTokenizer.from_pretrained(pretrained_path, never_split=['[outline]'])   
text = "。[outline]"  
print(tokenizer.tokenize(text))  # ['。', '[', 'out', '##line', ']']
`

- Calling the basic tokenizer directly also splits it:
`print(tokenizer.basic_tokenizer.tokenize(text))  # ['。', '[', 'outline', ']']`
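One possible explanation for the basic tokenizer output above, offered as an assumption rather than a confirmed diagnosis: if the `never_split` check is applied to whole whitespace-separated chunks, then `"。[outline]"` is a single chunk that is not itself in `never_split`, so it is split on punctuation like any other text. The sketch below is a simplified stand-in written for this issue, not the library's code. The names `sketch_basic_tokenize`, `split_on_punctuation`, and `is_punctuation` are hypothetical, and WordPiece (the `'out'`, `'##line'` step) is not modeled.

```python
import unicodedata


def is_punctuation(ch):
    """Treat ASCII symbols and any Unicode 'P*' category character as punctuation."""
    cp = ord(ch)
    if 33 <= cp <= 47 or 58 <= cp <= 64 or 91 <= cp <= 96 or 123 <= cp <= 126:
        return True
    return unicodedata.category(ch).startswith("P")


def split_on_punctuation(chunk):
    """Split a chunk so that every punctuation character becomes its own piece."""
    pieces, current = [], ""
    for ch in chunk:
        if is_punctuation(ch):
            if current:
                pieces.append(current)
                current = ""
            pieces.append(ch)
        else:
            current += ch
    if current:
        pieces.append(current)
    return pieces


def sketch_basic_tokenize(text, never_split=()):
    """Hypothetical model: never_split only protects whole whitespace-separated chunks."""
    protected = set(never_split)
    tokens = []
    for chunk in text.split():
        if chunk in protected:
            tokens.append(chunk)  # kept intact only on an exact match
        else:
            tokens.extend(split_on_punctuation(chunk))
    return tokens


# No whitespace before the token: the chunk "。[outline]" is not in never_split.
print(sketch_basic_tokenize("。[outline]", ["[outline]"]))   # ['。', '[', 'outline', ']']
# With a separating space, "[outline]" is its own chunk and is protected.
print(sketch_basic_tokenize("。 [outline]", ["[outline]"]))  # ['。', '[outline]']
```

If this model matches the real behavior, inserting a space before `[outline]` in the input should change the result of `tokenizer.basic_tokenizer.tokenize`, which would be a quick way to confirm or rule out this explanation.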


### Expected behavior

When I run:
`tokenizer.tokenize("。[outline]")`
I expect the result `['。', '[outline]']`: tokens listed in `never_split` should not be split.

