### System Info

- `transformers` version: 4.25.1
- Platform: Linux-5.10.133+-x86_64-with-glibc2.27
- Python version: 3.8.16
- Huggingface_hub version: 0.11.1
- PyTorch version (GPU?): 1.13.0+cu116 (False)
- Tensorflow version (GPU?): 2.9.2 (False)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
### Who can help?
### Information

- The official example scripts
- My own modified scripts
### Tasks

- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
### Reproduction

The explanation of the bug and the steps to reproduce it are in the following Google Colab notebook: https://colab.research.google.com/drive/19SS6Tzlgo0vntFtM6ZsCYq8BNZ5Dy1cS?usp=sharing
### Expected behavior

I would expect the fast and slow tokenizers to treat the `AddedToken`'s arguments in the same way.
I think the loss of information for the slow tokenizer occurs at this line:
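For reference, a minimal sketch of the options involved. The token string and flag values below are made up for illustration; `AddedToken` itself is the class the issue is about:

```python
# AddedToken bundles per-token options (lstrip, rstrip, single_word, ...)
# that both the slow (Python) and fast (Rust-backed) tokenizers are
# expected to honor identically.
from transformers import AddedToken

# Hypothetical token and flag values, chosen only to illustrate the API.
special = AddedToken("<my-token>", lstrip=True, rstrip=False, single_word=True)

# The options are plain attributes on the AddedToken object. The reported
# bug is that some of them are lost on the slow-tokenizer path when the
# token is registered, e.g. via tokenizer.add_tokens([special]), while the
# fast tokenizer preserves them.
print(special.content, special.lstrip, special.rstrip, special.single_word)
```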