After tokenizers upgrade, the length of the token does not correspond to the length of the model #36532

@CurtainRight

Description

System Info

transformers: 4.48.1
tokenizers: 0.2.1
python: 3.9

Who can help?

@ArthurZucker @itazap

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Code snippet:

from transformers import (
    AutoModelForSeq2SeqLM,
    PegasusTokenizer,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = PegasusTokenizer.from_pretrained('IDEA-CCNL/Randeng-Pegasus-238M-Summary-Chinese')
model = AutoModelForSeq2SeqLM.from_pretrained(
    'IDEA-CCNL/Randeng-Pegasus-238M-Summary-Chinese',
    config=config  # config is defined elsewhere in the training script
)

training_args = Seq2SeqTrainingArguments(
    output_dir=config['model_name'],
    evaluation_strategy="epoch",
    # report_to="none",
    save_strategy="epoch",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=4,
    predict_with_generate=True,
    logging_steps=0.1
)
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics
)

Error message:

[screenshot: error traceback]

Trial process:
My original versions were transformers 4.29.1 and tokenizers 0.13.3, and the model trained and ran inference normally.
After upgrading, the error above occurred and training was no longer possible, so I resized the model's embeddings with model.resize_token_embeddings(len(tokenizer)). Original model vocabulary size: 50000; tokenizer length after loading: 50103. The model I trained this way produced abnormal inference results.

[screenshot: abnormal inference output]

Trying again, I kept tokenizers at 0.13.3 and upgraded transformers to 4.33.3 (1. I need to upgrade because the NPU only supports version 4.3.20; 2. this is the highest transformers version compatible with these tokenizers). With this combination, training and inference are normal. As soon as tokenizers is greater than 0.13.3, the tokenizer length changes.
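For reference, the resize step described above can be illustrated without downloading the checkpoint. This is a toy sketch of what model.resize_token_embeddings(len(tokenizer)) does to the input embedding matrix; the 50000/50103 sizes come from this report, and the embedding dimension is hypothetical:

```python
import torch
import torch.nn as nn

# Toy illustration of the mismatch: the checkpoint's embedding matrix has
# 50000 rows, but the upgraded tokenizer reports 50103 token ids.
old_vocab, new_vocab, dim = 50000, 50103, 16
emb = nn.Embedding(old_vocab, dim)

# Looking up an id >= old_vocab (e.g. 50102) would raise an IndexError.
# Resizing keeps the trained rows and appends freshly initialised ones,
# which is the effect of model.resize_token_embeddings(len(tokenizer)).
resized = nn.Embedding(new_vocab, dim)
with torch.no_grad():
    resized.weight[:old_vocab] = emb.weight

assert resized.weight.shape == (new_vocab, dim)
```

The 103 appended rows are randomly initialised, which is one reason a model resized this way can produce degraded output until those embeddings are trained.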

Expected behavior

I expect the tokenizer to remain compatible with the original code.
