System Info
- transformers version: 4.28.1
- Platform: Linux-5.15.0-52-generic-x86_64-with-glibc2.31
- Python version: 3.9.5
- Huggingface_hub version: 0.13.2
- Safetensors version: not installed
- PyTorch version (GPU?): 1.13.1+cu117 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: no
- Using distributed or parallel set-up in script?: no
Who can help?
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
print(tok.bos_token)
print(tok.eos_token)
print(tok.bos_token_id)
print(tok.eos_token_id)
print(tok("the dog walked", add_special_tokens=True))
```
Outputs:

```
<|endoftext|>
<|endoftext|>
0
0
{'input_ids': [783, 4370, 7428], 'attention_mask': [1, 1, 1]}
```
Expected behavior
I expect it to output `[0, 783, 4370, 7428, 0]`. Or am I misunderstanding what `add_special_tokens` is supposed to do?
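For what it's worth, a manual workaround is possible while the tokenizer itself does not insert these tokens: wrap the token ids with the BOS/EOS ids by hand. A minimal sketch, assuming the behavior shown above (the helper name is made up for illustration, and the default id `0` is taken from the `bos_token_id`/`eos_token_id` printout):

```python
def add_special_tokens_manually(input_ids, bos_id=0, eos_id=0):
    """Prepend the BOS id and append the EOS id to a list of token ids.

    This reproduces by hand what the reporter expected
    `add_special_tokens=True` to do for this tokenizer.
    """
    return [bos_id] + list(input_ids) + [eos_id]


# The ids for "the dog walked" from the reproduction above:
print(add_special_tokens_manually([783, 4370, 7428]))  # [0, 783, 4370, 7428, 0]
```

This only patches the ids after the fact; whether the tokenizer *should* add them automatically (e.g. via its post-processor) is the actual question of this issue.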