
[Doc] add_special_tokens's documentation is ambiguous  #22935

Closed
@zplizzi

Description


System Info

  • transformers version: 4.28.1
  • Platform: Linux-5.15.0-52-generic-x86_64-with-glibc2.31
  • Python version: 3.9.5
  • Huggingface_hub version: 0.13.2
  • Safetensors version: not installed
  • PyTorch version (GPU?): 1.13.1+cu117 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: no
  • Using distributed or parallel set-up in script?: no

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")
print(tok.bos_token)
print(tok.eos_token)
print(tok.bos_token_id)
print(tok.eos_token_id)

print(tok("the dog walked", add_special_tokens=True))

outputs

<|endoftext|>
<|endoftext|>
0
0
{'input_ids': [783, 4370, 7428], 'attention_mask': [1, 1, 1]}

Expected behavior

I expect the input_ids to be [0, 783, 4370, 7428, 0], i.e. with the BOS and EOS token ids (both 0 for this tokenizer) added. Or am I misunderstanding what add_special_tokens is supposed to do?
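
As far as I know, add_special_tokens=True only inserts whatever special tokens the tokenizer's own template defines (BERT-style tokenizers add [CLS]/[SEP]; GPT-NeoX-style tokenizers such as Pythia define none), which would explain why the ids come back unchanged here. Below is a minimal sketch of working around this by adding BOS/EOS manually, assuming that is the desired behavior:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("EleutherAI/pythia-70m")

# add_special_tokens=True and False produce the same ids here, because the
# tokenizer's template inserts nothing for this model family.
print(tok("the dog walked", add_special_tokens=True)["input_ids"])   # [783, 4370, 7428]
print(tok("the dog walked", add_special_tokens=False)["input_ids"])  # [783, 4370, 7428]

# If BOS/EOS are wanted, prepend/append them explicitly (both map to id 0 here):
ids = tok("the dog walked", add_special_tokens=False)["input_ids"]
ids = [tok.bos_token_id] + ids + [tok.eos_token_id]
print(ids)  # [0, 783, 4370, 7428, 0]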
