Tokenizer fails to load from cache with local_files_only on 4.57.2 #42393

@chtruong814

Description

Description

System Info

transformers==4.57.2

Who can help?

@vasqu @ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

This call fails even when the tokenizer is already present in the local cache:

AutoTokenizer.from_pretrained('meta-llama/Meta-Llama-3-8B', local_files_only=True)

It fails with the following traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/venv/lib/python3.12/site-packages/transformers/models/auto/tokenization_auto.py", line 1149, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2108, in from_pretrained
    remote_files = os.listdir(pretrained_model_name_or_path)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: 'meta-llama/Meta-Llama-3-8B'

This appears to be caused by the change below, which is also causing other issues:
#42299

https://github.com/huggingface/transformers/pull/42299/files#diff-85b29486a884f445b1014a26fecfb189141f2e6b09f4ae701ee758a754fddcc1R2108
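The failure mode can be sketched in isolation (this is an illustrative repro, not the actual transformers code path beyond the line shown in the traceback): `os.listdir` interprets the Hub repo id as a filesystem path, so unless a directory literally named `meta-llama/Meta-Llama-3-8B` exists relative to the working directory, it raises `FileNotFoundError` before the cached tokenizer files are ever consulted.

```python
import os

# The line from tokenization_utils_base.py (per the traceback) does
# roughly this with the repo id passed to from_pretrained:
repo_id = "meta-llama/Meta-Llama-3-8B"
try:
    os.listdir(repo_id)
    print("repo id happened to resolve to a local directory")
except FileNotFoundError:
    # This is what users hit: the repo id is not a local path,
    # even though the tokenizer files are in the HF cache.
    print("FileNotFoundError: repo id is not a local directory")
```

So with `local_files_only=True`, any repo id that is not also a valid local directory crashes, regardless of cache state.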

Expected behavior

The call should load the tokenizer from the local cache.
