System Info
- `transformers` version: 4.32.0.dev0
- Platform: macOS-13.5.1-arm64-arm-64bit
- Python version: 3.9.13
- Huggingface_hub version: 0.16.4
- Safetensors version: 0.3.3
- Accelerate version: not installed
- Accelerate config: not found
- PyTorch version (GPU?): not installed (NA)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
- Ensure `protobuf` is uninstalled: `pip uninstall protobuf`
- Import the `T5Tokenizer`:

```python
from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
```
Traceback:

```python
UnboundLocalError Traceback (most recent call last)
Cell In[2], line 1
----> 1 tokenizer = T5Tokenizer.from_pretrained("t5-base")
File ~/transformers/src/transformers/tokenization_utils_base.py:1854, in PreTrainedTokenizerBase.from_pretrained(cls, pretrained_model_name_or_path, cache_dir, force_download, local_files_only, token, revision, *init_inputs, **kwargs)
1851 else:
1852 logger.info(f"loading file {file_path} from cache at {resolved_vocab_files[file_id]}")
-> 1854 return cls._from_pretrained(
1855 resolved_vocab_files,
1856 pretrained_model_name_or_path,
1857 init_configuration,
1858 *init_inputs,
1859 token=token,
1860 cache_dir=cache_dir,
1861 local_files_only=local_files_only,
1862 _commit_hash=commit_hash,
1863 _is_local=is_local,
1864 **kwargs,
1865 )
File ~/transformers/src/transformers/tokenization_utils_base.py:2017, in PreTrainedTokenizerBase._from_pretrained(cls, resolved_vocab_files, pretrained_model_name_or_path, init_configuration, token, cache_dir, local_files_only, _commit_hash, _is_local, *init_inputs, **kwargs)
2015 # Instantiate tokenizer.
2016 try:
-> 2017 tokenizer = cls(*init_inputs, **init_kwargs)
2018 except OSError:
2019 raise OSError(
2020 "Unable to load vocabulary from file. "
2021 "Please check that the provided vocabulary is accessible and not corrupted."
2022 )
File ~/transformers/src/transformers/models/t5/tokenization_t5.py:194, in T5Tokenizer.__init__(self, vocab_file, eos_token, unk_token, pad_token, extra_ids, additional_special_tokens, sp_model_kwargs, legacy, **kwargs)
191 self.vocab_file = vocab_file
192 self._extra_ids = extra_ids
--> 194 self.sp_model = self.get_spm_processor()
File ~/transformers/src/transformers/models/t5/tokenization_t5.py:200, in T5Tokenizer.get_spm_processor(self)
198 with open(self.vocab_file, "rb") as f:
199 sp_model = f.read()
--> 200 model_pb2 = import_protobuf()
201 model = model_pb2.ModelProto.FromString(sp_model)
202 if not self.legacy:
File ~/transformers/src/transformers/convert_slow_tokenizer.py:40, in import_protobuf()
38 else:
39 from transformers.utils import sentencepiece_model_pb2_new as sentencepiece_model_pb2
---> 40 return sentencepiece_model_pb2
UnboundLocalError: local variable 'sentencepiece_model_pb2' referenced before assignment
```
This is occurring because `import_protobuf()` is called in the `T5Tokenizer` init (via `get_spm_processor`), but `import_protobuf` is ill-defined in the case that `protobuf` is not available:
transformers/src/transformers/convert_slow_tokenizer.py, lines 32 to 40 at cb8e3ee
=> if `protobuf` is not installed, then `sentencepiece_model_pb2` is never assigned, and the `return` raises the `UnboundLocalError` above.
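For context, a rough sketch of `import_protobuf` as it stands at that commit (the guard condition is my paraphrase; only the `else:` branch and the `return` are visible in the traceback):

```python
from packaging import version

from transformers.utils import is_protobuf_available


def import_protobuf():
    if is_protobuf_available():  # assumed guard; only the else-branch and return appear in the traceback
        import google.protobuf

        if version.parse(google.protobuf.__version__) < version.parse("4.0.0"):
            from transformers.utils import sentencepiece_model_pb2
        else:
            from transformers.utils import sentencepiece_model_pb2_new as sentencepiece_model_pb2
    # With protobuf uninstalled, neither import above runs, so the local name
    # `sentencepiece_model_pb2` is never bound and the next line raises:
    # UnboundLocalError: local variable 'sentencepiece_model_pb2' referenced before assignment
    return sentencepiece_model_pb2
```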
Has `protobuf` inadvertently become a hard requirement for `T5Tokenizer` in #24622? Or can `sentencepiece_model_pb2` be defined without `protobuf`?
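If `protobuf` is meant to stay optional for the slow `T5Tokenizer`, one possible direction (purely a sketch on my side, not the library's actual code) would be to at least fail with an informative error rather than an `UnboundLocalError`:

```python
from transformers.utils import is_protobuf_available


def import_protobuf():
    # Sketch only: make the missing-protobuf case explicit instead of falling
    # through to the UnboundLocalError shown above. Whether the proto parsing can
    # be skipped entirely (e.g. only done when legacy=False) is the open question.
    if not is_protobuf_available():
        raise ImportError(
            "protobuf is required to parse the sentencepiece model; "
            "install it with `pip install protobuf`."
        )
    from transformers.utils import sentencepiece_model_pb2_new as sentencepiece_model_pb2

    return sentencepiece_model_pb2
```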
Expected behavior
Use `T5Tokenizer` without `protobuf` installed.