Description
When executing guidellm against a vLLM instance that serves a model under an arbitrary name, guidellm errors out with a Hugging Face error because it cannot fetch the tokenizer_config.json for that name.
Duplicating the issue
Deploy a vllm instance with any model and set the following argument:
--served-model-name=my-model
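For illustration, the deployment command might look something like the following (the model id here is only a placeholder; any model reproduces the issue):

vllm serve mistralai/Mistral-7B-Instruct-v0.2 \
  --port 8000 \
  --served-model-name my-model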
Run a guidellm test against the endpoint:
guidellm \
--target "http://localhost:8000/v1" \
--model "my-model" \
--data-type emulated \
--data "prompt_tokens=512,generated_tokens=128"
Results
guidellm errors out with a 401 when trying to fetch tokenizer_config.json for my-model, since my-model is not a valid Hugging Face model identifier:
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/my-model/resolve/main/tokenizer_config.json
Stack Trace
The following is an example of a full stack trace of the error (from a run where the served model name was granite):
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Traceback (most recent call last):
File "/opt/app-root/lib64/python3.11/site-packages/huggingface_hub/utils/_http.py", line 409, in hf_raise_for_status
response.raise_for_status()
File "/opt/app-root/lib64/python3.11/site-packages/requests/models.py", line 1024, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/granite/resolve/main/tokenizer_config.json
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/app-root/lib64/python3.11/site-packages/transformers/utils/hub.py", line 424, in cached_files
hf_hub_download(
File "/opt/app-root/lib64/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/huggingface_hub/file_download.py", line 961, in hf_hub_download
return _hf_hub_download_to_cache_dir(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/huggingface_hub/file_download.py", line 1068, in _hf_hub_download_to_cache_dir
_raise_on_head_call_error(head_call_error, force_download, local_files_only)
File "/opt/app-root/lib64/python3.11/site-packages/huggingface_hub/file_download.py", line 1596, in _raise_on_head_call_error
raise head_call_error
File "/opt/app-root/lib64/python3.11/site-packages/huggingface_hub/file_download.py", line 1484, in _get_metadata_or_catch_error
metadata = get_hf_file_metadata(
^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/huggingface_hub/file_download.py", line 1401, in get_hf_file_metadata
r = _request_wrapper(
^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/huggingface_hub/file_download.py", line 285, in _request_wrapper
response = _request_wrapper(
^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/huggingface_hub/file_download.py", line 309, in _request_wrapper
hf_raise_for_status(response)
File "/opt/app-root/lib64/python3.11/site-packages/huggingface_hub/utils/_http.py", line 459, in hf_raise_for_status
raise _format(RepositoryNotFoundError, message, response) from e
huggingface_hub.errors.RepositoryNotFoundError: 401 Client Error. (Request ID: Root=1-67eb10fa-305b679009abf7055fe388ff;30339271-9f91-49ff-8324-c347a6b5da16)
Repository Not Found for url: https://huggingface.co/granite/resolve/main/tokenizer_config.json.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated. For more details, see https://huggingface.co/docs/huggingface_hub/authentication
Invalid username or password.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/app-root/lib64/python3.11/site-packages/guidellm/main.py", line 239, in generate_benchmark_report
tokenizer_inst = backend_inst.model_tokenizer()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/guidellm/backend/base.py", line 173, in model_tokenizer
return AutoTokenizer.from_pretrained(self.model)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 910, in from_pretrained
tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/transformers/models/auto/tokenization_auto.py", line 742, in get_tokenizer_config
resolved_config_file = cached_file(
^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/transformers/utils/hub.py", line 266, in cached_file
file = cached_files(path_or_repo_id=path_or_repo_id, filenames=[filename], **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/transformers/utils/hub.py", line 456, in cached_files
raise EnvironmentError(
OSError: granite is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'
If this is a private repository, make sure to pass a token having permission to this repo either by logging in with `huggingface-cli login` or by passing `token=<your_token>`
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/app-root/bin/guidellm", line 8, in <module>
sys.exit(generate_benchmark_report_cli())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1161, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1082, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1443, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 788, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/guidellm/main.py", line 171, in generate_benchmark_report_cli
generate_benchmark_report(
File "/opt/app-root/lib64/python3.11/site-packages/guidellm/main.py", line 241, in generate_benchmark_report
raise ValueError(
ValueError: Could not load model's tokenizer, --tokenizer must be provided for request generation
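The root cause is visible in the trace: guidellm/backend/base.py resolves the tokenizer by passing the served model name straight to transformers via AutoTokenizer.from_pretrained(self.model). A minimal sketch of the failing call outside guidellm (assuming my-model is neither a local path nor a Hugging Face repo id):

from transformers import AutoTokenizer

# "my-model" is only the alias vLLM serves the model under, not a Hub repo id,
# so transformers falls back to querying huggingface.co and fails with the
# RepositoryNotFoundError / OSError shown above.
AutoTokenizer.from_pretrained("my-model")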
Why is this important
OpenShift AI sets the --served-model-name argument to the name of the ServingRuntime the user provides when deploying a vLLM instance, rather than the actual Hugging Face model name. As a result, models deployed with OpenShift AI cannot be load tested with guidellm unless the user knows how to customize the --served-model-name argument and that it must be set to the correct Hugging Face name.
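A possible workaround, following the error message's own suggestion, is to pass --tokenizer explicitly with the real Hugging Face model id (or a local path to the model files), but this again assumes the user knows which model is actually being served (the model id below is only a placeholder):

guidellm \
  --target "http://localhost:8000/v1" \
  --model "my-model" \
  --tokenizer "mistralai/Mistral-7B-Instruct-v0.2" \
  --data-type emulated \
  --data "prompt_tokens=512,generated_tokens=128"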