[Tokenizer] Add an option to specify tokenizer #284
Conversation
This PR is very useful; my local test always had to hard-code the tokenizer path.
LGTM! Thanks for the great work!
if "llama" in tokenizer_name.lower() and kwargs.get("use_fast", True): | ||
logger.info( | ||
"OpenLLaMA models do not support the fast tokenizer. " | ||
"Using the slow tokenizer instead.") | ||
elif config.model_type == "llama" and kwargs.get("use_fast", True): | ||
# LLaMA fast tokenizer causes protobuf errors in some environments. | ||
# However, we found that the below LLaMA fast tokenizer works well in | ||
# most environments. | ||
model_name = "hf-internal-testing/llama-tokenizer" | ||
logger.info( | ||
f"Using the LLaMA fast tokenizer in '{model_name}' to avoid " | ||
"potential protobuf errors.") | ||
elif config.model_type in _MODEL_TYPES_WITH_SLOW_TOKENIZER: | ||
if kwargs.get("use_fast", False) == True: | ||
raise ValueError( | ||
f"Cannot use the fast tokenizer for {config.model_type} due to " | ||
"bugs in the fast tokenizer.") | ||
logger.info( | ||
f"Using the slow tokenizer for {config.model_type} due to bugs in " | ||
"the fast tokenizer. This could potentially lead to performance " | ||
"degradation.") | ||
kwargs["use_fast"] = False | ||
return AutoTokenizer.from_pretrained(model_name, *args, **kwargs) | ||
"For some LLaMA-based models, initializing the fast tokenizer may " | ||
"take a long time. To eliminate the initialization time, consider " | ||
f"using '{_FAST_LLAMA_TOKENIZER}' instead of the original " | ||
"tokenizer.") |
After this PR, do we need to manually specify the tokenizer for LLaMA models in order to use the fast tokenizer for benchmarking?
It depends. Actually, the LLaMA fast tokenizers in lmsys/vicuna-7b-v1.3 or huggyllama/llama-7b work in my Docker environment, so hf-internal-testing/llama-tokenizer is not needed when I use vLLM there.
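As a quick sanity check (assuming the Hugging Face transformers API), one can verify which tokenizer class a given environment actually loads; the repo names below are just the ones mentioned in this comment.

from transformers import AutoTokenizer

# Check whether the fast (Rust-based) tokenizer loads for these repos.
for name in ("lmsys/vicuna-7b-v1.3", "hf-internal-testing/llama-tokenizer"):
    tok = AutoTokenizer.from_pretrained(name)
    print(name, type(tok).__name__, "is_fast =", tok.is_fast)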
Wow, this is cool. Bro, you are so cool!
THANKS VERY MUCH!
Fixes #111 #246 #259 #270 #281

This PR adds tokenizer to the input/CLI arguments. If it is None, vLLM uses the model name/path as the tokenizer name/path. In addition, starting from this PR, vLLM no longer uses hf-internal-testing/llama-tokenizer as the default tokenizer for LLaMA models.
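As a usage sketch, assuming the new argument is named tokenizer as in this PR's description (the model and tokenizer repos below are only illustrative):

from vllm import LLM

# Point vLLM at a separate tokenizer repo instead of the model's own tokenizer.
llm = LLM(model="huggyllama/llama-7b",
          tokenizer="hf-internal-testing/llama-tokenizer")

# Leaving tokenizer unset (None) falls back to the model name/path.
llm_default = LLM(model="huggyllama/llama-7b")

On the command line, the same option would presumably be exposed as a --tokenizer flag on the server entrypoints; check the current vLLM docs for the exact name.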