Description
The client-side tokenization in guidellm fails to account for the extra tokens added by the server's chat prompt template. There are two possible workarounds:
- Enable usage metrics in each request and let the server tell us how many prompt tokens there are.
- Use the `/completions` endpoint rather than `/chat/completions`, as the chat template is not applied on the `/completions` endpoint.
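
The first workaround could look roughly like the sketch below, assuming an OpenAI-compatible server. For streaming `/chat/completions` requests, the OpenAI API only returns a `usage` object when `stream_options.include_usage` is set; non-streaming responses include `usage` by default. The model name and response fragment here are illustrative, not taken from guidellm.

```python
# Sketch: let the server report the prompt token count instead of
# tokenizing client-side, so tokens added by the chat template are counted.

# Request payload shape per the OpenAI chat completions API; the
# model name is a placeholder.
payload = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": True,
    # Required for streaming requests to receive a final usage chunk.
    "stream_options": {"include_usage": True},
}

def prompt_tokens_from_response(response: dict) -> int:
    """Read the server-side prompt token count, which already includes
    any tokens the chat prompt template added."""
    return response["usage"]["prompt_tokens"]

# Illustrative response fragment (shape per the OpenAI API):
example_response = {
    "usage": {"prompt_tokens": 17, "completion_tokens": 5, "total_tokens": 22}
}
print(prompt_tokens_from_response(example_response))  # 17
```

With this in place the benchmark can use the reported `prompt_tokens` directly, sidestepping any mismatch between the client tokenizer and the server's template.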