PagedAttention is used in TGI, but the supporting work around it has not been done, so KV cache memory is seriously wasted.
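To make the comment above concrete, here is a toy sketch of what on-demand block allocation could look like: blocks are handed out only as a sequence actually grows, rather than sized from the generation limit. This is an illustration only, not TGI's code. `BlockAllocator`, its methods, and the block size of 16 tokens are all hypothetical names and values chosen for the example.

```python
class BlockAllocator:
    """Toy KV cache block allocator (hypothetical, not TGI's implementation)."""

    def __init__(self, num_blocks: int, block_size: int = 16):
        self.block_size = block_size          # tokens per block (assumed value)
        self.free = list(range(num_blocks))   # ids of unallocated blocks
        self.tables = {}                      # request id -> list of block ids
        self.lengths = {}                     # request id -> tokens stored so far

    def append_tokens(self, req_id: str, n: int) -> None:
        """Record n new tokens, allocating blocks only when the sequence needs them."""
        table = self.tables.setdefault(req_id, [])
        length = self.lengths.get(req_id, 0) + n
        needed = -(-length // self.block_size)  # ceiling division
        extra = needed - len(table)
        if extra > len(self.free):
            raise MemoryError("out of KV cache blocks")
        for _ in range(extra):
            table.append(self.free.pop())
        self.lengths[req_id] = length

    def release(self, req_id: str) -> None:
        """Return all blocks of a finished request to the free pool."""
        self.free.extend(self.tables.pop(req_id, []))
        self.lengths.pop(req_id, None)
```

With this shape, a request that stops after a few hundred tokens only ever holds the blocks for those tokens, whatever generation limit it was submitted with. The trade-off is that the scheduler then has to handle running out of blocks mid-generation.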
Are there plans to change this? Or is there any information I can refer to if I want to modify it myself first?
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
System Info
When max_new_token is very large, the KV cache can be seriously wasteful.
Reproduction
max_new_token=8192
Expected behavior
max_new_token should not affect the actual memory utilization.
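A rough back-of-the-envelope sketch of the waste being reported, assuming the server reserves KV cache blocks for input length plus max_new_token up front, as this issue describes. The block size and the model dimensions in the usage note are assumptions for illustration, not values read from TGI.

```python
import math

BLOCK_SIZE = 16  # tokens per KV cache block (assumed value)


def blocks_needed(num_tokens: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of fixed-size blocks required to hold num_tokens."""
    return math.ceil(num_tokens / block_size)


def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                       dtype_bytes: int = 2) -> int:
    """Bytes of KV cache per token: one key and one value vector per layer."""
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def reservation_report(input_tokens: int, max_new_token: int,
                       generated_tokens: int, bytes_per_token: int) -> dict:
    """Compare blocks reserved up front with blocks actually filled."""
    reserved = blocks_needed(input_tokens + max_new_token)
    used = blocks_needed(input_tokens + generated_tokens)
    block_bytes = BLOCK_SIZE * bytes_per_token
    return {
        "reserved_blocks": reserved,
        "used_blocks": used,
        "wasted_bytes": (reserved - used) * block_bytes,
    }
```

For a hypothetical model with 32 layers, 32 KV heads, head dimension 128, and fp16 cache, `kv_bytes_per_token(32, 32, 128)` is 524288 bytes (0.5 MiB). A request with 100 input tokens and max_new_token=8192 that stops after 200 generated tokens would then reserve 519 blocks but fill only 19, leaving about 4000 MiB reserved and unused for that single request.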