enable internal kv bucket in llama #720

xt574chen · 2024-02-18T03:47:02Z

What does this PR do?

To improve throughput when generating a large number of new tokens, this PR splits the KV cache into buckets of the configured bucket width. Attention is then computed over only as many buckets as are needed (a multiple of the bucket width) rather than over the entire KV cache.

LLaMA v2 70B (8x, max_input_tokens 128, max_new_tokens 2048, batch_size 240):
5528 tps (original) -> 6378 tps (with internal bucket size 128), roughly a 15% throughput gain

Add --bucket_size=128 --bucket_internal to the commands to enable the feature.
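As a rough illustration of the idea described above, the sketch below rounds the current sequence length up to the next multiple of the bucket size and runs masked attention over that slice only. It is not the Optimum Habana implementation: `bucketed_kv_len` and `masked_attention` are hypothetical helpers, the plain-list, single-query attention is a simplification, and treating a non-positive bucket size as "bucketing off" is an assumption.

```python
import math


def bucketed_kv_len(cur_len: int, bucket_size: int, max_len: int) -> int:
    """Smallest multiple of bucket_size that covers cur_len, capped at max_len."""
    if bucket_size <= 0:
        return max_len  # assumption: non-positive size means bucketing is off
    return min(math.ceil(cur_len / bucket_size) * bucket_size, max_len)


def masked_attention(query, keys, values, valid_len):
    """Single-query softmax attention; positions >= valid_len get zero weight."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [
        scale * sum(q * k for q, k in zip(query, key)) if i < valid_len else float("-inf")
        for i, key in enumerate(keys)
    ]
    peak = max(scores)  # finite as long as valid_len >= 1
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [
        sum(w * value[d] for w, value in zip(weights, values)) / total
        for d in range(len(values[0]))
    ]


# Toy cache: 16 preallocated positions, 5 of them filled, bucket size 4.
max_len, bucket_size, cur_len = 16, 4, 5
keys = [[math.sin(i), math.cos(i)] for i in range(max_len)]
values = [[float(i), float(-i)] for i in range(max_len)]
query = [0.3, -0.7]

kv_len = bucketed_kv_len(cur_len, bucket_size, max_len)
full = masked_attention(query, keys, values, cur_len)
bucketed = masked_attention(query, keys[:kv_len], values[:kv_len], cur_len)
```

In this toy setup `full` and `bucketed` agree, because the positions dropped by the slice were masked out anyway; the saving comes from scoring fewer cache positions per step. Under the same sketch, a 129-token sequence with a bucket size of 128 would attend over 256 positions.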

HuggingFaceDocBuilderDev · 2024-02-20T07:12:17Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

optimum/habana/transformers/generation/configuration_utils.py

regisss · 2024-02-21T12:26:54Z

optimum/habana/transformers/generation/utils.py

+            if not generation_config.bucket_internal:
+                assert generation_config.bucket_size <= 0, "reuse_cache and bucketing flags set together"
+            else:
+                assert generation_config.bucket_size >= 0, "bucket_internal and bucket_size flags set together"


Here we are in the case where generation_config.bucket_internal is True, so if this assert fails (i.e. generation_config.bucket_size < 0), it means that bucket_size is not set, right? But the error message says otherwise.

xt574chen · 2024-02-21

@puneeshkhanna I see you've corrected some error messages. I hope this update won't cause a conflict.

puneeshkhanna · 2024-02-22

@xt574chen I will update my PR once this one is merged.
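The point raised in this thread, that each failure message should describe the condition that actually failed, can be sketched as a standalone check. This is a hypothetical illustration and not the code merged in the PR: `BucketFlags` stands in for the generation config, the default `bucket_size` of -1 is an assumption, the outer `reuse_cache` condition is inferred from the first assert's message, and the wording of both messages is invented here. The comparisons mirror the snippet under review.

```python
from dataclasses import dataclass


@dataclass
class BucketFlags:
    """Hypothetical stand-in for the relevant generation config fields."""
    reuse_cache: bool = False
    bucket_internal: bool = False
    bucket_size: int = -1  # assumption: a negative value means "not set"


def validate_bucket_flags(flags: BucketFlags) -> None:
    """Raise ValueError with a message that matches the failing condition."""
    if not flags.reuse_cache:
        return  # assumption: the checks only apply when reuse_cache is on
    if not flags.bucket_internal:
        # Without internal bucketing, a positive bucket_size conflicts with reuse_cache.
        if flags.bucket_size > 0:
            raise ValueError(
                "reuse_cache and bucket_size are set together without bucket_internal"
            )
    elif flags.bucket_size < 0:
        # bucket_internal is on but no bucket width was given.
        raise ValueError("bucket_internal is set but bucket_size is not")


def fails(flags: BucketFlags) -> bool:
    """Small helper for the examples: True if validation rejects the flags."""
    try:
        validate_bucket_flags(flags)
    except ValueError:
        return True
    return False
```

With this sketch, `fails(BucketFlags(reuse_cache=True, bucket_internal=True))` is true because no bucket size was given, while `BucketFlags(reuse_cache=True, bucket_internal=True, bucket_size=128)` passes.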

optimum/habana/transformers/generation/utils.py

regisss · 2024-02-22T15:19:32Z

@xt574chen Can you run the following from the root of the repo to make the code style check pass please?

pip install -U ruff
make style

enable internal kv bucket in llama

d8819d8

xt574chen requested review from ssarkar2, bhargaveede, vivekgoe, mandy-li and libinta as code owners February 18, 2024 03:47

xt574chen requested a review from a user February 18, 2024 03:47

xt574chen requested a review from regisss as a code owner February 18, 2024 03:47

puneeshkhanna mentioned this pull request Feb 19, 2024

Further fixes for performance with internal bucketing #723

Closed

3 tasks

fix conflict

dfef797

regisss reviewed Feb 21, 2024

add docstring and modify error message for bucket internal

098b7b0

regisss reviewed Feb 22, 2024

optimum/habana/transformers/generation/utils.py (outdated, resolved)

reformat code

1bc7b65

regisss added the run-test Run CI for PRs from external contributors label Feb 23, 2024

regisss approved these changes Feb 23, 2024

regisss merged commit e328e21 into huggingface:main Feb 23, 2024
11 of 12 checks passed

jychen21 pushed a commit to jychen21/optimum-habana that referenced this pull request Feb 27, 2024

Enable internal kv bucket in llama (huggingface#720)

82b6478

xt574chen deleted the bucket_internal branch March 1, 2024 01:03

HolyFalafel pushed a commit to HabanaAI/optimum-habana-fork that referenced this pull request Mar 11, 2024

Enable internal kv bucket in llama (huggingface#720)

cd7f9f5

puneeshkhanna mentioned this pull request Mar 11, 2024

Further fixes for performance with internal bucketing. #781

Merged

3 tasks

This was referenced Jun 7, 2024

enable internal kv bucket in llama HabanaAI/optimum-habana-fork#24

Merged

extend bucket_internal to SAMPLE generation mode HabanaAI/optimum-habana-fork#84

Merged
