
Conversation

kzawora-intel

RuntimeErrors are no longer observed on habana_main when disable_tensor_cache is used. This PR enables disable_tensor_cache.

@kzawora-intel kzawora-intel merged commit 4052bdb into habana_main Sep 10, 2024
13 checks passed
michalkuligowski added a commit that referenced this pull request Sep 17, 2024
After #252, HPUGraph capture
takes much less memory, and we can reduce the memory reserved for
HPUGraphs. On Llama3.1-8b-Instruct (G2), capturing 100% of prefill and
decode graphs on BS=256 now takes 1.566 GB of HBM, which is far less
than 40% (~30 GB) we reserve by default. This results in lots of unused
(==wasted) memory, which could be used instead for more KV cache blocks.
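A rough back-of-envelope sketch of the headroom described above, using only the numbers quoted in the commit message (the ~30 GB default reservation is the approximate figure given there, not a measured value):

```python
# Illustration of the unused HPUGraph memory reservation described above.
# Figures from the PR description: Llama3.1-8b-Instruct on Gaudi2 (G2),
# BS=256, 100% of prefill and decode graphs captured.
reserved_gb = 30.0       # ~40% default reservation for HPUGraphs (approximate)
actual_usage_gb = 1.566  # measured capture footprint after #252
wasted_gb = reserved_gb - actual_usage_gb
print(f"unused headroom: {wasted_gb:.3f} GB "
      f"({wasted_gb / reserved_gb:.0%} of the reservation)")
```

The unused ~28 GB is what the follow-up commit frees up for additional KV cache blocks by lowering the default reservation.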
@kzawora-intel kzawora-intel added the habana Issues or PRs submitted by Habana Labs label Sep 20, 2024
@kzawora-intel kzawora-intel deleted the private/kzawora/disable_tensor_cache branch October 7, 2024 12:53
