Toggling KV-caches #1763
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchtune/1763
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 7906807 with merge base 3ca0d30.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Codecov Report
Attention: Patch coverage is […]

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #1763       +/-   ##
===========================================
- Coverage   69.33%   25.70%   -43.63%
===========================================
  Files         305      305
  Lines       15892    16003      +111
===========================================
- Hits        11018     4113     -6905
- Misses       4874    11890     +7016
```

☔ View full report in Codecov by Sentry.
I've actually just realized this isn't going to play nice with fusion models, and we may have to pollute […]
torchtune/modules/common_utils.py (Outdated)

```python
@contextlib.contextmanager
def setup_use_local_kv_cache(
```
nit: I don't like this name (I also don't have a great suggestion that isn't extremely long). I do think we should change `local` -> `temporary`, and I'd like to say "remove `setup` from the name". I'm on the fence about that not being explicit enough, but I kind of think `temporary` implies that we are constructing and destroying as part of the context manager. What do you think?
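For reference, here's a minimal sketch of the construct-on-enter / destroy-on-exit semantics being discussed. The name, the `setup_caches` call, and the `kv_cache`/`cache_enabled` attribute handling are illustrative assumptions, not the final API:

```python
import contextlib

import torch


@contextlib.contextmanager
def temporary_kv_cache(model, *, batch_size: int, dtype: torch.dtype = torch.bfloat16):
    """Set up KV-caches on entry and destroy them on exit (hypothetical sketch)."""
    # Assumes the model exposes a setup_caches() helper, as torchtune decoders do.
    model.setup_caches(batch_size=batch_size, dtype=dtype)
    try:
        yield model
    finally:
        # Tear the caches down so the model leaves the block cache-free.
        for module in model.modules():
            if hasattr(module, "kv_cache"):
                module.kv_cache = None
                module.cache_enabled = False
```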
```diff
@@ -199,12 +199,3 @@ def test_eval_recipe_errors_with_qat_quantizer(self, capsys, monkeypatch, tmpdir
             match="QAT quantizers should only be used during quantization aware training",
         ):
             runpy.run_path(TUNE_PATH, run_name="__main__")
-
-    @pytest.mark.integration_test
-    def test_eval_recipe_errors_with_generate_until_and_mc_tasks(
```
Should we instead have a test that this now actually works?
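If we do, a sketch of what that positive test could look like, following the `runpy`/`TUNE_PATH` pattern of the surrounding tests. The method would live on the existing test class (which already imports `TUNE_PATH`); the CLI argument construction below is a placeholder, not the file's actual helper:

```python
import runpy
import sys

import pytest


@pytest.mark.integration_test
def test_eval_recipe_with_generate_until_and_mc_tasks(self, capsys, monkeypatch, tmpdir):
    """Mixed generation + multiple-choice tasks should now run end to end."""
    # Placeholder CLI args; the real test would build these the same way
    # the other eval recipe tests in this file do.
    cmd = f"tune run eleuther_eval --config eleuther_evaluation output_dir={tmpdir}".split()
    monkeypatch.setattr(sys, "argv", cmd)
    # The deleted test asserted this raises; now it should complete cleanly.
    runpy.run_path(TUNE_PATH, run_name="__main__")
```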
I think I see where you're coming from here. I think what would help disambiguate is having […]. Going forward, and to significantly simplify things, I'd propose […]. This means […].
Overall this sounds pretty reasonable to me. My main question is around (3). If we just locally disable the KV cache, when do we reset (if at all)? Like if I set up the cache, call forward a bunch of times (with it now enabled by default), then wrap a few forward calls in […]? And then does this also mean that we now have both […]?
The onus is on the user here to correctly […]

Sorry, yes.

EDIT: […]
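To make those semantics concrete, here is a sketch of a disable-only context manager along those lines: caches stay allocated and keep whatever positions they already hold; only the enabled flag is flipped, so resetting cache contents remains the caller's job. Attribute names are assumed for illustration:

```python
import contextlib


@contextlib.contextmanager
def disable_kv_cache(model):
    """Temporarily stop attention layers from reading/writing their caches."""
    layers = [m for m in model.modules() if hasattr(m, "cache_enabled")]
    prev_states = [m.cache_enabled for m in layers]
    for m in layers:
        m.cache_enabled = False
    try:
        yield model
    finally:
        # Restore flags only; cache contents are untouched, so any reset
        # (e.g. zeroing cache positions) is still on the user.
        for m, state in zip(layers, prev_states):
            m.cache_enabled = state
```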
As another side note here, we currently expect attention layers to expose […]. This is the same across both kinds of attention layers. We're already showing here that we don't need this by doing things like:

```python
for module in model.modules():
    if hasattr(module, "kv_cache") and callable(module.kv_cache):
        module.cache_enabled = False
        module.kv_cache = None
```

so why don't we get rid of the intermediary functions and use the API on the modules directly?

edit: sorry, I know I'm terrible for over-scoping PRs, but I can't help it.
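Concretely, those intermediary per-layer methods could collapse into module-level helpers built on that same loop. A sketch, assuming `KVCache` exposes a `reset()` method:

```python
def delete_kv_caches(model):
    """Drop every layer's KV-cache and disable caching in one pass."""
    for module in model.modules():
        if hasattr(module, "kv_cache"):
            module.cache_enabled = False
            module.kv_cache = None


def reset_kv_caches(model):
    """Reset cache contents on every layer that has a live cache."""
    for module in model.modules():
        if getattr(module, "kv_cache", None) is not None:
            module.kv_cache.reset()
```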
You're right, I think we should be doing this, but BAD SALMAN STOP OVERSCOPING PRs. Let's open a separate issue for follow-up?
Yeah agreed, let's tackle this separately.

Sorry I slept on this PR for a few days. What's the status here? Like, are we ready for another pass, or are there still open questions we need to sort out?
Yeah, this is good to review again. I'll fix merge conflicts and add the eval test you requested later this week.
Couple nits, but I'm ready to sign off on this PR.
Context
What is the purpose of this PR? Is it to add a new feature, fix a bug, update tests and/or documentation, or something else?
Please link to any issues this PR addresses.
closes #1621
RFC here #1675
Multimodal eval results
On main: […]
On this branch: […]

Text eval results: […]
Test plan
Please make sure to do each of the following if applicable to your PR. If you're unsure about any one of these just ask and we will happily help. We also have a contributing page for some guidance on contributing.
- run pre-commit hooks and linters (make sure you've first installed via `pre-commit install`)
- run unit tests via `pytest tests`
- run recipe tests via `pytest tests -m integration_test`
UX
If your function changed a public API, please add a dummy example of what the user experience will look like when calling it.
Here is a docstring example and a tutorial example.
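As a rough illustration of the intended UX, using the hypothetical `temporary_kv_cache` sketched earlier in the thread (the model builder is real, but the context manager name and call details are placeholders until the naming discussion settles):

```python
import torch

from torchtune.models.llama2 import llama2_7b

model = llama2_7b()

# Caches exist only inside the block and are destroyed on exit.
with temporary_kv_cache(model, batch_size=1, dtype=torch.bfloat16):
    tokens = torch.tensor([[1, 2, 3]])
    logits = model(tokens, input_pos=torch.arange(3))

# Here the model is cache-free again, exactly as it was before the block.
```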