
Conversation

brian-dellabetta (Collaborator) commented Oct 30, 2025

SUMMARY:
It appears AutoAWQ makes an implicit assumption that, when caching inputs to the forward call of certain modules, only the first positional input needs to be stored per call, and all remaining kwargs can be shared across calls, so they don't have to be redundantly stored in GPU VRAM.

When we first ported AutoAWQ, this assumption seemed incorrect to us and liable to lead to poor behavior, so our implementation cached all the args to a given module's forward call. That guarantees each call is replayed correctly, at the expense of GPU VRAM. The sketch below contrasts the two capture strategies.
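
Roughly, in terms of a PyTorch forward pre-hook (a minimal sketch, not llm-compressor's actual hook code; only the "first input per call, kwargs shared" assumption comes from AutoAWQ as described above):

```python
import torch
from torch import nn

layer = nn.Linear(8, 8)

first_inputs: list[torch.Tensor] = []  # AutoAWQ-style: first positional input per call
shared_kwargs: dict = {}               # AutoAWQ-style: kwargs stored once, shared
full_capture: list[tuple] = []         # our approach: every arg of every call

def capture_hook(module, args, kwargs):
    # AutoAWQ-style capture: keep args[0] per call, and overwrite (rather
    # than duplicate) the kwargs, assuming they are identical across calls
    first_inputs.append(args[0].detach())
    shared_kwargs.update(kwargs)
    # Full-replication capture: keep everything so replay is guaranteed exact
    full_capture.append((args, kwargs))

handle = layer.register_forward_pre_hook(capture_hook, with_kwargs=True)
layer(torch.randn(2, 8))
handle.remove()
```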

Now that we are revisiting performance improvements for AWQ, I wanted to expose AutoAWQ's design choice as a toggleable field on AWQModifier, as sketched after the list below:

  • If AWQModifier(..., use_auto_awq_mem_hack=True), we use AutoAWQ's technique and cache everything to a single model-level field, _model_kwargs_cache: IntermediatesCache
  • Otherwise, we cache per parent module to the field _parent_kwargs_cache: dict[Module, IntermediatesCache]
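
A hedged sketch of how the toggle could route between the two fields; only the field names, their types, and the flag come from this PR, while the cache_forward_kwargs helper and the dict stand-in for IntermediatesCache are hypothetical:

```python
from typing import Any
from torch.nn import Module

IntermediatesCache = dict[str, Any]  # stand-in for llm-compressor's real class

class AWQModifierSketch:
    def __init__(self, use_auto_awq_mem_hack: bool = False):
        self.use_auto_awq_mem_hack = use_auto_awq_mem_hack
        # AutoAWQ-style: a single model-level cache, kwargs stored once
        self._model_kwargs_cache: IntermediatesCache = {}
        # full replication: one cache per parent module's forward call
        self._parent_kwargs_cache: dict[Module, IntermediatesCache] = {}

    def cache_forward_kwargs(self, parent: Module, kwargs: dict):
        if self.use_auto_awq_mem_hack:
            # shared across all parents; assumes kwargs are identical
            self._model_kwargs_cache.update(kwargs)
        else:
            # replicated per parent; correct replay at higher VRAM cost
            self._parent_kwargs_cache.setdefault(parent, {}).update(kwargs)
```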

I believe that is what this PR implements, but when I run examples/awq/qwen3_moe_example.py side-by-side with the field set to False versus True, I don't see any meaningful difference in VRAM usage. I need to do some more debugging to make sure this is working as intended (and to compare VRAM usage against AutoAWQ).
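
For the side-by-side comparison, one way to get a concrete number rather than eyeballing nvidia-smi (a sketch, not part of this PR) is torch's allocator stats:

```python
import torch

# reset allocator stats, run the example once with the field set, then read the peak
torch.cuda.reset_peak_memory_stats()
# ... run the calibration from examples/awq/qwen3_moe_example.py here ...
peak_gib = torch.cuda.max_memory_allocated() / 2**30
print(f"peak VRAM allocated: {peak_gib:.2f} GiB")
```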

TEST PLAN:
"please outline how the changes were tested"

Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
github-actions commented
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.
