removed redundancies from QEFFHybridCache #582

ochougul · 2025-10-02T20:38:33Z

Should improve performance for models that use sliding-window attention, such as Gemma and Mistral.

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>

quic-rishinr · 2025-11-04T05:46:20Z

QEfficient/transformers/cache_utils.py

            # Original Gather
            ctx_len = self.key_cache[layer_idx].shape[2]
            ctx_indices = torch.arange(ctx_len)[None, None, ...]
-            gather_limit = kv_position_ids.max(1, keepdim=True).values.unsqueeze(1)


If we use position_ids here, wouldn't it go past the cache length for sliding window?

ochougul · 2025-11-13

Yes, and in that case it won't affect ctx_indices: there will be no invalid values.
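A minimal sketch of the point made in this thread, written in plain Python rather than torch so it runs without dependencies. It assumes the removed `gather_limit` was compared against `ctx_indices` to mark indices above it as invalid; that comparison is not shown in the diff excerpt, and the helper name `invalid_positions` is hypothetical.

```python
def invalid_positions(ctx_len, max_position_id):
    """Return the cache indices a position-based limit would mark invalid.

    Assumption (not shown in the diff excerpt): an index is invalid when it
    is greater than the largest position id seen so far.
    """
    ctx_indices = list(range(ctx_len))  # mirrors torch.arange(ctx_len)
    gather_limit = max_position_id      # mirrors kv_position_ids.max(...)
    return [i for i in ctx_indices if i > gather_limit]


window = 4  # hypothetical sliding-window cache length

# Early in decoding the limit is below the window, so some slots are masked.
early = invalid_positions(window, max_position_id=1)

# Once the position id passes the window, no index exceeds the limit.
late = invalid_positions(window, max_position_id=10)

print(early, late)
```

Under that assumption, the limit only has an effect while the largest position id is still smaller than the last cache index. Once it passes the sliding-window length, the set of invalid indices is empty.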

quic-mamta · 2025-11-17T06:01:45Z

These changes have been incorporated in #616.

removed redundancies from QEFFHybridCache

3729740

Signed-off-by: Onkar Chougule <ochougul@qti.qualcomm.com>

ochougul requested review from quic-amitraj, quic-hemagnih and quic-rishinr as code owners October 2, 2025 20:38

quic-rishinr reviewed Nov 4, 2025


quic-mamta closed this Nov 17, 2025
