@quic-mamta quic-mamta commented Nov 13, 2025

  1. Stop inheriting from the transformers cache classes.
  2. Restructure the cache in layers for HybridCache, HybridChunkedCache, and HybridCacheForGPTOSS.
  3. Remove redundancies from QEFFHybridCache and QEFFHybridChunkedCache. This should improve performance for models that use a sliding window, such as gemma and mistral.
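
For readers unfamiliar with the idea, here is a minimal sketch of what a layer-structured hybrid cache can look like when it does not inherit from any library base class. It is not the code in this PR: the class names `SlidingWindowLayer`, `FullLayer`, and `LayeredHybridCache` are hypothetical, plain Python lists stand in for tensors, and the window is assumed to be at least 1.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FullLayer:
    """Hypothetical per-layer cache that keeps every entry it has seen."""
    keys: List[int] = field(default_factory=list)

    def update(self, new_keys: List[int]) -> List[int]:
        self.keys = self.keys + new_keys
        return self.keys


@dataclass
class SlidingWindowLayer:
    """Hypothetical per-layer cache that keeps only the last `window` entries."""
    window: int  # assumed to be >= 1
    keys: List[int] = field(default_factory=list)

    def update(self, new_keys: List[int]) -> List[int]:
        # Append the new entries, then drop everything older than the window.
        self.keys = (self.keys + new_keys)[-self.window:]
        return self.keys


class LayeredHybridCache:
    """Standalone container (no library base class) holding one cache object per layer."""

    def __init__(self, layer_is_sliding: List[bool], window: int) -> None:
        self.layers = [
            SlidingWindowLayer(window) if sliding else FullLayer()
            for sliding in layer_is_sliding
        ]

    def update(self, layer_idx: int, new_keys: List[int]) -> List[int]:
        # Each layer decides on its own how much history to retain.
        return self.layers[layer_idx].update(new_keys)


cache = LayeredHybridCache(layer_is_sliding=[True, False], window=3)
cache.update(0, [1, 2])
sliding_view = cache.update(0, [3, 4])  # sliding layer keeps the last 3 entries
cache.update(1, [1, 2])
full_view = cache.update(1, [3, 4])     # full layer keeps all 4 entries
```

Keeping the retention policy inside each layer object means the container needs no per-model branching, which is one plausible reading of "restructure the cache in layers".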

mamtsing and others added 3 commits November 5, 2025 15:21
Signed-off-by: Mamta Singh <mamtsing@qti.qualcomm.com>
…ybridCacheForGPTOSS

Signed-off-by: Mamta Singh <mamtsing@qti.qualcomm.com>
quic-mamta and others added 2 commits November 17, 2025 11:12
Signed-off-by: Mamta Singh <168400541+quic-mamta@users.noreply.github.com>
Signed-off-by: Mamta Singh <mamtsing@qti.qualcomm.com>
@quic-rishinr

@ochougul Can you please review this PR?

Signed-off-by: Mamta Singh <mamtsing@qti.qualcomm.com>