@nikolaystanishev nikolaystanishev commented Jul 25, 2025

Description

  • When patching key or value heads, n_key_value_heads is now used instead of n_heads.
  • When stacking the key/value results in the attention head patching methods, they are padded to match the query results, since GQA models have fewer key/value heads than query heads (a sketch of both changes follows below).

The problem is described in the corresponding issue.

Fixes #980
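
For reference, here is a minimal sketch of the two changes, not the library's actual implementation. It assumes a TransformerLens-style config object exposing n_heads and an optional n_key_value_heads; the helper names are hypothetical.

```python
# Hedged sketch, assuming a cfg with n_heads and an optional n_key_value_heads.
# Helper names (n_heads_for, stack_head_results) are illustrative, not library API.
import torch


def n_heads_for(cfg, activation_name: str) -> int:
    # GQA models have fewer key/value heads than query heads, so iterate over
    # n_key_value_heads when patching "k" or "v" activations.
    if activation_name in ("k", "v") and getattr(cfg, "n_key_value_heads", None) is not None:
        return cfg.n_key_value_heads
    return cfg.n_heads


def stack_head_results(cfg, results_by_act: dict[str, torch.Tensor]) -> torch.Tensor:
    # results_by_act maps "q"/"k"/"v"/... to tensors of shape
    # [n_layers, n_heads_for_that_activation, ...]. Pad the head dimension of
    # k/v results with NaNs so everything stacks to [n_acts, n_layers, n_heads, ...].
    padded = []
    for result in results_by_act.values():
        pad = cfg.n_heads - result.shape[1]
        if pad > 0:
            pad_shape = (result.shape[0], pad, *result.shape[2:])
            filler = torch.full(pad_shape, float("nan"), dtype=result.dtype, device=result.device)
            result = torch.cat([result, filler], dim=1)
        padded.append(result)
    return torch.stack(padded, dim=0)
```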

Type of change


  • Bug fix (non-breaking change which fixes an issue)


Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

@jlarson4 jlarson4 changed the base branch from main to dev January 20, 2026 14:46
@jlarson4 jlarson4 merged commit e9e7448 into TransformerLensOrg:dev Jan 20, 2026
13 checks passed
jlarson4 added a commit that referenced this pull request Feb 3, 2026
Cherry-picked from v2.17.0 commit e9e7448

- Use n_key_value_heads instead of n_heads for k/v activations when available
- Pad k and v results to match q results in get_act_patch_attn_head_*_every functions
- Update docstrings to reflect shape variations for GQA models

This fixes patching for models with Grouped Query Attention (GQA), where
n_heads != n_key_value_heads (e.g., Gemma 3 and Llama 3.1).

Original commit: e9e7448 Fix key and value heads patching for models with different n_heads from n_key_value_heads (#981)

