@nikolaystanishev nikolaystanishev commented Jul 25, 2025

Description

  • When patching key or value heads, n_key_value_heads is now used instead of n_heads.
  • When stacking the key/value results in the attention head patching methods, they are padded to match the query results, since GQA models have fewer key/value heads than query heads (a sketch of both changes follows below).

The problem is described in the corresponding issue.

Fixes #980
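
For reference, here is a minimal sketch of the two changes, not the library's actual implementation. It assumes a TransformerLens-style config object exposing n_heads and an optional n_key_value_heads; the helper names are hypothetical.

```python
# Hedged sketch, assuming a cfg with n_heads and an optional n_key_value_heads.
# Helper names (n_heads_for, stack_head_results) are illustrative, not library API.
import torch


def n_heads_for(cfg, activation_name: str) -> int:
    # GQA models have fewer key/value heads than query heads, so iterate over
    # n_key_value_heads when patching "k" or "v" activations.
    if activation_name in ("k", "v") and getattr(cfg, "n_key_value_heads", None) is not None:
        return cfg.n_key_value_heads
    return cfg.n_heads


def stack_head_results(cfg, results_by_act: dict[str, torch.Tensor]) -> torch.Tensor:
    # results_by_act maps "q"/"k"/"v"/... to tensors of shape
    # [n_layers, n_heads_for_that_activation, ...]. Pad the head dimension of
    # k/v results with NaNs so everything stacks to [n_acts, n_layers, n_heads, ...].
    padded = []
    for result in results_by_act.values():
        pad = cfg.n_heads - result.shape[1]
        if pad > 0:
            pad_shape = (result.shape[0], pad, *result.shape[2:])
            filler = torch.full(pad_shape, float("nan"), dtype=result.dtype, device=result.device)
            result = torch.cat([result, filler], dim=1)
        padded.append(result)
    return torch.stack(padded, dim=0)
```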

Type of change


  • Bug fix (non-breaking change which fixes an issue)


Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

@jlarson4 jlarson4 changed the base branch from main to dev January 20, 2026 14:46
@jlarson4 jlarson4 merged commit e9e7448 into TransformerLensOrg:dev Jan 20, 2026
13 checks passed
jlarson4 added a commit that referenced this pull request Feb 3, 2026
Cherry-picked from v2.17.0 commit e9e7448

- Use n_key_value_heads instead of n_heads for k/v activations when available
- Pad k and v results to match q results in get_act_patch_attn_head_*_every functions
- Update docstrings to reflect shape variations for GQA models

This fixes patching for models with Grouped Query Attention (GQA), where
n_heads != n_key_value_heads (e.g., Gemma 3 and Llama 3.1).

Original commit: e9e7448 Fix key and value heads patching for models with different n_heads from n_key_value_heads (#981)

