
[transformer] support multi query attention && multi grouped #2403

Merged
Merged 4 commits into main from Mddct-mqa-mga on Mar 11, 2024

Conversation

Mddct (Collaborator) commented on Mar 11, 2024

Multi-query attention (MQA) reduces the KV cache footprint and speeds up decoding; it is used in LLMs and SLMs such as Gemma.
Moreover, according to the paper discussed in https://github.com/wenet-e2e/wenet/pull/2363#issuecomment-1961189853, we can take a model trained with standard multi-head attention and continue training it with MQA.
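
As a rough illustration of that conversion idea, here is a minimal sketch following the mean-pooling recipe from the GQA paper; the function name and shapes are hypothetical, not the code in this PR:

```python
import torch

def pool_kv_heads(w: torch.Tensor, n_head: int, n_kv_head: int) -> torch.Tensor:
    """Mean-pool an MHA key/value projection weight down to n_kv_head heads.

    w: weight of linear_k or linear_v with shape (n_head * d_k, d_model).
    Returns a (n_kv_head * d_k, d_model) weight to initialize the MQA/GQA
    model before continuing training.
    """
    assert n_head % n_kv_head == 0
    d_k = w.shape[0] // n_head
    d_model = w.shape[1]
    # Group the n_head heads into n_kv_head groups and average each group.
    w = w.view(n_kv_head, n_head // n_kv_head, d_k, d_model)
    return w.mean(dim=1).reshape(n_kv_head * d_k, d_model)
```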

For ASR, the results in https://github.com/wenet-e2e/wenet/pull/2363#issuecomment-1961189853 show that MQA can indeed match the performance of multi-head attention.
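
The real implementation lives in wenet/transformer/attention.py in this PR; as orientation only, below is a self-contained sketch of grouped/multi-query attention (class and parameter names are illustrative, not wenet's API). With n_kv_head == 1 this is MQA; with n_kv_head == n_head it reduces to standard multi-head attention:

```python
import torch
import torch.nn as nn

class GroupedQueryAttention(nn.Module):
    """Attention with n_head query heads sharing n_kv_head key/value heads."""

    def __init__(self, n_head: int, n_kv_head: int, d_model: int):
        super().__init__()
        assert d_model % n_head == 0 and n_head % n_kv_head == 0
        self.h, self.h_kv = n_head, n_kv_head
        self.d_k = d_model // n_head
        self.linear_q = nn.Linear(d_model, n_head * self.d_k)
        # Smaller K/V projections: only n_kv_head heads go into the cache,
        # shrinking the decode-time KV cache by a factor of n_head / n_kv_head.
        self.linear_k = nn.Linear(d_model, n_kv_head * self.d_k)
        self.linear_v = nn.Linear(d_model, n_kv_head * self.d_k)
        self.linear_out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.linear_q(x).view(b, t, self.h, self.d_k).transpose(1, 2)
        k = self.linear_k(x).view(b, t, self.h_kv, self.d_k).transpose(1, 2)
        v = self.linear_v(x).view(b, t, self.h_kv, self.d_k).transpose(1, 2)
        # Repeat each kv head so every group of query heads sees its shared head.
        g = self.h // self.h_kv
        k = k.repeat_interleave(g, dim=1)
        v = v.repeat_interleave(g, dim=1)
        scores = torch.matmul(q, k.transpose(-2, -1)) / self.d_k ** 0.5
        out = torch.matmul(torch.softmax(scores, dim=-1), v)
        return self.linear_out(out.transpose(1, 2).reshape(b, t, -1))
```

For example, GroupedQueryAttention(n_head=8, n_kv_head=1, d_model=256) would give MQA with an 8x smaller KV cache than the equivalent multi-head layer.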

Review comment on wenet/transformer/attention.py (outdated, resolved)
xingchensong previously approved these changes on Mar 11, 2024
Review comment on wenet/transformer/attention.py (outdated, resolved)
xingchensong merged commit 35e0a1c into main on Mar 11, 2024
5 of 6 checks passed
xingchensong deleted the Mddct-mqa-mga branch on Mar 11, 2024 at 13:01
srdfjy pushed a commit to srdfjy/wenet that referenced this pull request Oct 8, 2024
[transformer] support multi query attention && multi grouped (wenet-e2e#2403)

* [transformer] support multi query attention

* fix dim

* fix dim

* fix comment and fix kv_head