
Support attention_bias on LLaMA architecture #4283

Merged
3 commits merged on Dec 1, 2023

Commits on Dec 1, 2023

  1. Support attention_bias on LLaMA architecture

    Adds QKVO bias; should fix InternLM (ggerganov#3133) and works for LLaMAfied Qwen models (ggerganov#3743 (comment)). A sketch of the projection-with-bias pattern follows after the commit list.
    CausalLM authored Dec 1, 2023
    Commit c48679a
  2. Check existence of QKVO bias while loading LLaMA models

    Tested on LLaMA 2, on both the CUDA and CPU backends. A sketch of probing the GGUF file for the bias tensors follows after the commit list.
    CausalLM authored Dec 1, 2023
    Commit e192572
  3. Update llama.cpp

    CausalLM authored Dec 1, 2023
    Commit b1efaed
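
For context, the core of commit 1 is that each of the four attention projections (Q, K, V, O) gains an optional bias added right after its matmul. Below is a minimal ggml-style sketch of that pattern; the helper name `build_proj` and the field names bq/bk/bv/bo are used here for illustration and are not necessarily the PR's verbatim code.

```cpp
#include "ggml.h"

// Sketch of the projection-with-bias pattern used for the Q/K/V/O projections.
// The bias tensor is NULL for models that ship no attention bias (plain LLaMA),
// so the graph is unchanged for them.
static struct ggml_tensor * build_proj(
        struct ggml_context * ctx,
        struct ggml_tensor  * w,     // projection weight (wq / wk / wv / wo)
        struct ggml_tensor  * b,     // matching bias (bq / bk / bv / bo), may be NULL
        struct ggml_tensor  * cur) { // current activations
    cur = ggml_mul_mat(ctx, w, cur);
    if (b != NULL) {
        cur = ggml_add(ctx, cur, b); // y = W*x + b only when the bias exists
    }
    return cur;
}
```

Inside the attention block this pattern would be applied four times, once per projection, which is what lets bias-carrying checkpoints such as InternLM and LLaMAfied Qwen run on the LLaMA graph.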
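Commit 2's existence check can be illustrated with the gguf API that ships with ggml: whether the bias tensor is present in the file decides whether it gets created and loaded. This is a sketch under that assumption, not the loader's verbatim code; the tensor name shown follows the usual blk.<i>.attn_q.bias naming and is only an example.

```cpp
#include "ggml.h"   // gguf API is declared here in trees of this era (newer trees: "gguf.h")

// Sketch: probe the GGUF file for an optional bias tensor before creating it,
// so models without QKVO bias keep loading exactly as before.
static bool has_tensor(const struct gguf_context * gguf, const char * name) {
    return gguf_find_tensor(gguf, name) >= 0; // negative index means the tensor is absent
}

// usage (tensor name is illustrative):
//   if (has_tensor(gguf, "blk.0.attn_q.bias")) {
//       // create and load layer.bq as usual
//   } else {
//       layer.bq = NULL; // fall back to bias-free attention
//   }
```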