
Conversation

@jnp commented Jul 24, 2023

This patch fixes the issue described in #4027.

In summary, the Llama 2 model uses grouped-query attention; it therefore has a new configuration field for the number of key/value heads (num_key_value_heads) in addition to num_attention_heads.
The configuration for the llama2-13b-chat model is as follows:
{
  "_name_or_path": null,
  "architectures": [
    "LlamaForCausalLM"
  ],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 13824,
  "max_position_embeddings": 2048,
  "model_type": "llama",
  "num_attention_heads": 40,
  "num_hidden_layers": 40,
  "num_key_value_heads": 40,
  "pad_token_id": 0,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.31.0.dev0",
  "use_cache": true,
  "vocab_size": 32000
}
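
For context, a minimal sketch (not the actual patch in this PR) of how an inference engine might read the new field in a backward-compatible way, assuming a Hugging Face-style config.json like the one above; the function name load_attention_dims is illustrative only.

import json

def load_attention_dims(config_path):
    """Read attention-head geometry from a Llama-style config.json."""
    with open(config_path) as f:
        cfg = json.load(f)
    num_heads = cfg["num_attention_heads"]
    # Older (pre-GQA) configs omit num_key_value_heads; fall back to
    # standard multi-head attention with one KV head per query head.
    num_kv_heads = cfg.get("num_key_value_heads", num_heads)
    assert num_heads % num_kv_heads == 0, "query heads must be a multiple of KV heads"
    head_dim = cfg["hidden_size"] // num_heads
    kv_groups = num_heads // num_kv_heads  # query heads sharing each KV head
    return num_heads, num_kv_heads, head_dim, kv_groups

With the llama2-13b-chat config above this yields 40 query heads, 40 KV heads, a head dimension of 128, and a group size of 1; the new field only diverges from num_attention_heads for the 70B variant, which uses 8 KV heads.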

@mrwyattii (Contributor)

Thank you for this contribution, but we have another PR to fix this: #4022

Closing this in favor of the previously submitted fix.

@mrwyattii closed this on Jul 24, 2023