Support for XGLM models #5097

Closed
Stypox opened this issue Jan 23, 2024 · 2 comments
Labels: enhancement (New feature or request), stale

Stypox commented Jan 23, 2024

Prerequisites

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed). (I searched for xglm but couldn't find any mention of it anywhere in the project.)
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Feature Description

Add support for Facebook's XGLM models (e.g. xglm-564M), including converting them to GGUF and running them with llama.cpp. The HuggingFace implementation docs are here, and the model specs are here, for reference.

Motivation

XGLM models perform well on specific tasks despite their size. For example, the 564M model is small enough to run on any device, and it can understand the context of a sentence quite well and extract specific parts of it (a useful capability for an assistant). I was able to run xglm-564M with HuggingFace's framework from within Termux; however, that framework doesn't support (efficient) quantization on CPU, so the model ends up using 3 GB of RAM and running quite slowly. I would also like to embed an XGLM model in an Android app, and doing so with llama.cpp would be much simpler (and more efficient) than packaging Python, transformers, and all the other dependencies of the HuggingFace implementation.

Possible Implementation

The convert-hf-to-gguf.py script would need to add support for XGLMForCausalLM, and something more would probably need to be implemented in llama.cpp itself if the architecture requires a new layer type. I would be open to helping with the implementation; however, I don't know much about LLM architectures in general, or about the llama.cpp project specifically. I looked at one recently merged PR that added support for a new model structure, but couldn't really understand what was going on. Do you have any documentation on how to add new model types? Would XGLM be simple to add, or does it have a nonstandard architecture?
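
A tensor listing like the one below can be generated with a few lines of standard transformers code (XGLMForCausalLM and the facebook/xglm-564M checkpoint are both public; the snippet itself is just an illustration):

# Prints the tensor names of the model, as listed below.
from transformers import XGLMForCausalLM

model = XGLMForCausalLM.from_pretrained("facebook/xglm-564M")
for name in model.state_dict():
    print(name)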

These are the tensors in the XGLM-564M model. All 24 decoder layers (model.layers.0 through model.layers.23) have the same structure, shown here once for layer 0:
model.embed_tokens.weight
model.layers.0.self_attn.k_proj.weight
model.layers.0.self_attn.k_proj.bias
model.layers.0.self_attn.v_proj.weight
model.layers.0.self_attn.v_proj.bias
model.layers.0.self_attn.q_proj.weight
model.layers.0.self_attn.q_proj.bias
model.layers.0.self_attn.out_proj.weight
model.layers.0.self_attn.out_proj.bias
model.layers.0.self_attn_layer_norm.weight
model.layers.0.self_attn_layer_norm.bias
model.layers.0.fc1.weight
model.layers.0.fc1.bias
model.layers.0.fc2.weight
model.layers.0.fc2.bias
model.layers.0.final_layer_norm.weight
model.layers.0.final_layer_norm.bias
... (model.layers.1 through model.layers.23 repeat the same 16 tensors)
model.layer_norm.weight
model.layer_norm.bias
lm_head.weight
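
If the architecture were added, these per-layer tensor names would also need entries in gguf-py's tensor-name mapping. A hypothetical sketch, following the "{bid}" template convention of the existing mappings in gguf-py's tensor_mapping.py (the MODEL_TENSOR constants are real; applying them to an XGLM architecture is an assumption):

# Hypothetical additions to gguf-py's tensor_mapping.py for XGLM.
MODEL_TENSOR.ATTN_Q:    ("model.layers.{bid}.self_attn.q_proj",),
MODEL_TENSOR.ATTN_K:    ("model.layers.{bid}.self_attn.k_proj",),
MODEL_TENSOR.ATTN_V:    ("model.layers.{bid}.self_attn.v_proj",),
MODEL_TENSOR.ATTN_OUT:  ("model.layers.{bid}.self_attn.out_proj",),
MODEL_TENSOR.ATTN_NORM: ("model.layers.{bid}.self_attn_layer_norm",),
MODEL_TENSOR.FFN_UP:    ("model.layers.{bid}.fc1",),
MODEL_TENSOR.FFN_DOWN:  ("model.layers.{bid}.fc2",),
MODEL_TENSOR.FFN_NORM:  ("model.layers.{bid}.final_layer_norm",),
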
This is the config.json file for the 564M model (taken from here):
{
  "activation_dropout": 0,
  "activation_function": "gelu",
  "architectures": [
    "XGLMForCausalLM"
  ],
  "attention_dropout": 0.1,
  "attention_heads": 16,
  "bos_token_id": 0,
  "d_model": 1024,
  "decoder_start_token_id": 2,
  "dropout": 0.1,
  "eos_token_id": 2,
  "ffn_dim": 4096,
  "init_std": 0.02,
  "layerdrop": 0.0,
  "max_position_embeddings": 2048,
  "model_type": "xglm",
  "num_layers": 24,
  "pad_token_id": 1,
  "scale_embedding": true,
  "transformers_version": "4.16.0.dev0",
  "use_cache": true,
  "vocab_size": 256008
}
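
Based on this config, a converter class for convert-hf-to-gguf.py might look roughly like the sketch below. It follows the pattern of the existing Model subclasses (the exact hooks vary by version), and MODEL_ARCH.XGLM is hypothetical, since gguf-py defines no such architecture yet:

# Hypothetical sketch only: XGLM is not a supported architecture, and
# MODEL_ARCH.XGLM would have to be added to gguf-py's constants first.
class XGLMModel(Model):
    model_arch = gguf.MODEL_ARCH.XGLM  # hypothetical new enum value

    def set_gguf_parameters(self):
        hp = self.hparams  # the parsed config.json shown above
        self.gguf_writer.add_context_length(hp["max_position_embeddings"])  # 2048
        self.gguf_writer.add_embedding_length(hp["d_model"])                # 1024
        self.gguf_writer.add_feed_forward_length(hp["ffn_dim"])             # 4096
        self.gguf_writer.add_block_count(hp["num_layers"])                  # 24
        self.gguf_writer.add_head_count(hp["attention_heads"])              # 16

The GGUFWriter calls above all exist in gguf-py; the harder part would likely be the compute graph in llama.cpp itself, since XGLM uses (scaled) sinusoidal position embeddings rather than RoPE, which may be the "nonstandard" part of the architecture.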
Stypox added the enhancement label on Jan 23, 2024
github-actions bot commented Mar 18, 2024

This issue is stale because it has been open for 30 days with no activity.

github-actions bot added the stale label on Mar 18, 2024

github-actions bot commented Apr 2, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions bot closed this as completed on Apr 2, 2024