Add support for Arcee AI's upcoming AFM model #14185
Conversation
LGTM!
NULL, NULL, NULL,
model.layers[il].ffn_down, NULL, NULL,
NULL,
LLM_FFN_RELU_SQR, LLM_FFN_SEQ, il);
Seems like the only difference between AFM and llama is this activation function.
Not sure if in the future we can abstract out this activation definition per-model (maybe as a hparam or a variable inside `struct llm_build_llama`?) to avoid too much duplicated code. WDYT @ggerganov ?
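For example, a toy self-contained sketch of that idea (all names below are made up for illustration, not actual llama.cpp identifiers): keep a single builder and make the activation a per-model member/hparam, so AFM could reuse the LLaMA graph with a different FFN activation.

```cpp
// Illustrative sketch only -- not llama.cpp code. The point is that the
// activation becomes data (a per-model "hparam") instead of a separate
// duplicated build function per architecture.
#include <cmath>
#include <vector>

enum class ffn_act { SILU, RELU_SQR };

struct ffn_builder {
    ffn_act act; // SILU for LLaMA-style models, RELU_SQR for AFM

    std::vector<float> activate(std::vector<float> x) const {
        for (float & v : x) {
            if (act == ffn_act::SILU) {
                v = v / (1.0f + std::exp(-v));   // silu(x) = x * sigmoid(x)
            } else {
                v = v > 0.0f ? v * v : 0.0f;     // relu_sqr(x) = max(0, x)^2
            }
        }
        return x;
    }
};

int main() {
    const std::vector<float> h = {-1.0f, 0.5f, 2.0f};
    ffn_builder llama_ffn{ffn_act::SILU};
    ffn_builder afm_ffn{ffn_act::RELU_SQR};
    auto a = llama_ffn.activate(h);
    auto b = afm_ffn.activate(h);
    (void)a; (void)b;
    return 0;
}
```

In llama.cpp terms that would roughly mean passing the FFN op type (e.g. LLM_FFN_RELU_SQR vs LLM_FFN_SILU) as a per-model value into a shared build path, rather than duplicating the whole builder.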
It also lacks the FFN gate, but maybe that could also be abstracted?
If the gate is not present, its value will be `nullptr`, and `build_ffn` will skip the `nullptr` value, so no further modification is required.
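A minimal standalone sketch of that behaviour (not the actual `build_ffn` implementation, just the pattern described above): when no gate tensor is supplied, the gating multiply is simply skipped.

```cpp
// Illustrative sketch of "no gate tensor => skip the gating step".
#include <cstddef>
#include <vector>

static std::vector<float> ffn_forward(const std::vector<float> & up_out,
                                      const std::vector<float> * gate_out /* may be nullptr */) {
    std::vector<float> cur = up_out;
    if (gate_out != nullptr) {
        // gated path (e.g. LLaMA): elementwise multiply by the gate branch
        for (std::size_t i = 0; i < cur.size(); ++i) {
            cur[i] *= (*gate_out)[i];
        }
    }
    // with gate_out == nullptr (e.g. AFM) the multiply never happens
    return cur;
}

int main() {
    const std::vector<float> up   = {1.0f, 2.0f};
    const std::vector<float> gate = {0.5f, 0.25f};
    auto gated   = ffn_forward(up, &gate);   // uses the gate
    auto ungated = ffn_forward(up, nullptr); // gate skipped, no changes needed
    (void)gated; (void)ungated;
    return 0;
}
```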
Ah right, that makes sense! Yeah, definitely seems worth considering some extra abstraction here then.
btw I'm just bringing this up for further discussion. Feel free to merge the current PR without that.
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Lmk when you're ready to merge this
Ready!
@@ -128,6 +128,7 @@ class TOKENIZER_TYPE(IntEnum):
{"name": "llama4", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct", },
{"name": "pixtral", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/mistral-community/pixtral-12b", },
{"name": "seed-coder", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/ByteDance-Seed/Seed-Coder-8B-Base", },
{"name": "arcee", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/arcee-ai/AFM-4.5B", }, # TODO confirm final URL
Either this shouldn't have been added, or you forgot to add the new hash.
addressed in #14207
This adds support for the upcoming Arcee model architecture, currently codenamed the Arcee Foundation Model (AFM).
Uses ReLU² (ReLU-squared) activation in the MLP blocks (see the short sketch below)
Have tested performance of the quantized model and it seems to perform as expected, but keeping this a draft until it can be lightly reviewed and we confirm it's accurate
Transformers PR reference: huggingface/transformers#38621
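For reference, ReLU² here means ReLU followed by an elementwise square, i.e. relu_sqr(x) = max(0, x)²; a tiny standalone sketch (not the ggml implementation):

```cpp
// relu_sqr(x) = max(0, x)^2 -- the activation used in AFM's MLP blocks.
#include <algorithm>

static float relu_sqr(float x) {
    const float r = std::max(0.0f, x);
    return r * r;
}

int main() {
    // relu_sqr(-2) == 0, relu_sqr(3) == 9
    return (relu_sqr(-2.0f) == 0.0f && relu_sqr(3.0f) == 9.0f) ? 0 : 1;
}
```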