Conversation

CISC (Collaborator) commented Mar 21, 2025

Initial draft based on huggingface/transformers#36878

In case models are released before I can have a look at them this weekend:

TODO

  • Set type for all layer sizes in llama_model::load_hparams
  • Test conversion and inference on all models

github-actions bot added the python label Mar 21, 2025
x0wllaar commented

Are you planning to add MoE support?

CISC (Collaborator, Author) commented Mar 21, 2025

> Are you planning to add MoE support?

I'm focusing on non-MoE for now, so if someone wants to work on Qwen3MoE in the meantime they are more than welcome to. :)

x0wllaar commented

Thank you! I'm not sure I'm up to the task though lol

ngxson (Collaborator) commented Mar 21, 2025

I had a look at the qwen3 MoE python code, it's not much different from qwen2 MoE. The diffs are:

  • Shared experts are removed
  • Added k_norm and q_norm (similar to qwen3 dense); see the sketch below
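
For illustration, here's a minimal PyTorch-style sketch of that second point (class and variable names are mine, not the actual transformers implementation):

```python
import torch
from torch import nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # normalize over the last dimension (here: a single attention head)
        var = x.pow(2).mean(-1, keepdim=True)
        return self.weight * x * torch.rsqrt(var + self.eps)

class QKNormProjection(nn.Module):
    """Sketch of the qwen3 change: RMSNorm applied per head to Q and K
    after projection; qwen2 MoE uses the projections directly."""

    def __init__(self, hidden: int, n_heads: int, n_kv_heads: int):
        super().__init__()
        self.head_dim = hidden // n_heads
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.q_proj = nn.Linear(hidden, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(hidden, n_kv_heads * self.head_dim, bias=False)
        # the two new tensors: one norm weight of size head_dim each
        self.q_norm = RMSNorm(self.head_dim)
        self.k_norm = RMSNorm(self.head_dim)

    def forward(self, x: torch.Tensor):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim)
        k = self.k_proj(x).view(b, t, self.n_kv_heads, self.head_dim)
        # per-head RMSNorm before RoPE/attention is the qwen3 addition
        return self.q_norm(q), self.k_norm(k)
```

Conversion-wise that presumably just means mapping the extra *.q_norm.weight / *.k_norm.weight tensors and not expecting the shared-expert ones.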

CISC (Collaborator, Author) commented Mar 21, 2025

> I had a look at the qwen3 MoE python code, it's not much different from qwen2 MoE.

That was my initial impression too; I can have a stab at it if no one else volunteers, I just didn't want to bite off too much at once (esp. given the flustercuck that 57B-A14B was). :)

CISC (Collaborator, Author) commented Apr 8, 2025

Superseded by #12828

CISC closed this Apr 8, 2025
CISC deleted the qwen3 branch Apr 8, 2025 20:32