Conversation

CISC (Collaborator) commented Mar 21, 2025

Initial draft based on huggingface/transformers#36878

In case models are released before I can have a look at them this weekend:

TODO

  • Set type for all layer sizes in llama_model::load_hparams
  • Test conversion and inference on all models

github-actions bot added the python label Mar 21, 2025
x0wllaar commented

Are you planning to add MoE support?

CISC (Collaborator, Author) commented Mar 21, 2025

> Are you planning to add MoE support?

I'm focusing on non-MoE for now, so if someone wants to work on Qwen3MoE in the meantime they are more than welcome to. :)

x0wllaar commented

Thank you! I'm not sure I'm up to the task though lol

ngxson (Collaborator) commented Mar 21, 2025

I had a look at the qwen3 MoE python code, it's not much different from qwen2 MoE. The diffs are:

  • Shared experts are removed
  • Added k_norm and q_norm (similar to qwen3 dense); see the sketch below
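
For illustration, here's a minimal PyTorch-style sketch of that second point (class and variable names are mine, not the actual transformers implementation):

```python
import torch
from torch import nn

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # normalize over the last dimension (here: a single attention head)
        var = x.pow(2).mean(-1, keepdim=True)
        return self.weight * x * torch.rsqrt(var + self.eps)

class QKNormProjection(nn.Module):
    """Sketch of the qwen3 change: RMSNorm applied per head to Q and K
    after projection; qwen2 MoE uses the projections directly."""

    def __init__(self, hidden: int, n_heads: int, n_kv_heads: int):
        super().__init__()
        self.head_dim = hidden // n_heads
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.q_proj = nn.Linear(hidden, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(hidden, n_kv_heads * self.head_dim, bias=False)
        # the two new tensors: one norm weight of size head_dim each
        self.q_norm = RMSNorm(self.head_dim)
        self.k_norm = RMSNorm(self.head_dim)

    def forward(self, x: torch.Tensor):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim)
        k = self.k_proj(x).view(b, t, self.n_kv_heads, self.head_dim)
        # per-head RMSNorm before RoPE/attention is the qwen3 addition
        return self.q_norm(q), self.k_norm(k)
```

Conversion-wise that presumably just means mapping the extra *.q_norm.weight / *.k_norm.weight tensors and not expecting the shared-expert ones.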

CISC (Collaborator, Author) commented Mar 21, 2025

> I had a look at the qwen3 MoE python code, it's not much different from qwen2 MoE.

That was my initial impression too; I can have a stab at it if no one else volunteers, I just didn't want to bite off too much at once (esp. given the flustercuck that 57B-A14B was). :)

CISC (Collaborator, Author) commented Apr 8, 2025

Superseded by #12828

CISC closed this Apr 8, 2025
CISC deleted the qwen3 branch Apr 8, 2025 20:32