Requesting Qwen-7B Support #2528

Closed
aiaicode opened this issue Aug 5, 2023 · 11 comments

Comments

@aiaicode

aiaicode commented Aug 5, 2023

https://huggingface.co/Qwen/Qwen-7B

https://huggingface.co/Qwen/Qwen-7B-Chat

These two models are outperforming 13B models, and on C-Eval they beat ChatGPT.

Requesting model support in llama.cpp.

@vonjackustc

Qwen is similar to the Llama model.
You need to add bias parameters for QKV and change the tokenizer to tiktoken.

I wonder if we can convert the tiktoken vocab format to SentencePiece directly.
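For what it's worth, such a conversion would start from tiktoken's plain-text vocab table: one base64-encoded token per line followed by its integer rank. A minimal parsing sketch (the inline sample string is made up for illustration; a real converter would then have to re-express the byte-level BPE merges in the target format):

```python
import base64

def load_tiktoken_vocab(text):
    """Parse tiktoken's plain-text vocab format: one base64-encoded
    token per line, followed by its integer rank."""
    ranks = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        token_b64, rank = line.split()
        ranks[base64.b64decode(token_b64)] = int(rank)
    return ranks

# Tiny inline sample in the same shape as Qwen's qwen.tiktoken file.
sample = "IQ== 0\nIg== 1\nIw== 2"
vocab = load_tiktoken_vocab(sample)
print(vocab)  # {b'!': 0, b'"': 1, b'#': 2}
```

Turning this rank table into a SentencePiece model is the non-trivial part, since tiktoken stores byte-level merge ranks rather than piece scores.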

@vonjackustc

https://huggingface.co/JosephusCheung/Qwen-LLaMAfied-7B-Chat
This repo from JosephusCheung can be converted into ggml format. You still need to change the EOS token in the ggml source code (it is 2 for Llama but a different id for Qwen).

I think this model is still slightly different from the original Qwen, especially in QwenAttentionBlock: Qwen applies log(n) scaling to the query.
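The log(n) scaling mentioned above can be sketched in a few lines. This is a simplified, pure-Python illustration (`logn_scale_query` and the default `train_len` are hypothetical names/values; the real model applies the scaling per attention head on tensors):

```python
import math

def logn_scale_query(q, position, train_len=2048):
    # Sketch of Qwen-style log-n attention scaling: queries at positions
    # beyond the training context length are scaled by
    # log(position) / log(train_len), which helps keep attention entropy
    # roughly stable as the context grows past what was seen in training.
    if position <= train_len:
        return q
    scale = math.log(position) / math.log(train_len)
    return [x * scale for x in q]

print(logn_scale_query([1.0, 2.0], 1024))  # within train_len: unchanged
print(logn_scale_query([1.0, 2.0], 4096))  # scaled by log(4096)/log(2048)
```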

@wtarreau
Contributor

Sadly I couldn't convert it; I always get cryptic Python errors.

@arch-btw
Contributor

@aiaicode @vonjackustc @wtarreau

#3337

@KerfuffleV2
Collaborator

I found this script for converting a tiktoken vocab to HF format: https://gist.github.com/xenova/a452a6474428de0182b17605a98631ee (I didn't test it, but it looks reasonable and seems to be from an HF person.)

To actually use it, you'll also need to use #3743 since there wasn't already support for loading from merges.txt.

@TheBloke too since I assume you're looking to support these Qwen models.

@ggerganov
Owner

Is there a blocker to supporting these models in llama.cpp? It would be nice to support the new 1.8B and 72B versions.

@simveit

simveit commented Dec 1, 2023

https://github.com/QwenLM/qwen.cpp
Could this maybe be of any help in supporting Qwen in the future?

@choyakawa

choyakawa commented Dec 1, 2023

We should first support QKVO bias in Llama.
It's in the HF Llama config.json: "attention_bias": true
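Assuming the usual HF and GGUF tensor-naming conventions, a converter that honors `attention_bias` would have to map the extra bias tensors alongside the weights, along these lines (a sketch only; `map_tensor` and the exact name table are illustrative, so verify against the actual converter script):

```python
import re

# When "attention_bias" is true, the HF checkpoint carries bias tensors
# next to the usual Q/K/V/O projection weights. These must be mapped to
# the corresponding GGUF names as well.
HF_TO_GGUF = {
    "self_attn.q_proj.weight": "attn_q.weight",
    "self_attn.q_proj.bias":   "attn_q.bias",
    "self_attn.k_proj.weight": "attn_k.weight",
    "self_attn.k_proj.bias":   "attn_k.bias",
    "self_attn.v_proj.weight": "attn_v.weight",
    "self_attn.v_proj.bias":   "attn_v.bias",
    "self_attn.o_proj.weight": "attn_output.weight",
    "self_attn.o_proj.bias":   "attn_output.bias",
}

def map_tensor(hf_name):
    """Map an HF layer tensor name to its GGUF-style name."""
    m = re.match(r"model\.layers\.(\d+)\.(.+)", hf_name)
    layer, suffix = m.group(1), m.group(2)
    return f"blk.{layer}." + HF_TO_GGUF[suffix]

print(map_tensor("model.layers.0.self_attn.q_proj.bias"))  # blk.0.attn_q.bias
```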

@choyakawa

There was an implementation here, but it failed; can anyone figure out what's wrong with it?
#3743 (comment)
It seems correct to just add bias terms to each part.

@ggerganov
Owner

  • The bias tensors were not offloaded via a separate cb() call
  • Seems like special tokens (<|im_start|>, <|im_end|>) were not escaped in their test?
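For context on the escaping point: Qwen-Chat uses the ChatML prompt format, and the `<|im_start|>` / `<|im_end|>` markers only work if the tokenizer treats them as special tokens rather than plain text (otherwise they get split into many subword pieces). A minimal sketch of building such a prompt (`format_chatml` is a hypothetical helper):

```python
def format_chatml(messages):
    """Build a ChatML-style prompt of the kind Qwen-Chat expects.

    <|im_start|> and <|im_end|> must be registered as special tokens in
    the tokenizer; minimal sketch, omitting chat-template edge cases.
    """
    parts = [f"<|im_start|>{role}\n{content}<|im_end|>"
             for role, content in messages]
    parts.append("<|im_start|>assistant\n")  # cue the model to respond
    return "\n".join(parts)

prompt = format_chatml([("system", "You are helpful."), ("user", "Hi")])
print(prompt)
```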

@DavidGOrtega

Should this be closed by #5037?
