Support glm3 and glm4. #8031

Merged
39 commits merged on Jul 7, 2024
Changes from 1 commit

Commits (39)
6630a2d
add chatglm3-6b model support huggingface model:
xingxingqiao May 29, 2024
5a914ff
remove .rotary_pos_emb.inv_freq and unuse code for chatglm3 model
xingxingqiao May 15, 2024
f626b71
fix lint error
xingxingqiao May 24, 2024
f3bc337
optimize convert-hf-to-gguf.py for chatglm model
xingxingqiao May 16, 2024
1fc5bf5
support glm-4-9b-chat
xingxingqiao Jun 17, 2024
8c5f1b2
fix eos tokens to glm4
youth123 Jun 20, 2024
95fd910
remove unused log
youth123 Jun 20, 2024
e773174
Fix eos tokens to glm4 and adapts to glm3
youth123 Jun 20, 2024
4b65b64
add preprocess to chatglm3 and chatglm4
youth123 Jun 21, 2024
3a4d579
add eos_id_list to llama.cpp
youth123 Jun 24, 2024
9570806
fix conflicts
youth123 Jun 25, 2024
3b67ff8
fix code style
youth123 Jun 25, 2024
5f8f465
fix code style
youth123 Jun 25, 2024
f8d4fc9
fix conflicts
youth123 Jun 25, 2024
a67bc8f
fix conflicts
youth123 Jun 25, 2024
3557944
Merge branch 'glm_support'
youth123 Jun 25, 2024
89e8aaf
Revert "add eos_id_list to llama.cpp"
youth123 Jun 25, 2024
9396c7b
set <|endoftext|> as eos and <|user|> as eot
youth123 Jun 26, 2024
e18a536
Merge remote-tracking branch 'offical/master'
youth123 Jun 26, 2024
0595f03
fix chat template bug
youth123 Jun 26, 2024
7357273
add comment to glm prefix and suffix
youth123 Jun 27, 2024
1dc8e91
Merge remote-tracking branch 'offical/master'
youth123 Jun 27, 2024
e9e47eb
fix conflicts and add rope_ratio & ChatGLMForConditionalGeneration
youth123 Jun 27, 2024
482bdea
merge master
youth123 Jun 28, 2024
bbe1926
fix chat template bug
youth123 Jun 28, 2024
d07f0a9
fix codestyle
youth123 Jul 1, 2024
0d3a94a
merge master
youth123 Jul 1, 2024
5e9dba6
fix conflicts
youth123 Jul 1, 2024
865dd03
modified the general name of glm model
youth123 Jul 1, 2024
71c8e02
Merge remote-tracking branch 'offical/master'
youth123 Jul 2, 2024
ec89d06
merge master
youth123 Jul 3, 2024
80b381b
fix conflicts
youth123 Jul 3, 2024
bf54db2
remove prefix and suffix
youth123 Jul 3, 2024
bce74d8
use normal glm4 chattempalte & use LLM_FFN_SWIGLU in phi3
youth123 Jul 3, 2024
3be4270
fix: resolve Flake8 errors in `convert-hf-to-gguf.py`
Umpire2018 Jul 5, 2024
ed54a65
Merge pull request #2 from Umpire2018/fix/flake8-error
youth123 Jul 7, 2024
5b760f2
fix rope ratio to solve incorrect answers
youth123 Jul 7, 2024
223eb18
merge master
youth123 Jul 7, 2024
4e85b06
fix by comments
youth123 Jul 7, 2024
fix eos tokens to glm4
youth123 committed Jun 20, 2024
commit 8c5f1b2b6c4d8d5afde26769b9721f4cb6ec5665
convert-hf-to-gguf.py: 17 changes (12 additions, 5 deletions)
@@ -2728,6 +2728,8 @@ def set_vocab_chatglm3(self):
         tokenizer = AutoTokenizer.from_pretrained(dir_model, trust_remote_code=True)
         vocab_size = hparams.get("padded_vocab_size", len(tokenizer.get_vocab()))
         assert max(tokenizer.get_vocab().values()) < vocab_size
+        role_special_tokens = ["<|system|>", "<|user|>", "<|assistant|>", "<|observation|>"]
+        special_tokens = ["[MASK]", "[gMASK]", "[sMASK]", "sop", "eop"] + role_special_tokens
         print(vocab_size)
         print(max(tokenizer.get_vocab().values()))
         for token_id in range(vocab_size):
@@ -2750,7 +2752,12 @@ def set_vocab_chatglm3(self):
                 text = f"[PAD{token_id}]".encode("utf-8")

             if token_id >= tokenizer.tokenizer.sp_model.vocab_size():
-                toktype = SentencePieceTokenTypes.UNKNOWN
+                if piece in special_tokens:
+                    # show special tokens in prompt
+                    toktype = SentencePieceTokenTypes.USER_DEFINED
+                else:
+                    print(f"unknow token: {piece}")
+                    toktype = SentencePieceTokenTypes.UNKNOWN
                 tokens.append(text)
                 scores.append(score)
                 toktypes.append(toktype)
@@ -2856,9 +2863,9 @@ def set_vocab(self):
         special_vocab.chat_template = "ChatGLM4"
         special_vocab.merges = merges
         # only add special tokens when they were not already loaded from config.json
-        if len(special_vocab.special_token_ids) == 0:
-            special_vocab._set_special_token("bos", tokenizer.get_added_vocab()["<|endoftext|>"])
-            special_vocab._set_special_token("eos", tokenizer.get_added_vocab()["<|endoftext|>"])
+        # if len(special_vocab.special_token_ids) == 0:
+        special_vocab._set_special_token("bos", tokenizer.get_added_vocab()["<|endoftext|>"])
+        special_vocab._set_special_token("eos", tokenizer.get_added_vocab()["<|endoftext|>"])
         # this one is usually not in config.json anyway
         special_vocab._set_special_token("unk", tokenizer.get_added_vocab()["<|endoftext|>"])
         special_vocab.add_to_gguf(self.gguf_writer)
@@ -2955,7 +2962,7 @@ def parse_args() -> argparse.Namespace:
         help="model is executed on big endian machine",
     )
     parser.add_argument(
-        "model", type=Path,
+        "--model", type=Path,
         help="directory containing model file",
     )
     parser.add_argument(
llama.cpp: 35 changes (31 additions, 4 deletions)
@@ -1802,9 +1802,11 @@ enum e_model {
     MODEL_2_8B,
     MODEL_3B,
     MODEL_4B,
+    MODEL_6B,
     MODEL_6_9B,
     MODEL_7B,
     MODEL_8B,
+    MODEL_9B,
     MODEL_12B,
     MODEL_13B,
     MODEL_14B,
@@ -3918,9 +3920,11 @@ static const char * llama_model_type_name(e_model type) {
         case MODEL_2_8B:  return "2.8B";
         case MODEL_3B:    return "3B";
         case MODEL_4B:    return "4B";
+        case MODEL_6B:    return "6B";
         case MODEL_6_9B:  return "6.9B";
         case MODEL_7B:    return "7B";
         case MODEL_8B:    return "8B";
+        case MODEL_9B:    return "9B";
         case MODEL_12B:   return "12B";
         case MODEL_13B:   return "13B";
         case MODEL_14B:   return "14B";
@@ -4507,8 +4511,8 @@ static void llm_load_hparams(
             {
                 ml.get_key(LLM_KV_ATTENTION_LAYERNORM_RMS_EPS, hparams.f_norm_rms_eps);
                 switch (hparams.n_layer) {
-                    case 28: model.type = e_model::MODEL_7B; break;
-                    case 40: model.type = e_model::MODEL_8B; break;
+                    case 28: model.type = e_model::MODEL_6B; break;
+                    case 40: model.type = e_model::MODEL_9B; break;
                     default: model.type = e_model::MODEL_UNKNOWN;
                 }
             } break;
@@ -18362,6 +18366,19 @@ llama_token_type llama_token_get_type(const struct llama_model * model, llama_token token) {
 }

 bool llama_token_is_eog(const struct llama_model * model, llama_token token) {
+    auto arch_name = llama_model_arch_name(model->arch);
+    auto vocab_type = model->vocab.type;
+    if (strcmp(arch_name, "chatglm") == 0) {
Inline review comments on this line:

Collaborator: llama_token_is_eog is called quite often; doing a string compare here may have an impact on performance.

ngxson (Collaborator), Jun 22, 2024: Looking at tokenizer_config.json, I think that it's safe to stop at EOS (<|endoftext|>), so there is no need to hard-code token IDs here.

Collaborator: Edit: looking at the chat template, it seems the model does not have the notion of an end-of-turn token (strange!). Maybe we need to introduce the EOT token as a list instead of a single value. This will require adding metadata to gguf (CC @ggerganov).

Contributor (PR author): Alright, I will add an EOT list to the gguf metadata. Then, during vocab initialization, I will put all the EOT entries into this list, so this check only needs to traverse the list.

Contributor (PR author): I have already added the eos_id_list variable to the gguf metadata and ensured compatibility with previous versions. Could you please check whether any other modifications are needed?
https://github.com/ggerganov/llama.cpp/pull/8031/files#diff-4f653096980bd7d10518aa909cb648452cd3aa380ff93cb9fb642dca48536526R110
https://github.com/ggerganov/llama.cpp/pull/8031/files#diff-150dc86746a90bad4fc2c3334aeb9b5887b3adad3cc1459446717638605348efR5090

+        if (LLAMA_VOCAB_TYPE_BPE == vocab_type) { // glm4
+            return token != -1 && (
+                token == llama_token_eos(model) ||
+                token == llama_token_eot(model) ||
+                token == 151329 ||
+                token == 151336 ||
+                token == 151338
+            );
+        }
+    }
     return token != -1 && (
         token == llama_token_eos(model) ||
         token == llama_token_eot(model)
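The review discussion above converged on keeping a list of end-of-generation token ids rather than comparing the architecture name on every call. Below is a minimal sketch of that idea, assuming the ids are resolved once at model-load time; the names glm_vocab_sketch and token_is_eog_sketch are illustrative only and are not llama.cpp API. Note that the eos_id_list approach added in commit 3a4d579 was later reverted (89e8aaf), and the merged PR instead sets <|endoftext|> as eos and <|user|> as eot (9396c7b).

// Sketch only: per-vocab list of end-of-generation token ids, filled when the
// model is loaded, so the hot-path check is a scan over a few integers instead
// of a strcmp on the architecture name.
#include <algorithm>
#include <cstdint>
#include <vector>

using llama_token_sketch = int32_t;

struct glm_vocab_sketch {
    // e.g. the ids of <|endoftext|>, <|user|>, <|observation|> resolved at load time
    std::vector<llama_token_sketch> eog_ids;
};

static bool token_is_eog_sketch(const glm_vocab_sketch & vocab, llama_token_sketch token) {
    return token != -1 &&
           std::find(vocab.eog_ids.begin(), vocab.eog_ids.end(), token) != vocab.eog_ids.end();
}

int main() {
    glm_vocab_sketch vocab;
    vocab.eog_ids = {151329, 151336, 151338};  // the glm4 ids hard-coded in the hunk above
    return token_is_eog_sketch(vocab, 151336) ? 0 : 1;  // returns 0: 151336 is end-of-generation
}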
@@ -18424,8 +18441,18 @@ int32_t llama_tokenize(
                       int32_t   n_tokens_max,
                          bool   add_special,
                          bool   parse_special) {
-    auto res = llama_tokenize_internal(model->vocab, std::string(text, text_len), add_special, parse_special);
-
+    auto arch_name = llama_model_arch_name(model->arch);
+    auto prompt = std::move(std::string(text, text_len));
+    auto vocab_type = model->vocab.type;
+    if (strcmp(arch_name, "chatglm") == 0) {
+        // chatglm3
+        if (LLAMA_VOCAB_TYPE_SPM == vocab_type) {
+            prompt = "[gMASK]sop<|user|>\n" + prompt + "<|assistant|>";
+        } else if (LLAMA_VOCAB_TYPE_BPE == vocab_type) { // glm4
+            prompt = "[gMASK]<sop><|user|>\n" + prompt + "<|assistant|>";
+        }
+    }
+    auto res = llama_tokenize_internal(model->vocab, prompt, add_special, parse_special);
     if (n_tokens_max < (int) res.size()) {
         // LLAMA_LOG_ERROR("%s: too many tokens\n", __func__);
         return -((int) res.size());
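As a standalone illustration of the hunk above: for a chatglm model, this version of llama_tokenize rewrites the user text into the GLM chat format before handing it to llama_tokenize_internal. The helper below is a sketch with an assumed name (wrap_chatglm_prompt), not llama.cpp API; later commits in this PR (bf54db2, bce74d8) drop this hard-coded prefix/suffix in favour of the regular glm4 chat template.

// Illustration of the chatglm-specific prompt wrapping shown in the hunk above.
#include <iostream>
#include <string>

// is_glm4 == false -> chatglm3 (SPM vocab), true -> glm4 (BPE vocab)
static std::string wrap_chatglm_prompt(const std::string & user_text, bool is_glm4) {
    const std::string prefix = is_glm4 ? "[gMASK]<sop><|user|>\n" : "[gMASK]sop<|user|>\n";
    return prefix + user_text + "<|assistant|>";
}

int main() {
    // prints: [gMASK]<sop><|user|>
    //         What is the capital of France?<|assistant|>
    std::cout << wrap_chatglm_prompt("What is the capital of France?", /*is_glm4=*/true) << "\n";
}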