MPT support in llama.cpp #3417

Merged on Oct 10, 2023
Commits (17)
b49792b
CUDA: added support for ggml_clamp (see also: https://github.com/gger…
jploski Sep 30, 2023
15236e8
mpt : added an implementation based (mostly) on falcon integration, m…
jploski Sep 30, 2023
84e30e8
mpt : protect against "clip_qkv": null in mpt-7b
jploski Sep 30, 2023
00e8c5c
mpt : quick fix to avoid "Strange model" warning when quantizing MPT …
jploski Sep 30, 2023
1be89c4
mpt : addendum to changeset:84e30e8 - leave parameter clamp_kqv out f…
jploski Sep 30, 2023
26c253e
mpt : standardized all tensor names to follow GGUF spec
jploski Sep 30, 2023
df072d2
mpt : addendum to changeset:1be89c40 - use "req" parameter of GGUF_GE…
jploski Sep 30, 2023
90e7d6d
mpt : fixed comment s/gptneox/mpt/
jploski Oct 2, 2023
4708012
mpt : remove tabs, trailing whitespace
jploski Oct 2, 2023
1364bcd
mpt : removed ne01 + n_past == ne00 assertion from alibi (cuda/f32) a…
jploski Oct 3, 2023
7d6a24a
mpt : updated convert-mpt-hf-to-gguf.py to reflect changes made to co…
jploski Oct 6, 2023
292363e
Merge branch 'master' of https://github.com/ggerganov/llama.cpp into …
cebtenzzre Oct 9, 2023
ad3c2f3
comment out n_past instead of marking it unused
cebtenzzre Oct 9, 2023
1a454eb
mpt : removed hardcoded +178 from convert script in favor of utilizin…
jploski Oct 9, 2023
32172f1
mpt : remove unused tokenizer_json in convert script
cebtenzzre Oct 9, 2023
96cf3f5
ggml : remove obsolete n_past assert in ggml_alibi
ggerganov Oct 10, 2023
9b66378
llama : print clam_kqv and max_alibi_bias hparams
ggerganov Oct 10, 2023
mpt : remove unused tokenizer_json in convert script
cebtenzzre committed Oct 9, 2023
commit 32172f12f57e02b1a233822c6999be592b8a739e
17 changes: 4 additions & 13 deletions convert-mpt-hf-to-gguf.py
@@ -111,24 +111,15 @@ def parse_args() -> argparse.Namespace:
 scores: list[float] = []
 toktypes: list[int] = []

-tokenizer_json_file = dir_model / 'tokenizer.json'
-if not tokenizer_json_file.is_file():
-    print(f'Error: Missing {tokenizer_json_file}', file = sys.stderr)
-    sys.exit(1)
-
 # gpt2 tokenizer
 gguf_writer.add_tokenizer_model("gpt2")

-with open(tokenizer_json_file, "r", encoding="utf-8") as f:
-    tokenizer_json = json.load(f)
-
 print("gguf: get gpt2 tokenizer vocab")

-# MPT token embedding tensors have dimension 50432 (hparams["vocab_size"]),
-# but there are only 50254 (len(tokenizer_json["model"]["vocab"]))
-# tokens in the vocab, presumably to accomodate some "reserved" tokens;
-# this is causing problems down the line in llama.cpp, so we pad the vocab
-# with dummy tokens:
+# MPT token embedding tensors have dimension 50432 (hparams["vocab_size"]), but
+# there are only 50254 (len(tokenizer.vocab)) tokens in the vocab, presumably to
+# accomodate some "reserved" tokens; this is causing problems down the line in
+# llama.cpp, so we pad the vocab with dummy tokens:

 vocab_size = hparams["vocab_size"]