Refactor llama-model.cpp #16252
Conversation
This is a nightmare to review and rebase until merged though, also you seem to have pushed the same changes to your […]
@CISC Ye, need them for working there, but I'll revert once I'm done (unless this is done first and I can merge on top). I know it's a nightmare; I already kind of went through it when I asked an LLM to automate some tasks and it proceeded to merrily rip out methods just because some classes didn't inherit from […]. If you want, I can write a script that runs tree-sitter on the original definitions in llama-model.cpp vs the new classes and shows any differences, to verify that nothing was accidentally lost.
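As a rough sketch of that idea (assuming tree-sitter's C API and the tree-sitter-cpp grammar; the input string and names below are placeholders, not the script that was eventually used): collect every function_definition keyed by its declarator text, then pair and diff the bodies extracted from llama-model.cpp against the ones in the new classes.

```cpp
// Sketch: dump every C++ function definition in a source string, keyed by its
// declarator, so two dumps (old llama-model.cpp vs the new src/models/*.cpp) can
// be compared afterwards. Link against libtree-sitter and the tree-sitter-cpp grammar.
#include <tree_sitter/api.h>

#include <cstdio>
#include <cstring>
#include <string>

extern "C" const TSLanguage * tree_sitter_cpp(void);

// Recursively walk the syntax tree and print "<declarator> -> <body>" for each
// function_definition node; this catches both in-class methods and out-of-line
// class::method definitions.
static void dump_functions(TSNode node, const std::string & src) {
    if (std::strcmp(ts_node_type(node), "function_definition") == 0) {
        TSNode decl = ts_node_child_by_field_name(node, "declarator", (uint32_t) std::strlen("declarator"));
        TSNode body = ts_node_child_by_field_name(node, "body",       (uint32_t) std::strlen("body"));
        if (!ts_node_is_null(decl) && !ts_node_is_null(body)) {
            std::string key  = src.substr(ts_node_start_byte(decl), ts_node_end_byte(decl) - ts_node_start_byte(decl));
            std::string text = src.substr(ts_node_start_byte(body), ts_node_end_byte(body) - ts_node_start_byte(body));
            std::printf("=== %s ===\n%s\n", key.c_str(), text.c_str());
        }
    }
    for (uint32_t i = 0; i < ts_node_named_child_count(node); ++i) {
        dump_functions(ts_node_named_child(node, i), src);
    }
}

int main() {
    // Placeholder input; in practice this would be read from the files to compare.
    const std::string src =
        "struct llm_build_example { void build() { int x = 0; } };\n"
        "void llm_build_example::other() { int y = 1; }\n";

    TSParser * parser = ts_parser_new();
    ts_parser_set_language(parser, tree_sitter_cpp());

    TSTree * tree = ts_parser_parse_string(parser, nullptr, src.c_str(), (uint32_t) src.size());
    dump_functions(ts_tree_root_node(tree), src);

    ts_tree_delete(tree);
    ts_parser_delete(parser);
    return 0;
}
```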
Maybe it would be easier to refactor that partially, just a subset of the models?
This seems to be a good change, just have some other ideas: […]
All right, I've done the changes suggested by @ngxson - renamed the files to just .cpp and merged all the header files into one models.h. I've also merged all the current changes from master. Could we possibly move this forward somehow? This isn't going to get any easier any time soon... @ggerganov?
Yes, thanks for the help. I will take a look soon at the Qwen 3 Next PR and also this reorganization here. Likely over the weekend.
The diff is too big so I had a look at some files under your fork instead. It looks good overall, just a nitpick: seems like many files start with this pattern: […] while it's cleaner to be: […] Should be a real quick regex replace to change.
@ngxson aye, done.
@pwilkin Thank you!
Sorry, I don't understand. Yeah, there are 117 changed files in this PR because it's a refactoring - it extracts the builders from llama-model.cpp into separate files. I don't think the exact architecture of the PR matters - GitHub will always show the final changes in the PR.
No, you have changes that don't belong here, from PRs that have been merged recently. This happens occasionally, not sure what exactly causes it. Rebasing your branch should fix it.
Oh, that. Yeah, that's some weird byproduct of merging the current master. I'll try to rebase instead. |
Force-pushed from 0d6329d to dfd84ad.
@NeoZhangJianyu @0cc4m did a clean rebase
Force-pushed from 7f82b27 to 803221e.
@ggerganov @CISC turns out that formatting those calls is exactly the perfect mindless task to do during a Teams call at work ;) Should be good now.
Quickly glanced through them, looks like you missed a few (I take offense sir! :D ): […]
Is it possible to create a merged diff of all the models so we can verify that nothing got left out?
Working on it.
Aight, was a bit annoying getting an LLM to generate all the diffs properly but I think this is it.
I'm not sure that is reliable, who knows what it did, it even messed up the diff in several places. :) I think the only way to get a fairly usable diff is to run both through clang-format.
That won't do it either; you're going to run into a nesting issue. Since the refactoring splits code into header and definition, you'll have level 2 bracketing (class -> method) vs level 1 bracketing (class::method).
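A schematic of that nesting difference, with stand-in types instead of the real llama.cpp ones and a hypothetical llm_build_example (the real constructors also take the model, omitted here): the same body sits one bracketing level deeper when it is defined inside the struct than when it is defined out of line, so clang-formatting both versions and line-diffing them flags every line even where nothing actually changed.

```cpp
// Minimal stand-ins so the sketch compiles on its own; the real types live in llama.cpp.
struct llm_graph_params {};
struct llm_graph_context {
    explicit llm_graph_context(const llm_graph_params &) {}
};

namespace before_split {
// llama-model.cpp style: the body lives inside the struct, so after clang-format
// every statement sits at bracketing level 2 (struct -> method).
struct llm_build_example : llm_graph_context {
    llm_build_example(const llm_graph_params & params) : llm_graph_context(params) {
        int graph_building_code = 0;  // indented two levels deep
        (void) graph_building_code;
    }
};
}  // namespace before_split

namespace after_split {
// Refactored style: declaration in the header, definition out of line, so the
// identical body sits at bracketing level 1 (class::method) and a plain line diff
// of the two clang-formatted versions reports every line as changed.
struct llm_build_example : llm_graph_context {
    llm_build_example(const llm_graph_params & params);
};

llm_build_example::llm_build_example(const llm_graph_params & params) : llm_graph_context(params) {
    int graph_building_code = 0;  // same statement, one level shallower
    (void) graph_building_code;
}
}  // namespace after_split

int main() {
    llm_graph_params params;
    before_split::llm_build_example a(params);
    after_split::llm_build_example  b(params);
    (void) a;
    (void) b;
    return 0;
}
```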
Force-pushed from 39bca8b to c8c8fc7.
@CISC @ggerganov I fixed the script to properly report diffs, removed all the extra semicolons it reported. We're down to the following, which I think are innocent:

--- reference/llm_build_bitnet::llm_graph_context (clang-formatted)
+++ src/models/llm_build_bitnet::llm_graph_context (clang-formatted)
@@ -68,7 +68,7 @@
if (model.layers[il].bo) {
cur = ggml_add(ctx0, cur, model.layers[il].bo);
}
- cb(cur, "attn_o_out", il);
+ cb(cur, "attn_out", il);
}
if (il == n_layer - 1 && inp_out_ids) {
cur = ggml_get_rows(ctx0, cur, inp_out_ids);
--- reference/llm_build_lfm2::build_attn_block (clang-formatted)
+++ src/models/llm_build_lfm2::build_attn_block (clang-formatted)
@@ -1,7 +1,7 @@
{
GGML_ASSERT(hparams.n_embd_v_gqa(il) == hparams.n_embd_k_gqa(il));
- auto const n_embd_head = hparams.n_embd_head_v;
- auto const n_head_kv = hparams.n_head_kv(il);
+ const auto n_embd_head = hparams.n_embd_head_v;
+ const auto n_head_kv = hparams.n_head_kv(il);
auto *q = build_lora_mm(model.layers[il].wq, cur);
cb(q, "model.layers.{}.self_attn.q_proj", il);
auto *k = build_lora_mm(model.layers[il].wk, cur);
--- reference/llm_build_lfm2::build_shortconv_block (clang-formatted)
+++ src/models/llm_build_lfm2::build_shortconv_block (clang-formatted)
@@ -15,7 +15,7 @@
cb(bcx, "model.layers.{}.conv.in_proj", il);
constexpr auto n_chunks = 3;
GGML_ASSERT(bcx->ne[0] % n_chunks == 0);
- auto const chunk_size = bcx->ne[0] / n_chunks;
+ const auto chunk_size = bcx->ne[0] / n_chunks;
auto *b =
ggml_view_3d(ctx0, bcx, chunk_size, bcx->ne[1], bcx->ne[2], bcx->nb[1],
bcx->nb[2], 0 * chunk_size * ggml_element_size(bcx));

I think this is ready to merge, let's do it before another model gets added to llama-model.cpp and I have to rebase again :P
Ooops, too late. :P Mind fixing the last little nits while you're at it, then we'll merge.
Meaning those three diffs I just posted? I think all of them are correct fixes in context, so I'd keep them tbh. But if you want I can add them too, just for completeness' sake (though reverting the "attn_o_out" seems weird).
Force-pushed from 934128c to a81302f.
Minimax is in.
Yeah, just because this is a refactor and not changing more than necessary, but nvm. |
Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>
@pwilkin Great job BTW, this was a massive undertaking!
Yeah I think this was the hardest one of the split tasks :) |
@ggerganov I know you said you were planning to do it, but honestly it's been a nightmare working on all the model implementations with the huge llama-model.cpp, so I wanted to just get the "easy" albeit tedious part out of the way. Moved all llm_build_* definitions to their separate class files in src/models/
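Roughly, the shape of that change, using a hypothetical llm_build_example and stand-in types so the sketch is self-contained (the real signatures live in the repo; the constructor arguments below are inferred from the diffs earlier in the thread): declarations for all architectures sit in one merged models.h, and each architecture's graph-building constructor moves to its own file under src/models/.

```cpp
// Stand-in types so this sketch compiles on its own; the real definitions live in
// llama.cpp. llm_build_example is a hypothetical architecture name.
struct llama_model {};
struct llm_graph_params {};
struct llm_graph_context {
    explicit llm_graph_context(const llm_graph_params &) {}
};

// --- models.h (single merged header): one declaration per architecture ---
struct llm_build_example : public llm_graph_context {
    llm_build_example(const llama_model & model, const llm_graph_params & params);
};

// --- src/models/example.cpp: the definition that used to sit inline in llama-model.cpp ---
llm_build_example::llm_build_example(const llama_model & model, const llm_graph_params & params) :
    llm_graph_context(params) {
    // ggml graph construction for this architecture goes here (build_lora_mm, ggml_add, cb, ...)
    (void) model;
}

int main() {
    llm_build_example example(llama_model{}, llm_graph_params{});
    (void) example;
    return 0;
}
```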