Commit 4add018
increase graph_max_nodes for finetune
fix a regression during finetune on Llama-3.2-1B-F32:

GGML_ASSERT(cgraph->n_nodes < cgraph->size) failed

git bisect (run with the most recent finetune/SGD change applied) showed that

d498af3 Georgi Gerganov 2025-07-18 14:31:15 +0300 graph : avoid huge warm-up graphs for MoE models (ggml-org#14753)

which greatly decreased graph_max_nodes, has been responsible for finetune failing on reasonably sized models for the past two months. this commit partially reverts the decrease (larger models may still fail).
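
for context, the assertion means the built graph needed more nodes than the preallocated budget. a minimal sketch of the shape of the fix, assuming a budget helper like llama.cpp's graph_max_nodes(); the floor and multiplier below are illustrative assumptions, not the actual constants:

```cpp
// sketch only: the real budget lives in llama_context; the constants
// here are assumptions chosen to illustrate the direction of the change
#include <algorithm>
#include <cstdint>

struct model_info { uint32_t n_tensors; };

// upper bound on ggml_cgraph node count; if a graph build exceeds it,
// GGML_ASSERT(cgraph->n_nodes < cgraph->size) fires
static uint32_t graph_max_nodes(const model_info & model) {
    // d498af3 shrank this budget to avoid huge MoE warm-up graphs;
    // finetune builds forward + backward + optimizer nodes, so the
    // floor/multiplier is raised again here
    return std::max<uint32_t>(8192u, 8u*model.n_tensors);
}
```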
note: env LLAMA_SET_ROWS=0 is also needed, or else:

GGML_ASSERT(!node->view_src || node->op == GGML_OP_CPY || node->op == GGML_OP_VIEW || node->op == GGML_OP_RESHAPE || node->op == GGML_OP_PERMUTE || node->op == GGML_OP_TRANSPOSE) failed

(the node->op in question is indeed a rows op)
unfortunately a git revert on:
8a4280c Georgi Gerganov 2025-08-28 12:27:02 +0300 kv-cache : remove
LLAMA_SET_ROWS checks (ggml-org#15505)
is not straightforward, so this branch is behind that.
diff: 1 file changed, +1 -1 (line 1341)