Commit a20b2b0

context : round n_tokens to next multiple of n_seqs when reserving (#14140)
This fixes RWKV inference, which otherwise failed when the worst-case ubatch.n_seq_tokens rounded down to 0.
1 parent 2e89f76 commit a20b2b0

1 file changed: +1 -1 lines changed

src/llama-context.cpp

Lines changed: 1 addition & 1 deletion
@@ -1332,7 +1332,7 @@ ggml_cgraph * llama_context::graph_reserve(uint32_t n_tokens, uint32_t n_seqs, u
     LLAMA_LOG_DEBUG("%s: reserving a graph for ubatch with n_tokens = %4u, n_seqs = %2u, n_outputs = %4u\n", __func__, n_tokens, n_seqs, n_outputs);
 
     if (n_tokens % n_seqs != 0) {
-        n_tokens = (n_tokens / n_seqs) * n_seqs;
+        n_tokens = ((n_tokens + (n_seqs - 1)) / n_seqs) * n_seqs; // round to next multiple of n_seqs
         n_outputs = std::min(n_outputs, n_tokens);
 
         LLAMA_LOG_DEBUG("%s: making n_tokens a multiple of n_seqs - n_tokens = %u, n_seqs = %u, n_outputs = %u\n", __func__, n_tokens, n_seqs, n_outputs);
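
To illustrate why the rounding direction matters, here is a minimal sketch of the two expressions from the hunk above, pulled out into standalone functions. The helper names `round_down_to_multiple` and `round_up_to_multiple` are hypothetical and do not exist in llama.cpp, where the arithmetic is written inline in `graph_reserve`. The sketch assumes `n_seqs > 0` and that `n_tokens + n_seqs - 1` fits in 32 bits.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; llama.cpp inlines this
// arithmetic in llama_context::graph_reserve().

// Old expression: truncate n_tokens to the previous multiple of n_seqs.
constexpr std::uint32_t round_down_to_multiple(std::uint32_t n_tokens, std::uint32_t n_seqs) {
    return (n_tokens / n_seqs) * n_seqs;
}

// New expression: round n_tokens up to the next multiple of n_seqs.
// Assumes n_seqs > 0 and that n_tokens + (n_seqs - 1) does not overflow.
constexpr std::uint32_t round_up_to_multiple(std::uint32_t n_tokens, std::uint32_t n_seqs) {
    return ((n_tokens + (n_seqs - 1)) / n_seqs) * n_seqs;
}

// With fewer tokens than sequences, truncation collapses to 0 tokens,
// so the per-sequence token count (n_tokens / n_seqs) is 0 as well.
static_assert(round_down_to_multiple(3, 4) == 0, "rounding down yields 0");
static_assert(round_up_to_multiple(3, 4) == 4, "rounding up yields one token per sequence");

// Values that are already multiples are unchanged by either expression.
static_assert(round_down_to_multiple(8, 4) == 8, "");
static_assert(round_up_to_multiple(8, 4) == 8, "");

// Other values move to the nearest multiple in the respective direction.
static_assert(round_down_to_multiple(9, 4) == 8, "");
static_assert(round_up_to_multiple(9, 4) == 12, "");
```

The `static_assert` lines are checked at compile time. The `(3, 4)` case is the kind of input the commit message describes, where rounding down leaves zero tokens per sequence.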
