llama : rename n_ctx to kv_size #5568
Does
In the decoder - yes
@@ -1545,7 +1545,7 @@ struct llama_hparams {
     int32_t n_tokens;

 // llm_build_context
-    static constexpr int32_t n_kv = 32; // size of KV cache to consider (n_kv <= n_ctx
+    static constexpr int32_t n_kv = 32; // size of KV cache to consider (n_kv <= kv_size
Looks like this comment was missing a closing )
I do not agree with this change (but I like the underlying intention of making
As I'm working on supporting Mamba in
With Mamba, the KV cache size is tied to the maximum number of distinct sequences processed at the same time, not the "context size".
What I propose instead (and this is what I've started doing in #5328) is to keep
TL;DR: renaming
The n_ctx name is causing some confusion, since its actual meaning is the size of the KV cache, while n_ctx_train is the training context of the model.

This change fixes that, but since it is a big one and touches a lot of stuff, I'm not sure if it is worth merging. Maybe sometime in the future, when the time is right.
Original PR: #5546
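
For readers skimming the thread, here is a rough sketch of the distinction the description draws between the KV cache size (n_ctx today, kv_size in this PR) and the training context (n_ctx_train), plus the n_kv <= kv_size bound from the diff comment. The struct names, the helper functions, and the padding rule are assumptions made up for illustration; none of this is taken from the actual llama.cpp sources.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-ins; NOT the real llama.cpp structs.
struct model_hparams_sketch {
    uint32_t n_ctx_train; // context length the model was trained with
};

struct context_params_sketch {
    uint32_t kv_size; // number of cells allocated in the KV cache (currently named n_ctx)
};

// Number of KV cells a graph build would consider: enough to cover the cells
// in use, rounded up to a multiple of `pad` (an assumed rule), and never more
// than the allocated cache, so that n_kv <= kv_size always holds.
inline uint32_t n_kv_to_consider(uint32_t n_used, uint32_t pad, const context_params_sketch & cp) {
    assert(pad > 0);
    const uint32_t padded = ((n_used + pad - 1) / pad) * pad;
    return std::min(cp.kv_size, std::max(pad, padded));
}

// True when the cache holds more cells than the model was trained to attend over.
inline bool exceeds_training_context(const model_hparams_sketch & hp, const context_params_sketch & cp) {
    return cp.kv_size > hp.n_ctx_train;
}
```

With these names, a cache of 512 cells and 40 cells in use would give n_kv = 64 under a padding of 32, and the same cache would not exceed a training context of 4096. The two quantities are independent, which is the confusion the rename is trying to remove.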