Bug: "main : failed to eval" with Self-extend and small context #8570
Comments
Checking out the PRs on llama.cpp that implemented self-extend, the original paper, and the pseudocode, I was wondering the following:
The SelfExtend implementation in The 2 implementations in Not sure why you are crashing though - that's strange.
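For reference, the core idea behind self-extend can be sketched as a position mapping: nearby tokens attend with their exact relative positions, while distant tokens attend with grouped (floor-divided) positions, re-aligned at the window boundary. This is a sketch of my understanding of the paper's merged attention, not the llama.cpp code; the function and parameter names are mine:

```python
def self_extend_rel_pos(q: int, k: int, group: int, window: int) -> int:
    """Relative position used between query index q and key index k under
    self-extend (a sketch of the paper's merged normal/grouped attention).

    - Within the neighbor window, the exact relative position is kept.
    - Beyond it, positions are mapped into groups of size `group`, shifted
      so the two regimes roughly line up at the window boundary.
    """
    rel = q - k
    if rel < window:
        return rel                                  # normal (exact) attention
    # grouped attention, re-aligned at the window boundary
    return q // group - k // group + window - window // group

# Nearby: exact position is preserved.
print(self_extend_rel_pos(10, 5, group=4, window=8))    # 5
# Distant: the relative position is compressed by roughly `group`.
print(self_extend_rel_pos(100, 0, group=4, window=8))   # 31, not 100
```

The compression is what lets the model stay within the range of relative positions it saw during training, at the cost of coarser position information for distant tokens.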
Thanks @ggerganov for taking a look at this. I will check those RoPE shifts. Regarding the crash, I ran llama-cli again with the following command, inspired by your example in PR #4815:
I let it run for a while, seeing this print multiple times.
Also, I was printing n_past and llama_get_kv_cache_token_count.
As you can see, n_past changes with the self-extend code. However, the KV token count keeps growing, and the program seems to crash once n_ctx is exceeded.
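That divergence between n_past and the KV token count can be illustrated with a simplified simulation of the group-attention bookkeeping (my reading of the loop in examples/main, in Python rather than C++, and omitting the actual KV-cache seq_add/seq_div shifts):

```python
def simulate_n_past(n_tokens: int, ga_n: int, ga_w: int) -> int:
    """Simplified self-extend bookkeeping: every time a full window of
    ga_w positions accumulates, it is folded into groups of ga_n, which
    shrinks n_past while the KV cache still holds every token."""
    n_past = 0   # highest RoPE position in use (compressed by self-extend)
    ga_i = 0     # start of the next window to fold
    for _ in range(n_tokens):
        n_past += 1                        # one decoded token
        while n_past >= ga_i + ga_w:       # fold a full window
            bd = (ga_w // ga_n) * (ga_n - 1)
            n_past -= bd                   # positions shrink by bd
            ga_i += ga_w // ga_n
    return n_past

# With ga_n=4, ga_w=512: after 4096 decoded tokens the positions only
# reach 1024, yet the KV cache still contains all 4096 tokens.
print(simulate_n_past(4096, ga_n=4, ga_w=512))   # 1024
```

So with, say, n_ctx = 2048, n_past stays comfortably below the limit while the number of cells occupied in the KV cache (4096 here) has long exceeded it, since self-extend never evicts tokens — consistent with the crash observed once the cache fills up.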
Ah wait - this is about exceeding n_ctx. When not using SelfExtend, you can exceed it (context shift discards the oldest tokens to make room). When SelfExtend is enabled, we can no longer exceed n_ctx.
If I understand correctly, as it currently stands, the KV cache can't mix SelfExtend with the basic context shift. This would prevent its use in chat applications, right?
I wouldn't say it prevents chat applications - any application that goes beyond the training context of the model relies on some sort of hack. With context shift, you lose some of the old context when you go beyond the limit. With SE, you seemingly extend the size of the context, but it's not without its deficiencies either. For example, with an 8192 training context, you can:
So you can use either strategy based on your use case. As long as you stay within the training context there will be no issues. Beyond that, it might or might not work.
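The trade-off between the two strategies can be put in rough numbers (the self-extend bound below is an approximation based on my reading of the paper, with illustrative parameter values):

```python
def visible_with_context_shift(n_train: int) -> int:
    # Sliding window: you only ever see the last n_train tokens;
    # older ones are discarded as new ones arrive.
    return n_train

def max_len_with_self_extend(n_train: int, group: int, window: int) -> int:
    # Rough upper bound: grouped positions cover group * (n_train - window)
    # tokens, plus the exact-attention neighbor window.
    return group * (n_train - window) + window

# With an 8192 training context:
print(visible_with_context_shift(8192))              # 8192 (sliding)
print(max_len_with_self_extend(8192, 4, 1024))       # 29696 (approx.)
```

Context shift gives unbounded total length but a fixed-size view; self-extend keeps everything in view but caps the total length and coarsens distant positions.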
Thanks for going deep in your explanation. You are awesome. I will close this issue given that the current behavior is expected. The only missing part would be the code differences between the two implementations.
What happened?
I have been playing with the context window and have been encountering issues running the "Llama-3-Smaug-q2_k.gguf" model. When I run llama-cli with that model using the default execution with the command below, the program behaves as expected. However, when Self-Extend is enabled (gan/gaw) in interactive mode, after a while (longer than the context) it crashes with "main : failed to eval". Here is the command:
Below is a relevant log output, and I am asking these questions:
Also, I noticed that "examples/passkey" has a different implementation of the self-extend code than "examples/main". Which one is the correct one?
Thanks for your help.
Name and Version
llama-cli -v
version: 3392 (bda62d7)
built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.5.0
What operating system are you seeing the problem on?
Mac
Relevant log output