Name and Version
latest build b7921
Operating systems
Windows
GGML backends
Vulkan
Hardware
RTX 4090
Models
GLM-4-32B-0414
Problem description & steps to reproduce
By bisecting the commits, I isolated this regression to the merge of #18986.
Before that PR, context shifting (llama_memory_seq_rm + llama_memory_seq_add in the middle of the context) worked fine. After it, shifting no longer works.
To reproduce:
- Populate KV with a long example prompt
- Remove some tokens in the middle by using llama_memory_seq_rm + llama_memory_seq_add
- Generate from the end and observe the output
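To make the intended effect of the shift concrete, here is a minimal sketch in C++ that models only the position bookkeeping. It does not call llama.cpp: `ToyKvCache`, `toy_seq_rm`, `toy_seq_add` and `toy_shift` are hypothetical names made up for illustration, and the half-open range [p0, p1) is assumed to match how the real calls treat their arguments. The real library also has to adjust the cached keys for the new positions, which this toy does not model.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Toy stand-in for a single-sequence KV cache: each cell only stores
// the position of the token it holds. Not llama.cpp's data structure.
struct ToyKvCache {
    std::vector<int32_t> pos;
};

// Hypothetical helper mirroring the idea of llama_memory_seq_rm:
// drop every cell whose position lies in [p0, p1).
void toy_seq_rm(ToyKvCache & kv, int32_t p0, int32_t p1) {
    kv.pos.erase(
        std::remove_if(kv.pos.begin(), kv.pos.end(),
                       [p0, p1](int32_t p) { return p >= p0 && p < p1; }),
        kv.pos.end());
}

// Hypothetical helper mirroring the idea of llama_memory_seq_add:
// add delta to every position that lies in [p0, p1).
void toy_seq_add(ToyKvCache & kv, int32_t p0, int32_t p1, int32_t delta) {
    for (int32_t & p : kv.pos) {
        if (p >= p0 && p < p1) {
            p += delta;
        }
    }
}

// Fill the cache with n_tokens positions, discard n_discard tokens
// starting at n_keep, then slide the tail back to close the gap.
std::vector<int32_t> toy_shift(int32_t n_tokens, int32_t n_keep, int32_t n_discard) {
    ToyKvCache kv;
    for (int32_t i = 0; i < n_tokens; ++i) {
        kv.pos.push_back(i);
    }
    toy_seq_rm(kv, n_keep, n_keep + n_discard);
    toy_seq_add(kv, n_keep + n_discard, n_tokens, -n_discard);
    return kv.pos;
}
```

After a correct shift the remaining positions should be contiguous again, so generation can continue from the new end of the sequence as in the last step above.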
The loss of coherence is not immediately obvious, so I have attached an example output below to illustrate it.
First Bad Commit
a5eaa1d
Relevant log output
This is a partial example, but it shows the effect of this regression on the text completion:
Partial prompt:
The edge of the town loomed closer, its crooked shapes stark against the deepening indigo of the night sky. The air grew heavier with the smell of woodsmoke and
Ctx Shift + Continuation, before a5eaa1d (good):
damp cobblestones. He couldn't risk entering the main streets just yet; dawn was still hours away, and Silas Vane’s hunters would likely be patrolling with purpose. He needed a different approach.
Ctx Shift + Continuation, after a5eaa1d (bad):
the pile of shivering documents would have been better under the lock picked. He had one more thing: the only thing he ownedle in the world. He had to pick up the pieces he had just found out of the basket, trying to get of course of all his pockets, watching the ground of his possessions with a wary hand.