
Conversation

@loci-dev

Mirrored from ggml-org/llama.cpp#18432

Summary

Fixes #18409

When shift_context() discards tokens to free KV cache space, it decrements current_position but not stop_generation_position. This causes the termination check (current_position >= stop_generation_position) to never trigger, resulting in infinite text generation.

Solution

Decrement stop_generation_position by n_discard tokens alongside current_position.
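Below is a minimal sketch of the bookkeeping described above. The names (shift_context, current_position, stop_generation_position, n_discard) are taken from this PR description, but the surrounding code is illustrative only, not the actual llama.cpp implementation.

```cpp
// Illustrative sketch of the context-shift fix, assuming simplified
// integer position counters as described in the PR text.
void shift_context(int n_discard,
                   int & current_position,
                   int & stop_generation_position) {
    // ... discard n_discard tokens from the KV cache to free space ...

    // Positions are measured relative to the shifted cache, so both
    // counters must move back by the same amount.
    current_position         -= n_discard;
    stop_generation_position -= n_discard;  // the fix: keep the stop limit in sync
}

// Generation loop (simplified): with the fix, the termination check
// still fires after one or more context shifts.
//
//   while (current_position < stop_generation_position) {
//       // decode next token, advance current_position, maybe shift_context(...)
//   }
```

Without the second decrement, each shift pushes current_position further below an unchanged stop_generation_position, so the check current_position >= stop_generation_position can never become true.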

@loci-agentic-ai

Explore the complete analysis inside the Version Insights

I've generated a summary report for your project. The analysis shows that Pull Request #729 for the llama.cpp repository (owned by auroralabs-loci) has no significant performance impact.

Key findings:

  • ✅ No modified functions showed performance changes greater than 2%
  • ✅ Both response time and throughput remain stable
  • ✅ The PR is safe to merge from a performance perspective

The changes in this pull request maintain performance stability without introducing any performance regressions or bottlenecks.

@loci-dev force-pushed the main branch 2 times, most recently from 8645b59 to f2e8c7f (December 29, 2025, 00:40)