
Conversation

@loci-dev

Mirrored from ggml-org/llama.cpp#18416

Summary

  • Fixes #17463 - Integer overflow in llama_chat_apply_template causing incorrect error messages for very large messages

Changes

  • Changed llama_chat_apply_template return type from int32_t to int64_t
  • Changed llama_chat_apply_template length parameter from int32_t to int64_t
  • Changed llm_chat_apply_template return type from int32_t to int64_t
  • Updated all callers to use int64_t for the result variable (see the declaration sketch below)
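
A rough sketch of the before/after declaration (illustrative only, not the verbatim include/llama.h header; the struct stand-in and parameter list are simplified):

```cpp
#include <cstdint>
#include <cstddef>

// Illustrative stand-in for the real struct in include/llama.h.
struct llama_chat_message { const char * role; const char * content; };

// Before: 32-bit length and return value wrap negative for results above ~2 GiB.
// int32_t llama_chat_apply_template(const char * tmpl,
//                                   const struct llama_chat_message * chat,
//                                   size_t n_msg, bool add_ass,
//                                   char * buf, int32_t length);

// After: 64-bit length and return value keep large sizes positive.
int64_t llama_chat_apply_template(const char * tmpl,
                                  const struct llama_chat_message * chat,
                                  size_t n_msg, bool add_ass,
                                  char * buf, int64_t length);
```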

Root Cause

The llm_chat_apply_template function returns dest.size(), which is a size_t. When the formatted messages are very large (>2GB), storing this value in an int32_t overflows, so negative values were returned even for valid templates, triggering incorrect error messages like "this custom template is not supported" instead of proper size handling.
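
A minimal, self-contained illustration of the narrowing bug and the fix (the helper names are hypothetical and only mimic the size handling, not the real template code):

```cpp
#include <cstdint>
#include <cstdio>

// Old behaviour: the formatted size is computed as size_t but returned
// through int32_t, so anything above INT32_MAX wraps to a negative value.
int32_t apply_template_i32(size_t formatted_size) {
    return (int32_t) formatted_size;
}

// Fixed behaviour: int64_t preserves sizes well past 2 GiB.
int64_t apply_template_i64(size_t formatted_size) {
    return (int64_t) formatted_size;
}

int main() {
    size_t big = 3ull * 1024 * 1024 * 1024;   // ~3 GiB of formatted chat text

    // Old path: the caller sees a negative value and wrongly reports
    // "this custom template is not supported".
    printf("int32_t result: %d\n",  apply_template_i32(big));

    // New path: the caller can distinguish a real error (negative)
    // from a legitimately large size.
    printf("int64_t result: %lld\n", (long long) apply_template_i64(big));
    return 0;
}
```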

Test plan

  • Existing tests pass with updated types
  • The overflow scenario described in #17463 now works correctly (large buffers return positive sizes instead of overflowing to negative)

Files modified

  • include/llama.h - API declaration
  • src/llama.cpp - Implementation
  • src/llama-chat.h - Internal header
  • src/llama-chat.cpp - Internal implementation
  • common/chat.cpp - Caller updates (sketched below)
  • examples/simple-chat/simple-chat.cpp - Example updates
  • tests/test-chat-template.cpp - Test updates
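
The caller updates in common/chat.cpp and the simple-chat example might look roughly like this (a hypothetical sketch; `apply` stands in for llama_chat_apply_template bound to a template and message list):

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical caller sketch: the result is now int64_t, so a negative value
// means a genuine template error, while a value larger than the buffer just
// means "grow the buffer and call again".
std::string format_chat(const std::function<int64_t(char *, int64_t)> & apply) {
    std::vector<char> buf(1024);
    int64_t res = apply(buf.data(), (int64_t) buf.size());
    if (res < 0) {
        return {};                          // genuine template error
    }
    if (res > (int64_t) buf.size()) {
        buf.resize((size_t) res);           // large result: resize, then retry
        res = apply(buf.data(), (int64_t) buf.size());
    }
    return std::string(buf.data(), (size_t) res);
}
```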

🤖 Generated with Claude Code

…64_t

Fixes #17463

The llama_chat_apply_template function returns dest.size(), which is a
size_t. When messages are very large (>2GB), this overflows when stored
in an int32_t, resulting in incorrect error messages like
"this custom template is not supported" instead of proper handling.

Changes:
- llama_chat_apply_template: return type int32_t -> int64_t
- llama_chat_apply_template: length parameter int32_t -> int64_t
- llm_chat_apply_template: return type int32_t -> int64_t
- Updated all callers to use int64_t for the result variable

This allows the function to correctly handle large message buffers
and properly distinguish between actual template errors (negative return)
and legitimate large sizes.

Signed-off-by: yurekami <yurekami@users.noreply.github.com>
@loci-agentic-ai

Explore the complete analysis inside the Version Insights

Perfect! I was able to retrieve the summary report for your project. The report shows a performance analysis comparing two versions of the llama.cpp project for pull request #723.

Key highlights:

  • Critical Performance Regressions: Two functions show over 200% increase in response time:

    • std::_Rb_tree::end() in llama-tts: +230.31%
    • std::vector::begin() in llama-run: +214.16%
  • Most Affected Binary: llama-tts has 7 out of the top 10 affected functions

  • Pattern: Most affected functions are C++ STL container operations, suggesting changes in how standard library containers are being used or optimized

  • Mixed Results: While response times increased, some functions show significant throughput improvements (up to 306%)

The report recommends investigating STL usage patterns, memory allocation, and iterator operations in the changes made in PR #723.

Would you like more detailed information about any specific aspect of this performance analysis?

@loci-dev loci-dev force-pushed the main branch 14 times, most recently from 3751683 to 4df802d on December 29, 2025 at 13:20
