
Context window incorrectly capped at 8192 tokens when using Ollama (token count exceeded 8192) #473

@ansh-info

Description

I’m using ART with a local Ollama server as the inference backend (for both the agent model and the judge models). I’ve configured my Ollama model with a context window well above 8192 tokens (e.g. num_ctx: 16384) and adjusted num_predict accordingly.
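For reference, this is roughly how I confirm the larger context on the Ollama side before going through ART (the model name is just an example from my setup; num_ctx and num_predict are standard Ollama options, and the Modelfile equivalent would be PARAMETER num_ctx 16384):

```python
import ollama

# Sanity check against the local Ollama server directly:
# request a 16k context window and a larger num_predict.
# (model name is an example; adjust to whatever you have pulled)
response = ollama.chat(
    model="qwen2.5:7b-instruct",
    messages=[{"role": "user", "content": "ping"}],
    options={"num_ctx": 16384, "num_predict": 2048},
)
print(response["message"]["content"])
```

The server itself accepts the larger window when called like this, which is why I don't think the 8192 cap is coming from the Ollama side.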

However, in many runs I still get errors like:

token count exceeded 8192

This happens even though:

  • The Ollama model is configured with ctx > 8192 (for example 16384).

  • I’m explicitly passing the correct base URL pointing to my local Ollama server (see the sketch after this list) for:

    • The agent model (used by init_chat_model)
    • The judge model (RULER)
    • Any other inference calls
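
For concreteness, this is roughly how the base URL is wired on my side (model names, the port, and the exact provider/argument choices are specific to my setup and may differ from yours; as far as I can tell the RULER judge call goes through LiteLLM, whose ollama provider reads OLLAMA_API_BASE):

```python
import os

from langchain.chat_models import init_chat_model
from art.rewards import ruler_score_group

OLLAMA_BASE_URL = "http://localhost:11434"  # local Ollama server

# Agent model: Ollama exposed through its OpenAI-compatible endpoint.
agent_llm = init_chat_model(
    "qwen2.5:7b-instruct",          # example model name
    model_provider="openai",
    base_url=f"{OLLAMA_BASE_URL}/v1",
    api_key="ollama",               # dummy key; Ollama ignores it
)

# Judge model for RULER, routed to the same server. LiteLLM's ollama
# provider picks up the base URL from this environment variable.
os.environ["OLLAMA_API_BASE"] = OLLAMA_BASE_URL

# Inside the rollout/scoring loop (ruler_score_group is async):
# scored_group = await ruler_score_group(group, judge_model="ollama/qwen2.5:7b-instruct")
```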

This suggests there is a hardcoded or implicit max token limit of 8192 somewhere in ART/RULER, or in how token counts are computed, independent of the actual model’s context window.

What I expect

  • ART should respect the context window of the underlying model or the configured ctx when running through Ollama.

  • If a hard limit exists (e.g. 8192), it should be:

    • Documented and configurable; or
    • Derived from the model’s metadata, not hard-coded (see the rough sketch after this list).
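
Purely as a hypothetical sketch of the "derive it from the model's metadata" option: Ollama's /api/show endpoint reports the model's trained context length, so the limit could be read from there instead of assuming 8192 (the helper name and the exact metadata keys here are assumptions on my part; the key is architecture-prefixed):

```python
import requests


def ollama_context_length(base_url: str, model: str, default: int = 8192) -> int:
    """Hypothetical helper: read the model's context length from Ollama's
    /api/show endpoint instead of falling back to a hard-coded 8192."""
    resp = requests.post(f"{base_url}/api/show", json={"model": model}, timeout=10)
    resp.raise_for_status()
    model_info = resp.json().get("model_info", {})
    # Keys are architecture-prefixed, e.g. "qwen2.context_length"
    # or "llama.context_length".
    for key, value in model_info.items():
        if key.endswith(".context_length"):
            return int(value)
    return default
```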

What actually happens

  • Even with a model and server configured to support > 8k context, I regularly get token count exceeded 8192 errors.

  • This happens when:

    • Running rollouts with a LangGraph agent via init_chat_model
    • Running RULER scoring with the same Ollama backend

Environment

  • Backend: Local Ollama server
  • Model: Qwen / other Ollama-hosted model (with ctx > 8192)
  • ART: latest version (as of the date this issue was filed)
  • Using ART’s LangGraph integration (init_chat_model) and RULER scoring

Questions / Requests

  • Is there an internal default limit of 8192 tokens that’s applied regardless of the model’s context?
  • Can you expose this limit via configuration, or derive it from the model / backend rather than hardcoding?
  • Any guidance on how to set ART/RULER up so that it fully respects Ollama’s larger ctx?
