Description
When configuring Ollama via LiteLLM, the LiteLLM docs recommend using `ollama_chat` for better responses. This uses Ollama's `/api/chat` endpoint.
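For context, this is roughly how I point LiteLLM at Ollama (a minimal sketch; the model name and `api_base` are placeholders for my local setup):

```python
from litellm import completion

# The "ollama_chat/" prefix routes through Ollama's /api/chat endpoint,
# while the plain "ollama/" prefix uses the older /api/generate endpoint.
response = completion(
    model="ollama_chat/llama3",         # placeholder model name
    messages=[{"role": "user", "content": "Score this trajectory."}],
    api_base="http://localhost:11434",  # default local Ollama address
)
print(response.choices[0].message.content)
```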
However, in ART's RULER integration, the judge model needs to produce JSON-structured outputs (e.g. matching a Pydantic schema / tool schema) so RULER can parse correctness, reasoning, etc.
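Concretely, I pass the LiteLLM model string as RULER's judge, roughly like this (a sketch assuming ART's `ruler_score_group` helper; the model name is a placeholder from my local setup):

```python
from art.rewards import ruler_score_group

async def score(group):
    # "group" is an art.TrajectoryGroup produced by a rollout (not shown here).
    # Point the judge at a local Ollama model via LiteLLM's ollama_chat prefix.
    return await ruler_score_group(
        group,
        judge_model="ollama_chat/llama3",  # placeholder; this is what fails for me
    )
```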
The problem:

- With Ollama's `/api/chat` (via `ollama_chat`), the responses are not reliably:
  - Valid JSON matching the expected schema, or
  - Tool-call-compatible in the way RULER expects.
- As a result:
  - The judge's responses often fail schema validation (see the sketch after this list).
  - RULER cannot extract the required fields, and scoring fails.
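To make the failure mode concrete, here is a minimal sketch of the kind of schema-validated parse that breaks; `JudgeScore` is an illustrative stand-in, not RULER's actual internal schema:

```python
import json

from pydantic import BaseModel, ValidationError

class JudgeScore(BaseModel):
    # Illustrative stand-in for the structured fields RULER expects.
    score: float
    reasoning: str

def parse_judge_reply(raw: str) -> JudgeScore | None:
    try:
        return JudgeScore.model_validate(json.loads(raw))
    except (json.JSONDecodeError, ValidationError):
        # With ollama_chat this branch is hit frequently: the reply is
        # prose or malformed JSON, so RULER-style scoring cannot proceed.
        return None
```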
Because of that, I'm forced to fall back to the older `/api/generate` endpoint (`ollama`), even though:

- LiteLLM explicitly recommends `ollama_chat` over `ollama`.
- `/api/chat` is the more modern endpoint.
What I expect

Either:

- Official support / documentation for using Ollama's `/api/chat` (`ollama_chat`) as a RULER judge with JSON-schema / tool-call style responses; or
- Clear guidance that for RULER's JSON-schema needs, we must currently use `/api/generate` and `ollama`.
What actually happens

- Using `ollama_chat`:
  - The `/api/chat` endpoint does not produce the JSON / tool-call schema RULER expects.
  - Judge calls fail schema validation.
- Using `ollama`:
  - RULER works better, but we lose out on the newer `/api/chat` behavior LiteLLM recommends.
Request

Please:

- Document the supported Ollama configuration for RULER (which engine, which endpoint, any special settings).
- If possible, add direct support for `ollama_chat` + `/api/chat` that ensures JSON-mode / tool-call style output works with RULER's structured response expectations.
- Or provide example configs + templates that make Ollama's `/api/chat` usable with RULER (a workaround sketch of what I'm hoping for is below).
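In the meantime, here is the kind of example config I'd hope for, as a workaround sketch. I'm not certain LiteLLM translates `response_format` to Ollama's JSON mode for `ollama_chat`; that is exactly the sort of detail I'd like confirmed and documented:

```python
from litellm import completion

# Attempt to force JSON output from /api/chat. Whether response_format is
# mapped to Ollama's `format: "json"` parameter for ollama_chat is an
# assumption here, not something I've verified against the LiteLLM source.
response = completion(
    model="ollama_chat/llama3",  # placeholder model name
    messages=[
        {
            "role": "system",
            "content": "Reply with a JSON object with 'score' and 'reasoning' fields.",
        },
        {"role": "user", "content": "Judge the following trajectories..."},
    ],
    response_format={"type": "json_object"},
    api_base="http://localhost:11434",
)
```

If this combination is supported, a short snippet like the one above in the RULER docs would be enough to unblock local-judge setups.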