
Improve tool calling response handling#39

Open
MichaelAnders wants to merge 7 commits into Fast-Editor:main from MichaelAnders:feature/improve-tool-calling

Conversation


@MichaelAnders MichaelAnders commented Feb 6, 2026

Summary

  • Additional logging for tool call parsing and execution
  • Hard-coded shell commands for reliable tool execution
  • Deduplication of tool calls within a single response
  • Collect and return results from all called tools
  • Ensure the Ollama provider uses the specified Ollama model
  • Fix double-serialized JSON parameters from some providers
  • Fix stripThinkingBlocks() destroying markdown bullet points in Ollama responses
  • Handle Ollama offline gracefully with retry and 503 response
  • Add optional TOPIC_DETECTION_MODEL env var to redirect topic detection to a lighter model
  • Add optional SUGGESTION_MODE_MODEL env var to control suggestion mode LLM calls (none to disable, or redirect to another model; see the configuration example below)
  • Remove tool_result_direct short-circuit and improve agentic loop
  • Add external file read with tilde expansion and user approval flow (see the sketch after this list)
  • Improve tool calling response handling for Ollama models
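
The external file read and approval flow are not spelled out in this description; purely as an illustration (the helper names expandTilde and requiresApproval are hypothetical, not taken from this PR), tilde expansion and an approval check in Node.js could look like:

const os = require("os");
const path = require("path");

// Hypothetical helper: expand a leading "~" to the user's home directory.
function expandTilde(p) {
  if (p === "~") return os.homedir();
  if (p.startsWith("~/")) return path.join(os.homedir(), p.slice(2));
  return p;
}

// Hypothetical check: reads outside the workspace root require user approval.
function requiresApproval(p, workspaceRoot) {
  const resolved = path.resolve(expandTilde(p));
  return !resolved.startsWith(path.resolve(workspaceRoot) + path.sep);
}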

Problems

  • Topic detection requests use the same large model as the main request. For users running local models, this adds unnecessary GPU load for a simple classification task.
  • Several issues with the agentic loop and Ollama integration:
  1. stripThinkingBlocks() was destroying valid response content: the regex matched standard markdown bullets as "thinking markers", causing most list-based responses to be truncated after the first heading.
  2. Suggestion mode uses GPU resources: every user message triggers a full agentic loop with tools for suggestion prediction, blocking responses on large models (70b+).
  3. Tool calling response handling had edge cases with Ollama's response format.

Configuration for TOPIC_DETECTION_MODEL

# Use main model (default, unchanged behavior)
TOPIC_DETECTION_MODEL=default

# Redirect to a lighter model
TOPIC_DETECTION_MODEL=llama3.2:1b
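
Configuration for SUGGESTION_MODE_MODEL

A parallel example for the suggestion mode setting described above. The accepted values are taken from the summary; the "default" fallback is an assumption mirroring TOPIC_DETECTION_MODEL, not something stated explicitly in this PR.

# Use main model (assumed default)
SUGGESTION_MODE_MODEL=default

# Disable suggestion mode LLM calls entirely
SUGGESTION_MODE_MODEL=none

# Redirect to a lighter model
SUGGESTION_MODE_MODEL=llama3.2:1b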

Files Changed

File - Change
src/api/middleware/logging.js - Enhanced request/response logging
src/api/middleware/request-logging.js - Additional request logging
src/api/router.js - Router updates for tool handling
src/config/index.js - Ollama model configuration; added TOPIC_DETECTION_MODEL env var (default: "default"), wired into the config object and reloadConfig() for hot-reload support
src/orchestrator/index.js - Tool call deduplication, result collection, direct result return
src/tools/index.js - Deep-parse double-serialized JSON params, execution logging (see the sketch after this table)
src/tools/stubs.js - AskUserQuestion handler improvements
src/tools/workspace.js - Workspace tool updates
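
The double-serialization fix in src/tools/index.js is only summarized above; a minimal sketch of the idea, assuming the usual case where a provider stringifies parameters one or more extra times (the function name deepParseParams is hypothetical):

// Keep parsing string values until they stop being valid JSON, so a
// stringified array inside a stringified object still comes out as data.
// A real implementation would likely guard numeric-looking strings.
function deepParseParams(value) {
  if (typeof value === "string") {
    try {
      return deepParseParams(JSON.parse(value));
    } catch {
      return value; // plain string, not JSON
    }
  }
  if (Array.isArray(value)) return value.map(deepParseParams);
  if (value && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, deepParseParams(v)])
    );
  }
  return value;
}

// Example: the string '["src/**/*.js"]' becomes the array ["src/**/*.js"].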

Test plan

  • Verify tool calls are deduplicated when model returns duplicates
  • Verify results from all called tools are collected and returned
  • Verify Ollama uses the configured model
  • Verify double-serialized JSON parameters (e.g. stringified arrays) are correctly parsed
  • Verify markdown bullet points survive in Ollama responses (the original bug)
  • Verify Ollama offline/online transitions are handled gracefully
  • Verify suggestion mode can be disabled (SUGGESTION_MODE_MODEL=none)

- Add fallback parsing for Ollama models that return tool calls as JSON
  text in message content instead of using the structured tool_calls field
  (see the sketch after this list)
- Return tool results directly to CLI instead of making a follow-up LLM
  call, reducing latency and preventing hallucinated rewrites of output
- Add dedicated Glob tool returning plain text (one path per line) instead
  of JSON, with workspace_list accepting both 'pattern' and 'patterns'
- Clarify why Glob is not aliased to workspace_list (format mismatch)
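
A rough sketch of the fallback parsing from the first bullet, assuming Ollama sometimes emits the tool call as a JSON object in message.content rather than in tool_calls (the function name extractToolCalls and the exact shapes are illustrative, not lifted from this PR):

// Prefer the structured tool_calls field; otherwise try to parse the
// message content as a tool call emitted as plain JSON text.
function extractToolCalls(message) {
  if (Array.isArray(message.tool_calls) && message.tool_calls.length > 0) {
    return message.tool_calls;
  }
  const text = (message.content || "").trim();
  if (!text.startsWith("{") && !text.startsWith("[")) return [];
  try {
    const parsed = JSON.parse(text);
    const calls = Array.isArray(parsed) ? parsed : [parsed];
    return calls
      .filter((c) => c && typeof c.name === "string")
      .map((c) => ({ function: { name: c.name, arguments: c.arguments || {} } }));
  } catch {
    return []; // ordinary prose, not a tool call
  }
}
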
- Additional logging for tool call parsing and execution
- Hard-coded shell commands for reliable tool execution
- Deduplication of tool calls within a single response (see the sketch after this list)
- Collect and return results from all called tools
- Ollama uses specified Ollama model
- Fix double-serialized JSON parameters from some providers
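
The deduplication mentioned above is described only at a high level; one plausible sketch, keying on tool name plus serialized arguments (an assumption about the dedup key, not necessarily what the PR does):

// Drop repeated tool calls within a single model response.
function dedupeToolCalls(toolCalls) {
  const seen = new Set();
  return toolCalls.filter((call) => {
    const key = `${call.function.name}:${JSON.stringify(call.function.arguments)}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
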
Tool results now loop back to the model for natural language synthesis
instead of being returned raw to the CLI. This fixes the bug where
conversational messages (e.g. "hi") triggered tool calls and dumped
raw output.

Additional improvements:
- Context-aware tiered compression that scales with model context window
- Empty response detection with retry-then-fallback
- _noToolInjection flag to prevent provider-level tool re-injection
- Auto-approve external file reads in tool executor
- Conversation context search in workspace_search
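
A minimal sketch of the loop-back described in this commit, assuming an OpenAI-style chat message array and a generic invokeModel(messages) helper (both assumptions; the orchestrator may structure this differently):

// Append executed tool results as tool messages and ask the model for a
// natural-language synthesis instead of returning raw output to the CLI.
async function synthesizeToolResults(invokeModel, messages, toolCalls, results) {
  const followUp = [
    ...messages,
    { role: "assistant", content: "", tool_calls: toolCalls },
    ...results.map((result, i) => ({
      role: "tool",
      tool_call_id: toolCalls[i].id,
      content: typeof result === "string" ? result : JSON.stringify(result),
    })),
  ];
  return invokeModel(followUp); // second pass produces the user-facing reply
}
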
Three concurrent runAgentLoop calls per user message waste GPU time with
large models (~30s each). This adds SUGGESTION_MODE_MODEL config to skip
("none") or redirect suggestion mode to a lighter model. Also adds ISO
timestamps and mode tags to debug logs for easier debugging.
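
As an illustration of how the setting can be applied (config.suggestionModeModel and the main-model fallback are assumptions for this sketch):

// Decide which model, if any, handles suggestion-mode requests.
function resolveSuggestionModel(config) {
  const value = config.suggestionModeModel || "default";
  if (value === "none") return null;            // skip suggestion-mode LLM calls
  if (value === "default") return config.model; // reuse the main model
  return value;                                 // redirect to a lighter model
}
// Callers that get null back skip runAgentLoop for suggestion mode entirely.
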
- Add ECONNREFUSED to retryable errors and check undici TypeError .cause.code
  so connection-refused errors get retried with backoff
- Wrap invokeModel in try/catch returning structured 503 with provider_unreachable
  error instead of raw TypeError bubbling to Express error middleware
- Fix suggestion mode early return response shape (json -> body) to match
  router expectations
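
A sketch of the two pieces, relying on the fact that Node's built-in fetch (undici) reports connection failures as a TypeError whose .cause carries the underlying error code (helper names and retry/backoff numbers are illustrative):

// Retryable network errors, looking through undici's TypeError wrapper.
function isRetryable(err) {
  const code = (err && err.code) || (err && err.cause && err.cause.code);
  return ["ECONNREFUSED", "ECONNRESET", "ETIMEDOUT"].includes(code);
}

// Wrap the provider call: retry with simple backoff, then surface a structured
// 503 instead of letting the raw TypeError bubble to Express error middleware.
async function invokeWithRetry(invokeModel, payload, attempts = 3) {
  for (let i = 0; i < attempts; i++) {
    try {
      return { ok: true, body: await invokeModel(payload) };
    } catch (err) {
      if (!isRetryable(err)) throw err; // unrelated errors still bubble up
      if (i === attempts - 1) {
        return { ok: false, status: 503, body: { error: "provider_unreachable", detail: String(err) } };
      }
      await new Promise((r) => setTimeout(r, 500 * 2 ** i)); // exponential backoff
    }
  }
}
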
… responses

The heuristic-based stripThinkingBlocks() matched standard markdown bullets
(- item, * item) as "thinking block markers" and dropped all subsequent content.
Replace with stripThinkTags() that only strips <think>...</think> tags used by
models like DeepSeek and Qwen for chain-of-thought reasoning.
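
A minimal version of that replacement (the actual implementation may differ):

// Strip only explicit <think>...</think> blocks (DeepSeek/Qwen style
// chain-of-thought), leaving markdown bullets and other content intact.
function stripThinkTags(text) {
  return text.replace(/<think>[\s\S]*?<\/think>/gi, "").trim();
}

// "Plan:\n- item one\n- item two" passes through unchanged;
// "<think>internal notes</think>Answer" becomes "Answer".
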
The dotenv override (override: NODE_ENV !== "test") requires
NODE_ENV=test to prevent .env values from stomping on test-set
environment variables.
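
In dotenv terms, the load call described above looks roughly like this (the exact call site in src/config/index.js is not shown in this PR, so treat it as a sketch):

require("dotenv").config({
  // Let .env values win over pre-set environment variables in normal runs,
  // but never in tests, so test-set variables are not stomped.
  override: process.env.NODE_ENV !== "test",
});
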
MichaelAnders force-pushed the feature/improve-tool-calling branch from 6bcc803 to a2d78ee on February 11, 2026.
