fix(llm): remove request timeout from LLM HTTP client#955
Merged
Claude and OpenAI providers shared the same 60s request timeout as other HTTP clients. LLM responses with tool use can take longer, causing a TLS CloseNotify and HTTP request failure after exactly 60s. Introduce llm_client() in zeph-llm::http with only a connect timeout (30s). Claude and OpenAI providers now use llm_client(); the unused default_client() in zeph-llm is removed (other crates use zeph_core::http::default_client()).
The previous 60s reqwest timeout caused request failures for long LLM responses (tool use, large context). Removing the timeout entirely would allow runaway requests to hang indefinitely. Set the request timeout to 600s as a safety backstop. The agent-level TimeoutConfig.llm_seconds (default 120s) fires first in normal operation; the 600s backstop only catches cases where the tokio-layer timeout is bypassed or llm_seconds is set unusually high.
Add TimeoutConfig.llm_request_timeout_secs (default 600s) as the configurable HTTP-layer backstop for LLM provider requests. Exposed via ZEPH_TIMEOUT_LLM_REQUEST env variable. llm_client() now accepts the timeout as a parameter. Bootstrap passes config.timeouts.llm_request_timeout_secs via with_client() to Claude and OpenAI providers; provider::new() retains a hardcoded 600s default for non-bootstrap construction paths (tests, orchestrator).
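Assuming the usual TOML layout for a `[timeouts]` section (the field names come from the PR text; the surrounding file structure is a guess), the new knob would sit alongside the agent-level one:

```toml
[timeouts]
# Agent-level tokio timeout around LLM calls; fires first in normal operation.
llm_seconds = 120
# HTTP-layer backstop for LLM provider requests.
# Overridable via the ZEPH_TIMEOUT_LLM_REQUEST environment variable.
llm_request_timeout_secs = 600
```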
Summarization paths (maybe_summarize_tool_pair, try_summarize_with_llm, summarize_tool_output) called .chat() directly without any timeout, causing hangs when Claude was slow. Wrap all summarization .chat() futures with tokio::time::timeout(llm_seconds). Add LlmError::Timeout variant used when the deadline is exceeded; summarization callers treat it the same as an LLM error (warn + skip or fallback to truncation).
Force-pushed from dba0ab7 to 9aca28e.
Summary
- Add `llm_client(request_timeout_secs)` in `zeph-llm::http` with a 30s connect timeout and a configurable request timeout
- Switch `ClaudeProvider` and `OpenAiProvider` from `default_client()` (60s) to `llm_client(600)`
- Remove the unused `default_client()` from `zeph-llm::http`
- Add `TimeoutConfig.llm_request_timeout_secs` (default 600s) to the `[timeouts]` config section, exposed via `ZEPH_TIMEOUT_LLM_REQUEST`
- Bootstrap wires the configured value through `with_client(llm_client(...))`

Root cause
`default_client()` applied a 60s request timeout. LLM calls with tool use routinely exceed 60s, triggering `CloseNotify` + `HTTP request failed` errors in ACP sessions.

Two-layer timeout design
- Agent layer: `TimeoutConfig.llm_seconds` (default 120s), a tokio-level timeout that fires first in normal operation
- HTTP layer: `TimeoutConfig.llm_request_timeout_secs` (default 600s), a reqwest-level backstop for cases where the tokio-layer timeout is bypassed or `llm_seconds` is set unusually high

Test plan
- `cargo +nightly fmt --check` — pass
- `cargo clippy --workspace -- -D warnings` — pass
- `cargo nextest run --workspace --lib --bins` — 2894 tests passed