Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,9 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
## [Unreleased]

### Added
- Tool-pair summarization — `maybe_summarize_tool_pair()` summarizes oldest tool call/response pairs when visible count exceeds `tool_call_cutoff` (default 6) (#793)
- XML-delimited prompt in `build_tool_pair_summary_prompt()` to prevent prompt injection from tool output
- `[memory] tool_call_cutoff` config option with validation (`>= 1`)
- Reactive compaction on `ContextLengthExceeded` — auto-compact and retry LLM calls up to 2 times (#792)
- `ContextLengthExceeded` error variant in `LlmError` with provider-specific pattern detection (Claude, OpenAI, Ollama)
- Middle-out progressive tool response removal fallback during summarization (10/20/50/100% tiers)
Expand Down
3 changes: 2 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ Most AI agent frameworks dump every tool description, skill, and raw output into
- **Semantic skill selection** — embeds skills as vectors, retrieves only top-K relevant per query instead of injecting all
- **Smart output filtering** — command-aware filters strip 70-99% of noise before context injection; oversized responses offloaded to filesystem
- **Resilient context compaction** — reactive retry on context overflow, middle-out progressive tool response removal, 9-section structured compaction prompt, LLM-free metadata fallback
- **Tool-pair summarization** — when visible tool call/response pairs exceed a configurable cutoff, the oldest pair is summarized via LLM and originals hidden from context
- **Accurate token counting** — tiktoken-based cl100k_base tokenizer with DashMap cache replaces chars/4 heuristic
- **Proportional budget allocation** — context space distributed by purpose, not arrival order

Expand Down Expand Up @@ -63,7 +64,7 @@ zeph --tui # run with TUI dashboard
|---|---|
| **Hybrid inference** | Ollama, Claude, OpenAI, Candle (GGUF), any OpenAI-compatible API. Multi-model orchestrator with fallback chains. Response cache with blake3 hashing and TTL |
| **Skills-first architecture** | YAML+Markdown skill files with semantic matching, self-learning evolution, 4-tier trust model, and compact prompt mode for small-context models |
| **Semantic memory** | SQLite + Qdrant (or embedded SQLite vector search) with MMR re-ranking, temporal decay scoring, resilient compaction (reactive retry, middle-out tool response removal, 9-section structured prompt, LLM-free fallback), durable compaction with message visibility control, credential scrubbing, cross-session recall, vector retrieval, autosave assistant responses, and snapshot export/import |
| **Semantic memory** | SQLite + Qdrant (or embedded SQLite vector search) with MMR re-ranking, temporal decay scoring, resilient compaction (reactive retry, middle-out tool response removal, 9-section structured prompt, LLM-free fallback), durable compaction with message visibility control, tool-pair summarization (LLM-based, configurable cutoff), credential scrubbing, cross-session recall, vector retrieval, autosave assistant responses, and snapshot export/import |
| **Multi-channel I/O** | CLI, Telegram, Discord, Slack, TUI — all with streaming. Vision and speech-to-text input |
| **Protocols** | MCP client (stdio + HTTP), A2A agent-to-agent communication, ACP server for IDE integration (multi-session, persistence, idle reaper, permission persistence, multi-modal prompts with image forwarding), sub-agent orchestration |
| **Defense-in-depth** | Shell sandbox, tool permissions, secret redaction, SSRF protection, skill trust quarantine, audit logging |
Expand Down
3 changes: 2 additions & 1 deletion crates/zeph-core/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ Core orchestration crate for the Zeph agent. Manages the main agent loop, bootst
| `bootstrap` | `AppBuilder` — fluent builder for application startup |
| `channel` | `Channel` trait defining I/O adapters; `LoopbackChannel` / `LoopbackHandle` for headless daemon I/O (`LoopbackHandle` exposes `cancel_signal: Arc<Notify>` for session cancellation); `Attachment` / `AttachmentKind` for multimodal inputs |
| `config` | TOML config with `ZEPH_*` env overrides; typed `ConfigError` (Io, Parse, Validation, Vault) |
| `context` | LLM context assembly from history, skills, memory; resilient compaction with reactive context-overflow retry (max 2 attempts), middle-out progressive tool response removal (10/20/50/100% tiers), 9-section structured compaction prompt, LLM-free metadata fallback via `build_metadata_summary()` with safe UTF-8 truncation; parallel chunked summarization; visibility-aware history loading (agent-only vs user-visible messages); durable compaction via `replace_conversation()`; uses shared `Arc<TokenCounter>` for accurate tiktoken-based budget tracking |
| `context` | LLM context assembly from history, skills, memory; resilient compaction with reactive context-overflow retry (max 2 attempts), middle-out progressive tool response removal (10/20/50/100% tiers), 9-section structured compaction prompt, LLM-free metadata fallback via `build_metadata_summary()` with safe UTF-8 truncation; parallel chunked summarization; tool-pair summarization via `maybe_summarize_tool_pair()` — when visible pairs exceed `tool_call_cutoff`, oldest pair is LLM-summarized with XML-delimited prompt and originals hidden via `agent_visible=false`; visibility-aware history loading (agent-only vs user-visible messages); durable compaction via `replace_conversation()`; uses shared `Arc<TokenCounter>` for accurate tiktoken-based budget tracking |
| `cost` | Token cost tracking and budgeting |
| `daemon` | Background daemon mode with PID file lifecycle (optional feature) |
| `metrics` | Runtime metrics collection |
Expand Down Expand Up @@ -57,6 +57,7 @@ Key `MemoryConfig` fields (TOML section `[memory]`):
| `redact_credentials` | bool | `true` | Scrub secrets and paths before LLM context injection |
| `autosave_assistant` | bool | `false` | Persist assistant responses to semantic memory automatically |
| `autosave_min_length` | usize | `20` | Minimum response length (chars) to trigger autosave |
| `tool_call_cutoff` | usize | `6` | Max visible tool call/response pairs before oldest is summarized via LLM |

```toml
[agent]
Expand Down
6 changes: 6 additions & 0 deletions crates/zeph-core/src/agent/builder.rs
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,12 @@ impl<C: Channel> Agent<C> {
self
}

#[must_use]
pub fn with_tool_call_cutoff(mut self, cutoff: usize) -> Self {
self.memory_state.tool_call_cutoff = cutoff;
self
}

#[must_use]
pub fn with_response_cache(
mut self,
Expand Down
Loading
Loading