feat(memory): add long conversation mode with asymmetric compression #119
Open
basnijholt wants to merge 6 commits into main from
Conversation
Implement chronological context with token budget enforcement for single long-running conversations. This mode maintains conversation history as segments and builds context by including recent turns up to the token budget.

New features:
- --long-conversation flag for memory proxy command
- --context-budget, --compress-threshold, --raw-recent-tokens options
- Segment and LongConversation data models
- File-based persistence (markdown with YAML frontmatter)
- Basic context building with token budget enforcement

Phase 2+ (not yet implemented):
- Asymmetric compression (user vs assistant)
- Code block deduplication
- Streaming support
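A minimal sketch of how the Phase 1 pieces could fit together: segments plus a budget-enforcing context builder. The field names and the estimate_tokens helper are illustrative assumptions, not the PR's actual code.

```python
# Sketch of the Phase 1 data model and context builder; names are illustrative.
from dataclasses import dataclass, field


def estimate_tokens(text: str) -> int:
    """Rough token estimate (see Known limitations: len(text) // 4)."""
    return len(text) // 4


@dataclass
class Segment:
    """One user or assistant turn in the conversation."""
    role: str           # "user" or "assistant"
    content: str
    state: str = "raw"  # later phases add "summarized" and "reference"


@dataclass
class LongConversation:
    segments: list[Segment] = field(default_factory=list)

    def build_context(self, budget: int) -> list[Segment]:
        """Walk backwards from the newest segment, including turns until
        the token budget is exhausted, then restore chronological order."""
        picked: list[Segment] = []
        used = 0
        for seg in reversed(self.segments):
            cost = estimate_tokens(seg.content)
            if used + cost > budget:
                break
            picked.append(seg)
            used += cost
        return list(reversed(picked))
```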
…Phase 2)

Implements intelligent compression that prioritizes assistant messages:
- User messages: gentle 70% compression, preserve code blocks and quotes
- Assistant messages: aggressive 20% compression to bullet points

Adds integration tests covering the full transformation pipeline.
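A rough sketch of that asymmetric split, assuming a summarize(text, ratio) callable stands in for the real summarization step; the preservation logic shown here is illustrative, not the PR's actual code.

````python
# Illustrative asymmetric compression; summarize() is a placeholder.
import re

FENCE_RE = re.compile(r"```.*?```", re.DOTALL)


def compress_segment(role: str, text: str, summarize) -> str:
    """Gentle on user turns, aggressive on assistant turns."""
    if role != "user":
        # Assistant output is derivable: reduce to ~20% as bullet points.
        return summarize(text, ratio=0.2)
    # User input is precious: pull out fenced code blocks and quoted lines
    # so they survive verbatim, then summarize only the remaining prose.
    code_blocks = FENCE_RE.findall(text)
    prose = FENCE_RE.sub("", text)
    quotes = [line for line in prose.splitlines() if line.startswith(">")]
    summary = summarize(prose, ratio=0.7)
    return "\n\n".join([summary, *quotes, *code_blocks])
````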
… (Phase 3)

Add repetition detection that identifies near-duplicate code blocks and stores compact references with diffs instead of full content.
- Extract fenced code blocks using regex
- Detect similarity using difflib.SequenceMatcher (>85% threshold)
- Store reference + unified diff when savings > 70%
- Integrate deduplication into segment creation flow
- Add 6 integration tests for repetition detection

This saves tokens when users paste the same or similar code multiple times during a conversation.
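The detection step, sketched with the difflib calls the commit names. The thresholds follow the commit message; dedupe_block and its reference format are illustrative assumptions.

```python
# Sketch of the repetition check: compare a new code block against earlier
# ones and emit a reference + unified diff when it is close enough and the
# diff is small enough to be worth storing.
import difflib


def dedupe_block(new_block: str, seen: list[str]) -> str | None:
    """Return a compact reference for a near-duplicate block, else None."""
    for idx, old in enumerate(seen):
        ratio = difflib.SequenceMatcher(None, old, new_block).ratio()
        if ratio <= 0.85:  # require >85% similarity, per the commit message
            continue
        diff = "\n".join(difflib.unified_diff(
            old.splitlines(), new_block.splitlines(), lineterm=""))
        if len(diff) < 0.3 * len(new_block):  # only if savings exceed 70%
            return f"[see code block #{idx}; changes]\n{diff}"
    return None
```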
Force-pushed from 8a1a0aa to b72479b
…plication

Add two integration tests to verify Phase 2 and Phase 3 features work together:
- test_compression_and_deduplication_together: verifies compression triggers mid-conversation and deduplication still works for repeated content
- test_build_context_with_all_segment_states: verifies build_context correctly handles raw, summarized, and reference segments in the same conversation
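A sketch of what the second test checks, reusing the illustrative LongConversation and Segment models from the Phase 1 sketch above; the assertions are indicative, not the PR's actual test code.

```python
def test_build_context_with_all_segment_states():
    # One segment in each state, oldest first.
    conv = LongConversation(segments=[
        Segment("user", "pasted code", state="reference"),
        Segment("assistant", "- summary bullets", state="summarized"),
        Segment("user", "latest question", state="raw"),
    ])
    ctx = conv.build_context(budget=1000)
    # All three states appear, in chronological order.
    assert [s.state for s in ctx] == ["reference", "summarized", "raw"]
```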
Summary
Adds a new --long-conversation mode to memory-proxy that maintains a single, continuous conversation with intelligent compression, optimized for 100-200k token context windows.

Key insight: User input is precious and hard to summarize without loss. LLM output is verbose and derivable. Compress asymmetrically.
Features
Usage
```
agent-cli memory-proxy \
  --long-conversation \
  --context-budget 150000 \
  --compress-threshold 0.8 \
  --raw-recent-tokens 40000
```

Implementation
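A hedged reading of how the three flags interact, inferred from their names and the summary above rather than confirmed against the implementation: compression kicks in once estimated usage crosses --compress-threshold of --context-budget, while the most recent --raw-recent-tokens worth of turns stay uncompressed.

```python
# Assumed flag interaction; defaults mirror the usage example above.
def should_compress(used_tokens: int, budget: int = 150_000,
                    threshold: float = 0.8) -> bool:
    # Start compressing older segments once usage crosses 80% of budget.
    return used_tokens > threshold * budget


def in_raw_window(tokens_from_end: int, raw_recent: int = 40_000) -> bool:
    # The newest ~40k tokens are always kept verbatim.
    return tokens_from_end <= raw_recent
```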
Known limitations
- Token counting relies on the rough `len(text) // 4` heuristic

Test plan