External working memory for AI coding assistants
Recall is a long-term semantic memory system that addresses context window limitations in AI coding assistants like Claude Code. It provides dual-mode retrieval (semantic + episodic) through vector embeddings, enabling persistent memory across sessions without cloud dependencies.
The Problem: AI coding assistants have limited context windows (typically 200k tokens). Important decisions, discoveries, and technical details get lost when context fills up or sessions restart.
The Solution: Recall acts as external working memory - store important events immediately, retrieve them on-demand by meaning OR time, and maintain continuity across sessions.
- ✅ Session Continuity - Resume work after restart without re-explaining context
- ✅ Context Pressure Relief - Offload details to Recall, keep active reasoning lightweight
- ✅ Timeline Reconstruction - Query "What happened on October 10th?" chronologically
- ✅ Decision Consistency - Reference past architectural decisions
- ✅ Zero Cloud Dependencies - Fully local, no API keys required
Unlike "shadow agent" approaches that spawn secondary AI instances to observe your sessions, Recall uses an O(1) cost model - you only pay for tokens when you explicitly store or retrieve memories.
| Approach | Token Cost | Session Impact |
|---|---|---|
| Recall (Explicit) | O(1) - per tool call | Zero overhead during work |
| Shadow Agent (Automatic) | O(N) - re-reads entire context | 2-3x session cost |
Result: Recall is economically viable for heavy daily use. Shadow agent approaches can double or triple your token consumption.
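The arithmetic behind the table can be made concrete. The numbers below are hypothetical illustrations of the two cost models, not measurements:

```python
# Illustration of O(1) vs O(N) token costs (assumed numbers, not benchmarks).
context_tokens = 150_000   # tokens currently in the session context
memory_op_tokens = 300     # tokens for one explicit store/retrieve call
ops_per_session = 20       # explicit Recall calls in a work session

# Explicit model: each memory operation costs a small constant amount.
explicit_cost = ops_per_session * memory_op_tokens

# Shadow-agent model: each observation re-reads the entire context.
shadow_cost = ops_per_session * context_tokens

print(f"Explicit: {explicit_cost:,} tokens")  # 6,000
print(f"Shadow:   {shadow_cost:,} tokens")    # 3,000,000
```

Under these assumptions the shadow-agent approach costs hundreds of times more tokens for the same session.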
Recall captures outcomes, not process:
- ❌ Automatic capture: "Tried fix A... failed. Tried fix B... failed. Tried fix C... worked."
- ✅ Recall explicit: "Fixed race condition in auth module by adding mutex lock"
When you retrieve memories later, you get actionable solutions - not debugging noise.
| Feature | Recall | Complex Alternatives |
|---|---|---|
| Dependencies | Python + FastMCP + Qdrant | TypeScript + Bun + PM2 + SQLite + Chroma |
| Client Portability | Any MCP client (CLI, Desktop, IDEs) | Often CLI-only (hook dependencies) |
| Stability | Pure MCP (stable protocol) | Hook chains (version-sensitive) |
| Maintenance | Single Python codebase | Multi-language stack |
You control exactly what gets stored. No automatic surveillance of your coding sessions:
- ✅ Store only what matters (decisions, discoveries, milestones)
- ✅ Skip sensitive work with `<private>` tags
- ✅ No background processes watching your context
- ✅ Full audit trail of what you've stored
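As a sketch of how `<private>` spans could be excluded before storage (a hypothetical helper, not Recall's actual implementation):

```python
import re

def strip_private(content: str) -> str:
    """Remove <private>...</private> spans before a memory is stored.

    Hypothetical helper for illustration -- Recall's real tag handling
    may differ; this sketches the documented behaviour of keeping
    sensitive work out of stored memories.
    """
    return re.sub(r"<private>.*?</private>", "", content, flags=re.DOTALL).strip()

print(strip_private("Fixed auth bug. <private>API key rotated: sk-123</private>"))
# Fixed auth bug.
```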
Recall's multi-collection routing prevents dimension mismatch errors when switching embedding models:
- 384d collection → all-MiniLM-L6-v2, bge-small-en-v1.5
- 768d collection → snowflake-arctic-embed-m, nomic-embed-text-v1.5
Switch models freely - Recall routes automatically to the correct collection.
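The routing rule is simple enough to sketch: each model maps to a collection named for its embedding dimension. The function name below is hypothetical; Recall's actual routing lives inside its vector store:

```python
# Sketch of dimension-based collection routing. `collection_for` is a
# hypothetical name; the real UnifiedVectorStore API may differ.
MODEL_DIMS = {
    "all-MiniLM-L6-v2": 384,
    "bge-small-en-v1.5": 384,
    "snowflake-arctic-embed-m": 768,
    "nomic-embed-text-v1.5": 768,
}

def collection_for(model_name: str) -> str:
    """Route a model to the collection matching its embedding dimension."""
    return f"recall_{MODEL_DIMS[model_name]}d"

print(collection_for("snowflake-arctic-embed-m"))  # recall_768d
print(collection_for("all-MiniLM-L6-v2"))          # recall_384d
```

Because vectors of different dimensions never share a collection, switching models can never produce a dimension mismatch error.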
- Python 3.10+
- Virtual environment (REQUIRED for modern Python - see INSTALLATION.md for PEP 668 details)
- Claude Code CLI or compatible MCP client
- Docker (optional, required for network mode multi-project support)
📖 Detailed Installation Guide: See INSTALLATION.md for platform-specific instructions, troubleshooting, and common issues.

🤖 AI Agents: If you're Claude or another AI assistant asked to install Recall, see the AI Agent Installation Guide for step-by-step instructions, including user action prompts.

⚠️ Known Issue: Claude Code's plugin system may not automatically configure the MCP server. If you encounter "Failed to reconnect to plugin:recall:recall" after installation, see the Plugin Installation Troubleshooting guide for manual configuration steps.
Four-step installation - works from anywhere:

```
# 1. Add Recall as a plugin marketplace
/plugin marketplace add WKassebaum/Recall

# 2. Install the Recall plugin
/plugin install recall@Recall

# 3. Configure storage and verify
# If automatic setup succeeds, verify with:
/mcp  # Should show: plugin:recall:recall with 3 tools

# If you see "Failed to reconnect":
# See INSTALLATION.md for manual configuration
```

If Plugin Installation Fails:
The plugin system may not automatically:

- Create the virtual environment
- Install dependencies
- Register the MCP server in `~/.claude.json`

Manual Configuration Required:

- Create a virtual environment and install dependencies
- Add the MCP server configuration to `~/.claude.json` with the namespace `plugin:recall:recall`
- Use absolute paths (not template variables)
Detailed Guide: See INSTALLATION.md - Plugin Installation Troubleshooting for step-by-step instructions.
After Successful Installation:

```
# Configure Qdrant storage mode
recall setup  # Choose embedded or network mode

# Restart Claude Code
# Cmd/Ctrl + Q, then relaunch

# Verify installation
/mcp  # Should show: plugin:recall:recall with 3 tools
```

What you get (when working):
- ✅ MCP server with 3 tools (ingest_memory, recall_memory, memory_stats)
- ✅ Choice of storage modes (embedded or network Docker)
- ✅ Multi-project support (network mode)
- ✅ All data stored locally (no cloud dependencies)
First Launch Note: On first use, sentence-transformers will automatically download the Arctic embedding model (~3.5GB) from HuggingFace to ~/.cache/huggingface/. This takes 30-60 seconds on a good connection. Subsequent launches are instant.
```
# Create a virtual environment
python -m venv ~/recall-venv
source ~/recall-venv/bin/activate  # On Windows: ~/recall-venv\Scripts\activate

# Install from PyPI
pip install semantic-recall

# Find your Python path for MCP configuration
which python  # e.g., /Users/username/recall-venv/bin/python
```

Then add to Claude Code:

```
claude mcp add recall -s user -- /path/to/recall-venv/bin/python -m recall.mcp.server
```

If you prefer source installation or want to contribute:
```
# Clone repository
git clone https://github.com/WKassebaum/Recall.git
cd Recall

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install from source
pip install -e .
```

First Launch Note: On first use, sentence-transformers will automatically download the Arctic embedding model (~3.5GB) from HuggingFace to ~/.cache/huggingface/. This takes 30-60 seconds on a good connection. Subsequent launches are instant.
Optional - Pre-download model to avoid delays:

```
python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('Snowflake/snowflake-arctic-embed-m')"
```

Add Recall to your Claude Code configuration:

```
claude mcp add-json --scope user recall '{
  "command": "/path/to/Recall/venv/bin/python",
  "args": ["-m", "recall.mcp.server"],
  "env": {
    "PYTHONPATH": "/path/to/Recall/src"
  }
}'
```

Restart Claude Code to load the Recall MCP server.
```
# In Claude Code, use the Recall tools
mcp__recall__memory_stats()
# Should show: Embedder: snowflake-arctic-embed-m, Collection: recall_768d
```

Important Decision: Recall supports two storage modes with different capabilities:
| Feature | Embedded (Local) | Network (Docker) |
|---|---|---|
| Multi-project | ❌ ONE AT A TIME | ✅ Unlimited concurrent |
| Multi-window | ❌ File locking issues | ✅ Thread-safe |
| Setup | ✅ Zero-config | Requires Docker |
| Performance | ✅ Slightly faster | ✅ Fast enough (<20ms) |
| Data location | `~/.recall/qdrant/` | Docker volume |
| Recommended for | Single-project testing | Production use |
Choose Embedded Mode if:

- ✅ You only work on ONE project at a time
- ✅ You never open multiple Claude Code windows simultaneously
- ✅ You want zero-setup simplicity

Choose Network Mode (Docker) if:

- ✅ You work on multiple projects concurrently
- ✅ You open multiple Claude Code windows
- ✅ You want thread-safe, scalable storage
- ✅ Recommended for normal usage
Run the setup wizard to configure your preferred mode:

```
recall setup
```

The wizard will:

- ✅ Detect your system (Python, Docker, existing Qdrant)
- ✅ Explain mode limitations and benefits
- ✅ Help you choose the right mode
- ✅ Create a Docker Qdrant instance (network mode)
- ✅ Test the connection before saving
- ✅ Generate configuration at `~/.recall/.env`

Reconfigure anytime:
```
recall setup --reconfigure
```

If you're already using Recall in embedded mode and want multi-project support immediately:

See: QUICK_FIX_MULTI_PROJECT.md for a step-by-step workaround

TL;DR:

1. Start Docker Qdrant: `docker run -d --name recall-qdrant -p 6333:6333 qdrant/qdrant:latest`
2. Create `~/.recall/.env` with `RECALL_QDRANT_MODE=network`
3. Restart Claude Code
Safe migration script available: Use scripts/migrate-to-named-volumes.sh to safely transfer data from bind mounts to named volumes without data loss. See TEAM_ROLLOUT_GUIDE.md for details.
Recall v1.4.0 eliminates Docker corruption issues on macOS and provides automated backup/recovery:
✅ Docker Reliability Improvements:
- Named volumes - Docker-managed storage eliminates macOS file descriptor translation issues
- WAL tuning - Optimized for batched writes (512MB buffer, 30s flush intervals)
- Health checks - Automatic corruption detection
- Multi-project validated - Stable with 4+ concurrent projects
- Zero corruption - No data loss since implementation
✅ Automated Backup System (macOS):
- Every 6 hours - Automated backups via launchd
- Intelligent rotation - 4 recent (24hrs), 7 daily, 4 weekly
- Auto-cleanup - Old backups automatically pruned
- 6-hour maximum data-loss window - down from total loss before
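The 4-recent / 7-daily / 4-weekly rotation can be sketched in a few lines. This is hypothetical logic for illustration; the shipped backup script may differ in detail:

```python
from datetime import datetime, timedelta

def classify_backups(timestamps, now):
    """Sketch of a 4-recent / 7-daily / 4-weekly retention policy.

    Keeps the 4 newest backups, then the newest backup per calendar day
    for the last 7 days, then the newest per ISO week for the last 4
    weeks; everything else would be pruned. Hypothetical illustration.
    """
    keep = set(sorted(timestamps, reverse=True)[:4])  # 4 most recent
    by_day, by_week = {}, {}
    for ts in sorted(timestamps, reverse=True):       # newest first
        age = now - ts
        if age <= timedelta(days=7):
            by_day.setdefault(ts.date(), ts)          # newest per day
        if age <= timedelta(weeks=4):
            by_week.setdefault(ts.isocalendar()[:2], ts)  # newest per ISO week
    keep |= set(by_day.values()) | set(by_week.values())
    return keep
```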
✅ Auto-Recovery System:

```
# One-command health check and recovery
recall recover

# Force recovery from latest backup
recall recover --force

# Recover from specific backup
recall recover --backup backups/recall-backup-20251018.tar.gz
```

Recovery time: 2-3 minutes (fully automated)
1. Install automated backups:

```
./scripts/setup-auto-backup.sh
```

2. Verify the service:

```
launchctl list | grep recall.backup
# Expected: - 0 com.recall.backup
```

3. Test recovery:

```
recall recover
# Should show: ✅ All health checks passed!
```

Platform support:

- macOS: ✅ Fully automated (launchd)
- Linux: ✅ Easy (cron, 10 min setup) - see CROSS_PLATFORM_BACKUP_GUIDE.md
- Windows: ⚠️ WSL2 recommended (use the Linux approach)
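On Linux, the same 6-hour cadence could look like the crontab entry below. The script name and paths are hypothetical placeholders; follow CROSS_PLATFORM_BACKUP_GUIDE.md for the actual setup:

```shell
# Hypothetical crontab entry for a 6-hour backup cadence on Linux.
# Adjust the script path to your installation; install with: crontab -e
0 */6 * * * /path/to/Recall/scripts/backup-recall.sh >> ~/.recall/backup.log 2>&1
```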
- DOCKER_RELIABILITY.md - Comprehensive troubleshooting and root cause analysis
- AUTOMATED_BACKUP_RECOVERY_GUIDE.md - Complete user guide
- CROSS_PLATFORM_BACKUP_GUIDE.md - Linux/Windows setup
- TEAM_ROLLOUT_GUIDE.md - Migration strategy (NO data wipe needed)
Store important events with structured metadata:
```
mcp__recall__ingest_memory(
    content="Selected Arctic embedder after benchmark showing 93.3% accuracy",
    session_id="architecture_decisions",
    metadata={
        "event_type": "decision",
        "tags": "architecture,embeddings,performance",
        "context": "Comparing 4 embedding models",
        "outcome": "Arctic selected as primary"
    }
)
```

Event Types: decision, discovery, milestone, preference, error, success
Semantic Search (by meaning):

```
mcp__recall__recall_memory(
    query="embedding model decisions",
    top_k=5,
    session_id="architecture_decisions"
)
# Returns: Most semantically relevant memories
```

Chronological Timeline (by time):

```
mcp__recall__recall_memory(
    retrieval_mode="chronological",
    session_id="phase3",
    time_range="2025-10-08,2025-10-11"
)
# Returns: Memories in time order (oldest → newest)
```

Hybrid (semantic + temporal + event filters):

```
mcp__recall__recall_memory(
    query="debugging attempts",
    retrieval_mode="hybrid",
    time_range="2025-10-10,",  # Since Oct 10
    event_types="discovery,error,success",
    top_k=10
)
# Returns: Relevant debugging events from the time range
```

Semantic Mode - Search by meaning using vector similarity
- Query: "What architecture decisions did we make?"
- Result: Top matches ranked by relevance score
Chronological Mode - Search by time range and filters
- Query: "Show me Phase 3 timeline"
- Result: Events in time order (oldest to newest)
Hybrid Mode - Combine semantic + temporal + event filtering
- Query: "Recent MCP debugging discoveries"
- Result: Semantically relevant events within time range
Organize memories by type for targeted retrieval:
| Event Type | Use Case | Example |
|---|---|---|
| `decision` | Architecture, tool selection | "Chose multi-collection strategy for dimension isolation" |
| `discovery` | Bug findings, insights | "Found stdout contamination corrupting JSON-RPC" |
| `milestone` | Waypoint completions | "Completed Phase 3 with 91.94% test coverage" |
| `preference` | User patterns, coding style | "User prefers async/await over callbacks" |
| `error` | Problems encountered | "Migration failed: dimension mismatch" |
| `success` | Solutions that worked | "Fixed timezone bug with datetime.max.replace()" |
- Semantic search: ~17.5ms average (28x faster than 500ms target)
- Chronological search: ~20-30ms (no embedding generation)
- Hybrid search: ~25-40ms (embedding + filtering)
Primary (default):

- `snowflake/arctic-embed-m` - 87% accuracy, 768D, ~3.5GB
- Purpose-built for retrieval tasks
- SOTA performance, excellent on M1 Max (~35ms/query)

Fallback:

- `all-MiniLM-L6-v2` - 78.1% accuracy, 384D, ~1.2GB
- Smallest, most reliable fallback (~14.7ms/query)
- Auto-activates if Arctic fails to load

User-selectable (via config.yaml):

- `nomic-embed-text-v1.5` - 86.2% accuracy, 768D, supports 8K token context
- `bge-small-en-v1.5` - 84.7% accuracy, 384D, balanced performance
All models run excellently on M1 Max (use <8% of 64GB RAM).
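The 2-tier fallback can be sketched as "try the primary, catch any load error, load the fallback". The loader is injected below so the logic can be demonstrated without downloading models; in practice the callable would be `sentence_transformers.SentenceTransformer`, and Recall's actual embedder factory may be structured differently:

```python
def load_embedder(load, primary="Snowflake/snowflake-arctic-embed-m",
                  fallback="all-MiniLM-L6-v2"):
    """2-tier fallback sketch: try the primary model, fall back on any
    load error. Returns (model, embedding_dimension).

    `load` is a model-loading callable (e.g. SentenceTransformer),
    injected here purely so the fallback path can be exercised cheaply.
    """
    try:
        return load(primary), 768   # Arctic embeds in 768 dimensions
    except Exception:               # e.g. download failure, out of memory
        return load(fallback), 384  # MiniLM embeds in 384 dimensions

# Simulate Arctic failing to load:
def fake_load(name):
    if "arctic" in name:
        raise RuntimeError("download failed")
    return f"<model {name}>"

model, dim = load_embedder(fake_load)
print(model, dim)  # <model all-MiniLM-L6-v2> 384
```

The returned dimension then selects the target collection (`recall_768d` or `recall_384d`), which is why the fallback never corrupts existing vectors.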
Progressive Disclosure Teaching System - Recall now includes Claude Skills support for enhanced discoverability and guided usage.
Skills are teaching documentation that help Claude understand when and how to use tools effectively. Instead of loading full documentation into every conversation (~500+ tokens), Skills use progressive disclosure:
- Idle state: ~20 tokens (metadata only)
- When needed: Full SKILL.md loaded on-demand
- Additional context: Real examples from production use
After installing Recall, you automatically get:
📁 ~/.claude/skills/recall-memory-skill/

`SKILL.md` - Comprehensive 400+ line usage guide:

- When to Use Recall (auto-trigger patterns)
- Available MCP Tools documentation
- Event Types and Search Strategies
- Context Management workflows
- Integration patterns

`examples.md` - Real usage examples:

- 8 comprehensive examples (debugging timelines, decision tracking, performance optimization)
- Anti-patterns to avoid
- Token efficiency analysis
The skill teaches Claude:
- Auto-trigger scenarios - When to proactively use Recall (context >70%, milestones, bugs, decisions)
- Event type selection - Choose correct type (decision, discovery, milestone, success, error, preference)
- Search strategies - Semantic vs chronological vs hybrid modes
- Context management - When to offload details to free working memory
- Workflow patterns - Session continuity, debugging timelines, decision tracking
- ✅ Better Claude understanding - Claude knows when/how to use Recall without explicit reminders
- ✅ Token efficiency - ~97% reduction (20 tokens idle vs 500+ always-loaded)
- ✅ Progressive disclosure - Detailed docs loaded only when needed
- ✅ Real examples - Learn from actual Recall development patterns
If using manual installation (not plugin), create the skill directory:
```
mkdir -p ~/.claude/skills/recall-memory-skill/
cp .claude-plugin/skills/* ~/.claude/skills/recall-memory-skill/
```

Claude Code will automatically discover and load the skill on next launch.
✅ All Core Features Validated

- Event metadata storage ✅
- Semantic mode ✅
- Chronological mode ✅
- Event type filtering ✅
- Hybrid mode ✅
- Time range filtering ✅
✅ Quality Gates Passed
- Test coverage: 80.03% (target: >80%)
- Cyclomatic complexity: β€8 (target: β€10)
- Type safety: mypy strict passing
- Code quality: ruff passing
- Zero breaking changes
✅ Performance Validated
- Query latency: <500ms target met (17.5ms average)
- Memory usage: <8GB on M1 Max
- Throughput: 32.4 chunks/sec
Comprehensive test suite with 17 quality waypoints:
```
# Run full test suite
pytest tests/

# Run with coverage
pytest --cov=src/recall --cov-report=html tests/

# Run specific test categories
pytest tests/unit/
pytest tests/integration/
pytest tests/benchmark/
```

- CLAUDE.md - Comprehensive usage guide for Claude Code (350+ lines)
- Auto-trigger patterns
- Event metadata best practices
- Workflow integration patterns
- Context management strategy
- docs/architecture/ - Architecture and technical analysis
- docs/development/ - Development plans, testing, quality gates
- docs/planning/ - PRD, executive summaries, Zen validation
- docs/validation/ - Test reports and validation results
- docs/releases/RELEASE_NOTES_v1.3.2.md - v1.3.2 feature overview
- docs/validation/VALIDATION_REPORT_v1.3.2.md - Comprehensive validation report
```
MCP Client (Claude Code CLI)
        ↓ (tool calls via MCP)
MCP Server (FastMCP)
        ↓
Core Engine
  ├─ Chunker (TreeSitter AST parser for 39+ languages)
  ├─ Embedder Factory (Arctic with MiniLM fallback)
  └─ UnifiedVectorStore
        ↓
Qdrant Vector Database
  ├─ recall_384d (384-dimension: all-MiniLM-L6-v2, bge-small-en-v1.5)
  └─ recall_768d (768-dimension: snowflake-arctic-embed-m, nomic-embed-text-v1.5)
```
Key Design Decisions:

- Dual Storage Modes - Embedded (simple, single-project) or Network (Docker, multi-project)
- Multi-collection strategy - Separate collections per embedding dimension (prevents dimension mismatch errors)
- Unified API - Automatic routing to the correct collection based on the active embedder
- 2-tier fallback - Arctic (primary) → MiniLM (fallback) for reliability
- Hybrid architecture - Single storage (vector DB), dual retrieval (semantic OR temporal)
- Environment-first config - `.env` files take precedence for flexible deployment
Recall uses `~/.recall/.env` for runtime configuration (automatically created by `recall setup`):

```
# Qdrant Storage Mode
RECALL_QDRANT_MODE=network  # or "embedded"

# Network Mode Settings (Docker)
RECALL_QDRANT_HOST=localhost
RECALL_QDRANT_PORT=6333
# RECALL_QDRANT_API_KEY=  # Optional for Qdrant Cloud

# Embedded Mode Settings
# RECALL_QDRANT_PATH=~/.recall/qdrant/

# Embedder Settings
RECALL_EMBEDDER_MODEL=Snowflake/snowflake-arctic-embed-m
RECALL_FALLBACK_ENABLED=true
RECALL_FALLBACK_MODEL=all-MiniLM-L6-v2
```

Reconfigure: Run `recall setup --reconfigure` to update settings interactively.

Manual editing: Edit `~/.recall/.env` directly, then restart Claude Code.
For advanced customization, edit config.yaml:

```
# PRIMARY MODEL (default)
embedder: snowflake/arctic-embed-m

# 2-TIER FALLBACK
fallback:
  enabled: true
  model: all-MiniLM-L6-v2

# QDRANT CONFIGURATION
qdrant:
  host: localhost
  port: 6333
  # api_key: optional

# MULTI-COLLECTION (auto-managed)
collections:
  auto_create: true
  # recall_384d (384D models), recall_768d (768D models) created automatically

# MONITORING
monitoring:
  alert_on_fallback: true
  log_dimension_mismatches: true
```

Note: Environment variables in `~/.recall/.env` take precedence over config.yaml.
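The environment-first precedence rule can be sketched as follows. The helper name and key conventions are hypothetical; Recall's actual config loader may differ:

```python
import os

def resolve(key, yaml_config=None, env_prefix="RECALL_"):
    """Sketch of environment-first config resolution.

    Values from ~/.recall/.env (exported into the environment as
    RECALL_<KEY>) override config.yaml defaults. Hypothetical helper.
    """
    env_val = os.environ.get(env_prefix + key.upper())
    if env_val is not None:
        return env_val
    return (yaml_config or {}).get(key)

os.environ["RECALL_QDRANT_MODE"] = "network"
print(resolve("qdrant_mode", {"qdrant_mode": "embedded"}))  # network
```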
We welcome contributions! Please see:
```
# Install development dependencies
pip install -r requirements-dev.txt

# Run quality checks
./scripts/quality_check.sh

# Run tests with coverage
pytest --cov=src/recall --cov-fail-under=80 tests/

# Check cyclomatic complexity
radon cc src/recall/ -n C -s

# Type checking
mypy src/recall --strict
```

Focus: Test Quality & PyPI Publishing
- ✅ Test Coverage Improvements
  - Expanded test suite from 54.77% to 80.03% coverage
  - 236 total tests covering all core modules
  - CLI modules fully tested (cleanup, doctor, recover, setup)
  - Core modules tested (store, embedders, backends)
  - Integration tests stabilized with proper database isolation
- ✅ PyPI Package Preparation
  - Package builds verified (wheel + sdist)
  - Version synchronization across all files
  - Ready for PyPI publishing
Focus: Context Management & User Experience

- Context Size Monitoring & Alerts (High Priority)
  - Real-time context window usage tracking
  - Smart alerts when context reaches 70%+ capacity
  - Automatic suggestions for memories to offload
  - Integration with Claude Code status bar
- Memory Importance Scoring
  - Automatic importance calculation based on access patterns
  - User-adjustable importance ratings
  - Priority-based retrieval ranking
  - Intelligent memory pruning suggestions
- Cross-Project Memory Sharing
  - Share memories across multiple projects
  - Global vs project-scoped memory management
  - Shared decision/preference memory pools
- Smart Mode Selection
  - Automatic mode detection (semantic vs chronological vs hybrid)
  - Query pattern analysis for optimal retrieval
  - User preference learning
- Memory Export/Import
  - JSON/YAML export formats
  - Backup and restore functionality
  - Team knowledge sharing capabilities
  - Migration between instances
Focus: Automation & Intelligence

- Auto-Ingestion Hooks
  - Automatic memory capture at key events
  - Git commit integration (capture commit context)
  - Test failure auto-logging
  - Configurable trigger patterns
- Smart Suggestions
  - Proactive memory recommendations during coding
  - "You worked on similar code last week" notifications
  - Related decision surfacing
  - Pattern-based insight generation
- Natural Language Queries
  - Conversational query interface
  - Query intent understanding
  - Multi-step query refinement
- Performance Dashboard
  - Real-time system metrics visualization
  - Query latency trends
  - Storage usage analytics
  - Retrieval accuracy reporting
- Session Recap (formerly "Summarization")
  - Conversational timeline queries: "What did we do last week?"
  - Decision history: "When did we move to version x?"
  - Rationale retrieval: "Why did we switch to Apache 2.0?"
  - Intelligent event grouping and presentation
  - No data compression - full context preserved
Focus: Enterprise & Scale
- Multi-user support with permissions
- Distributed Qdrant deployment
- Advanced query DSL for power users
- Memory analytics and insights
- API for third-party integrations
Apache 2.0
- CodeIndex - AST chunking patterns and TreeSitter integration
- Qdrant - High-performance vector database
- FastMCP - MCP server framework
- Snowflake, Nomic AI, BAAI - Embedding models
- Claude Code - Dogfooding and validation
- Issues: GitHub Issues
- Documentation: docs/
- Discussions: GitHub Discussions
Version: v1.4.1 | Status: Production-ready | Last Updated: 2025-12-07