Give your AI agent humanlike long-term memory with hybrid search: no RAG pipelines, no databases, no infrastructure.
Two background agents (Observer + Reflector) compress your conversation history into dense, prioritized memory files. QMD hybrid search (BM25 + vectors + reranking) makes those compressed memories instantly findable. Your agent reads them on startup and has full context about you, your projects, your preferences, and what happened while it was "asleep."
The compressed observations achieve 5–40× token reduction while preserving what matters, and they're actually better search targets than raw conversation. Dense, pre-scored notes produce higher-precision results than searching through thousands of noisy messages.
Inspired by Mastra's Observational Memory (SOTA on LongMemEval), adapted and extended with hybrid search for the OpenClaw ecosystem.
The system has two layers: compression (Observer + Reflector) and retrieval (QMD hybrid search). Together they solve the full memory problem of writing dense memories and finding them when you need them.
```
 Conversation          Observer           Memory Files         QMD Hybrid Search
 (raw messages)        (every 30m)        (compressed)         (BM25 + vectors + reranking)
┌──────────────┐      ┌───────────┐      ┌──────────────┐      ┌──────────────┐
│ Hey, can you │      │           │      │ 🔴 14:30 User│      │              │
│ help me set  │      │ Compress  │      │ setting up   │      │ BM25 index   │
│ up Postgres  │─────▶│ & score   │─────▶│ PostgreSQL   │─────▶│ Vector embed │
│ for the new  │      │ priorities│      │ for project  │      │ LLM reranker │
│ project?...  │      │           │      │ "Atlas"      │      │              │
│ [200+ msgs]  │      └───────────┘      └──────┬───────┘      └──────┬───────┘
└──────────────┘                                │                     │
 ~50K tokens/day                                ▼                     ▼
                       Reflector         ┌──────────────┐       memory_search
                       (daily)           │ ## Identity  │       "What was that
                      ┌───────────┐      │ Name: Alex   │       Postgres decision?"
                      │ Condense  │─────▶│ ## Projects  │             │
                      │ & merge   │      │ Atlas: PG    │      ┌──────▼───────┐
                      └───────────┘      │ ## Prefs     │      │ Top results  │
                                         │ 🔴 Postgres  │      │ with         │
                                         └──────────────┘      │ citations    │
                                          ~500 tokens total    └──────────────┘
```
Three tiers of memory, each more compressed than the last, all searchable via QMD:
| Tier | Updated | Retention | Size | Contents |
|---|---|---|---|---|
| Raw Messages | Real-time | Session only | ~50K tokens/day | Full conversation |
| Observations | Every 30 min | 7 days | ~2K tokens/day | Timestamped, prioritized notes |
| Reflections | Daily | Indefinite | 200β600 lines total | Stable identity, projects, preferences |
Most memory systems choose between compression (summaries) and retrieval (RAG). This system does both, and the combination is better than either alone:
- Compressed observations are better search targets. Stripping filler and scoring by priority means QMD searches through signal, not noise.
- BM25 catches what vectors miss. Project names, error codes, API endpoints, specific tools: exact-match search finds these instantly. Vector search alone often can't.
- Vectors catch what BM25 misses. "That database discussion last week" finds your PostgreSQL migration notes even though the word "database" never appears in them.
- Local reranking scores results by actual relevance, not just keyword or embedding similarity.
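As a toy illustration of why the signals complement each other, here is a minimal hybrid ranker sketch. This is not QMD's implementation: the scoring functions, the fusion weight `alpha`, and the sample memories are all invented for the example.

```python
import math
from collections import Counter

def keyword_score(query, doc):
    # Exact-term overlap: the BM25-style signal that catches names and codes.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def vector_score(query, doc):
    # Stand-in for embedding similarity: cosine over word counts.
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    dot = sum(q[w] * d[w] for w in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0

def hybrid_search(query, docs, k=2, alpha=0.5):
    # Fuse both signals per document, then return the top-k.
    scored = [(alpha * keyword_score(query, d)
               + (1 - alpha) * vector_score(query, d), d) for d in docs]
    return [d for _, d in sorted(scored, reverse=True)[:k]]

memories = [
    "🔴 14:30 User setting up PostgreSQL for project Atlas",
    "🟡 14:45 PgBouncer max_client_conn raised to 200",
    "🟢 09:00 Morning greeting",
]
print(hybrid_search("postgres atlas", memories))
```

A real deployment would replace `keyword_score` with a BM25 index, `vector_score` with embeddings, and re-score the top-k with an LLM reranker, but the fusion shape is the same.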
Example `memory/observations.md`:

```markdown
# Observations

## 2026-02-10

### Current Context
- **Active task:** Migrating Atlas project from SQLite to PostgreSQL
- **Mood/tone:** Focused, slightly frustrated with connection pooling
- **Key entities:** Atlas, PostgreSQL, PgBouncer, Render.com
- **Suggested next:** Help verify connection pool settings work under load

### Observations
- 🔴 14:30 User is migrating the Atlas project from SQLite to PostgreSQL
- 🔴 14:30 Reason: SQLite can't handle the concurrent writes they need
- 🟡 14:35 Using Render.com managed PostgreSQL instance
- 🔴 14:42 User prefers PostgreSQL over SQLite for production workloads
- 🟡 14:45 Debugging connection pool exhaustion: PgBouncer max_client_conn was set too low
- 🟡 14:52 Resolved: increased to 200 connections, switched to transaction mode
- 🔴 15:10 User wants to add full-text search to Atlas
- 🟡 15:10 Considering pg_trgm vs tsvector, leaning toward tsvector
```

Example `memory/reflections.md`:

```markdown
# Reflections – Long-Term Memory
*Last updated: 2026-02-10 04:00 UTC*
*Last reflected: 2026-02-10 15:10 UTC*

## Core Identity
- **Name:** Alex Chen
- **Role:** Backend engineer at a Series B startup
- **Communication style:** Direct, technical, appreciates concise answers
- **Working hours:** ~09:00–18:00 PST, occasional evening sessions
- **Preferences:** PostgreSQL, Python, FastAPI, prefers CLI over GUI

## Active Projects

### Atlas
- **Status:** Active
- **Started:** ~Jan 2026
- **Stack:** Python, FastAPI, PostgreSQL (migrated from SQLite ~Feb 2026)
- **Key decisions:** PostgreSQL for concurrency; PgBouncer in transaction mode; tsvector for search
- **Current state:** Database migration complete, adding full-text search

## Preferences & Opinions
- 🔴 PostgreSQL over SQLite for anything production
- 🔴 Prefers code examples over explanations
- 🟡 Interested in PgBouncer vs pgpool; chose PgBouncer for simplicity
```

Prerequisites:
- OpenClaw installed and running
- `openclaw` CLI available in your PATH
```shell
git clone https://github.com/intertwine/openclaw-observational-memory.git
cd openclaw-observational-memory
bash scripts/install.sh
```

This will:
- Create `memory/observations.md` and `memory/reflections.md` in your workspace
- Set up two cron jobs: Observer (every 30 min) and Reflector (daily at 04:00 UTC)

To enable hybrid search:

```shell
bash scripts/enable-qmd.sh
```

This installs QMD and configures OpenClaw to use hybrid search (BM25 + vectors + reranking) over your memory files. QMD auto-indexes observations, reflections, and daily memory files every 5 minutes.
Resource requirements:
- Full setup: ~2 GB RAM for local GGUF models (embedding + reranking)
- Lighter setup: BM25 keyword search works with zero extra RAM; vector embeddings fall back to OpenAI API if local models can't load
- Disk: ~1 GB for model files on first run
To disable: `bash scripts/enable-qmd.sh --disable`
```shell
# Custom model
bash scripts/install.sh --model anthropic/claude-sonnet-4-20250514

# Custom schedule
bash scripts/install.sh --observer-interval "*/30 * * * *"    # every 30 min
bash scripts/install.sh --reflector-schedule "0 6 * * *"      # 06:00 UTC daily

# Uninstall
bash scripts/uninstall.sh
bash scripts/uninstall.sh --purge    # also removes memory files
```

Add these lines to your AGENTS.md (or equivalent startup instructions):
```markdown
## Every Session
...
5. Read `memory/observations.md` – recent compressed observations (auto-maintained by Observer)
6. Read `memory/reflections.md` – long-term condensed memory (auto-maintained by Reflector)
```

That's it. Your agent now has persistent, compressed, searchable memory.
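What reading the memory files on startup amounts to can be sketched in a few lines. The file paths come from this README; the `load_memory_context` helper and the comment-wrapper format are hypothetical.

```python
from pathlib import Path

def load_memory_context(workspace="."):
    """Concatenate the two memory files into one startup context block.

    File names follow this README; the wrapper format is illustrative.
    Reflections come first: stable identity before recent observations.
    """
    parts = []
    for name in ("memory/reflections.md", "memory/observations.md"):
        p = Path(workspace) / name
        if p.exists():  # degrade gracefully before the first Observer run
            parts.append(f"<!-- {name} -->\n{p.read_text()}")
    return "\n\n".join(parts)
```

An agent harness would prepend the returned string to the system prompt at session start.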
If you're using Claude Code or Codex CLI and want to run the Observer/Reflector outside of OpenClaw, the companion observational-memory Python package provides a standalone CLI:
```shell
# Install
pip install observational-memory
# or: uv tool install observational-memory

# Run observer on all recent transcripts
om observe

# Run reflector to condense observations into reflections
om reflect

# Backfill from historical transcripts
om backfill

# Search memory
om search "postgres migration"

# Show status
om status
```

The `om` CLI includes:
- Transcript parsing for Claude Code (`.jsonl`) and Codex sessions
- Backfill: process all historical transcripts in one command
- Incremental reflection: only processes new observations since the `Last reflected` timestamp, with automatic chunking for large inputs
- Pluggable search: BM25 (default), QMD (hybrid), or none
- Session hooks: `om context` for automatic memory injection at session start
See the observational-memory repo for full documentation.
```
┌─────────────────┐     every 30 min    ┌──────────────────┐
│   Main Agent    │ ◀── reads ───────── │  Observer Agent  │
│    Session      │                     │ (cron, isolated) │
└────────┬────────┘                     └────────┬─────────┘
         │                                       │ writes
         │ reads on startup           ┌──────────▼──────────┐
         │                            │ memory/             │
         │                            │   observations.md   │ ◀── QMD indexes
         └───────────────────────────▶│   reflections.md    │     (BM25 + vectors
                                      └──────────┬──────────┘      + reranking)
                                reads + trims    │                    │
                                      ┌──────────▼──────────┐    memory_search
                                      │   Reflector Agent   │    finds relevant
                                      │    (daily cron)     │    memories on demand
                                      └─────────────────────┘
```
**Observer**
- Runs as an OpenClaw cron job (default: every 30 minutes)
- Reads recent session history from the main agent session
- Compresses unprocessed messages into timestamped, prioritized notes
- Appends to `memory/observations.md`, maintaining exactly one `### Observations` block per day
- Maintains a "Current Context" block with active tasks, mood, and suggested next actions
- Filters out noise: heartbeat polls, system messages, cron notifications, and duplicate entries
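The Observer's filter-and-dedup step, sketched in miniature. In the real system an LLM assigns priorities and writes the notes; here `observe`, the abridged `NOISE` patterns, and the pre-assigned priorities are stand-ins for illustration.

```python
import re

# Abridged stand-in for the Observer's "Never Log" list.
NOISE = re.compile(r"heartbeat|system:|cron:", re.IGNORECASE)

def observe(messages, seen=None):
    """Turn (time, priority, text) messages into deduplicated note lines."""
    seen = set() if seen is None else seen
    notes = []
    for ts, priority, text in messages:
        if NOISE.search(text) or text in seen:
            continue  # drop heartbeats, system chatter, and duplicates
        seen.add(text)
        notes.append(f"- {priority} {ts} {text}")
    return notes

msgs = [
    ("14:30", "🔴", "User is migrating Atlas from SQLite to PostgreSQL"),
    ("14:31", "🟢", "heartbeat poll"),
    ("14:30", "🔴", "User is migrating Atlas from SQLite to PostgreSQL"),
]
print("\n".join(observe(msgs)))
```

Passing the `seen` set across runs is how a real implementation would avoid re-logging entries already in the file.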
**Reflector**
- Runs daily (default: 04:00 UTC)
- Incremental updates only: reads observations from the `Last reflected` date onward, not the entire file
- Merges new information into the existing reflections document via surgical edits (add, update, promote, demote, archive)
- Overwrites `memory/reflections.md` with updated `Last updated` and `Last reflected` timestamps
- Trims observations older than 7 days
- When observations are too large for a single pass (e.g., after backfill), automatically chunks by date section and folds incrementally
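The chunk-and-fold behavior can be sketched as: split the observations file on its `## YYYY-MM-DD` headings, then fold one chunk at a time into the running reflections. The function names are invented, and `merge` is a trivial placeholder for the LLM's surgical-edit pass.

```python
import re

def chunk_by_date(observations_md):
    """Split an observations file into sections on '## YYYY-MM-DD' headings."""
    parts = re.split(r"(?m)^(?=## \d{4}-\d{2}-\d{2})", observations_md)
    return [p for p in parts if p.strip()]

def merge(reflections, chunk):
    # Stand-in for the model's merge step: record which date was folded in.
    date = chunk.splitlines()[0].removeprefix("## ").strip()
    return reflections + f"[folded {date}]"

def reflect(observations_md, reflections=""):
    # Fold one date-chunk at a time so no single call sees the whole backlog.
    for chunk in chunk_by_date(observations_md):
        reflections = merge(reflections, chunk)
    return reflections
```

Because each fold's input is one day's observations plus the current reflections, input size stays bounded even after a large backfill.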
**QMD hybrid search**
- QMD indexes all memory files automatically (5-minute refresh)
- BM25 catches exact matches: project names, error codes, tool names, URLs
- Vector search catches semantic matches: "that auth issue" finds your OAuth debugging notes
- LLM reranker scores results by actual relevance
- Falls back gracefully: if QMD is unavailable, OpenClaw uses its built-in vector search
| Level | Meaning | Examples | Retention |
|---|---|---|---|
| 🔴 | Important / persistent | User facts, decisions, project architecture | Months+ |
| 🟡 | Contextual | Current tasks, in-progress work, open questions | Days–weeks |
| 🟢 | Minor / transient | Greetings, routine checks, small talk | Hours |
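One way to act on these retention windows is to key trimming off the priority marker. This is a hypothetical sketch: the actual Reflector uses a flat 7-day window, and the per-priority day counts below are invented to match the table's rough scale.

```python
from datetime import date, timedelta

# Invented windows approximating Months+ / Days-weeks / Hours from the table.
RETENTION_DAYS = {"🔴": 90, "🟡": 14, "🟢": 1}

def keep(note_date, marker, today):
    """Is a note still inside its priority's retention window?"""
    window = timedelta(days=RETENTION_DAYS.get(marker, 7))  # default: 7 days
    return today - note_date <= window

today = date(2026, 2, 10)
assert keep(date(2026, 1, 1), "🔴", today)      # important fact: kept for months
assert not keep(date(2026, 1, 1), "🟡", today)  # contextual: expired after two weeks
```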
Edit the prompts in `reference/` to adjust:
- What gets captured: modify the priority definitions in `observer-prompt.md`
- What gets filtered: the Observer has a "Never Log" list (heartbeats, cron notifications, etc.)
- How aggressively observations are merged: adjust the merge/promote/archive rules in `reflector-prompt.md`
- Target size: the reflector aims for 200–600 lines; change this in the prompt
```shell
openclaw cron list                        # see current jobs
openclaw cron edit observer-memory        # modify observer schedule
openclaw cron edit reflector-memory       # modify reflector schedule
openclaw cron trigger observer-memory     # run observer now
openclaw cron trigger reflector-memory    # run reflector now
```

The install script defaults to `anthropic/claude-sonnet-4-20250514`. Both agents work well with any capable model. Sonnet-class or better is recommended for the Observer; the Reflector benefits from stronger reasoning for complex merging, but works well with smaller models too (we run ours on Kimi K2.5, free tier).
```
openclaw-observational-memory/
├── README.md                  # This file
├── CLAUDE.md                  # Claude Code guidance
├── LICENSE                    # MIT
├── SKILL.md                   # OpenClaw skill integration guide
├── docs/
│   └── code-and-context-article.md
├── reference/
│   ├── observer-prompt.md     # System prompt for the Observer agent
│   └── reflector-prompt.md    # System prompt for the Reflector agent
└── scripts/
    ├── install.sh             # Automated setup (Observer + Reflector)
    ├── uninstall.sh           # Clean removal
    └── enable-qmd.sh          # Enable QMD hybrid search
```
Q: Do I need QMD? A: Observational memory works well standalone: your agent reads the compressed files on startup and has full context. QMD adds the ability to search across weeks or months of observations when the agent needs to recall something specific. For agents with long histories or many projects, hybrid search makes a real difference.
Q: Does this replace RAG? A: For personal assistant memory, yes. Observational memory + QMD handles remembering about a person (preferences, projects, communication style) better than traditional RAG. For searching large document collections, RAG is still the right tool.
Q: How much does it cost to run? A: The observer processes only new messages each run (~100–500 input tokens typical). The reflector reads more but runs only once daily, and only processes observations since its last run. Expect ~$0.05–0.20/day with Sonnet-class models, or $0 with free-tier models like Kimi K2.5. QMD runs locally with no API costs.
Q: What if the observer misses something?
A: The observer errs on the side of keeping observations. The reflector handles cleanup. You can also manually edit `memory/observations.md` at any time.
Q: What about large observation histories? A: The reflector uses incremental updates: it reads its own previous output as a stable base and only processes new observations since its last run. This keeps input bounded regardless of total history size, preventing quality degradation from large inputs.
Q: What happens if the reflector runs on a huge backlog?
A: The reflector's `Last reflected` timestamp ensures it only processes new observations during normal operation. If the timestamp is missing (first run or after a backfill), the reflector automatically chunks observations by date section and folds them incrementally, preventing the model from being overwhelmed.
Q: Can I use this outside OpenClaw?
A: The prompts are generic and work with any agent framework that supports cron-like scheduling and file-based memory. The install script is OpenClaw-specific, but the pattern is portable. See observational-memory for a standalone CLI targeting Claude Code and Codex.
- Inspired by Mastra's Observational Memory, the original OM pattern that achieved SOTA on LongMemEval
- Hybrid search powered by QMD by Tobi Lütke: local-first BM25 + vectors + reranking
- Built for the OpenClaw community
- License: MIT. Fork it, customize it, ship it.
Made with 🧠 by Intertwine