4-operation API. Deferred enrichment. Minimal hot-path cost.
Your agent remembers, learns from outcomes, and predicts what you need next.
Most memory layers are glorified vector stores. Store text, retrieve text. Your agent is still stateless — it doesn't learn, doesn't track what worked, doesn't warn you when something is regressing.
Dhee is a cognition layer. It gives any agent — Claude, GPT, Gemini, custom — four capabilities that turn it into a self-improving HyperAgent:
| Capability | What Dhee does | What your agent gets |
|---|---|---|
| Persistent memory | Stores facts with echo-augmented retrieval (paraphrases, keywords, question-forms) | "What theme does the user prefer?" matches "User likes dark mode" even though the words are different |
| Performance tracking | Records task outcomes, detects trends automatically | Knows it's regressing on code reviews, warns you before you notice |
| Insight synthesis | Extracts causal hypotheses from outcomes — not raw data, synthesized learnings | "What worked: checking git blame first" transfers to the next bug fix |
| Prospective memory | Stores future triggers — "remember to X when Y" | Surfaces intentions when the trigger context matches |
Dhee is being evaluated on LongMemEval, a standard benchmark for long-term conversational memory — temporal reasoning, multi-session aggregation, knowledge updates, and counterfactual tracking across 500+ questions. Preliminary results are promising.
Full methodology and results will be published in the benchmark report.
Dhee is experimental software under active development. The core 4-operation API (remember/recall/context/checkpoint) is stable. Advanced subsystems (belief tracking, policy extraction, episodic indexing) are functional but evolving.
Use it. Build on it. But know that internals will change.
```bash
pip install dhee[openai,mcp]
export OPENAI_API_KEY=sk-...
```

```json
{
  "mcpServers": {
    "dhee": { "command": "dhee-mcp" }
  }
}
```

Your agent now has 4 tools. It will use them automatically.
```python
from dhee import Dhee

d = Dhee()
d.remember("User prefers dark mode")
d.recall("what theme does the user like?")
d.context("fixing auth bug")
d.checkpoint("Fixed it", what_worked="git blame first")
```

```bash
dhee remember "User prefers Python"
dhee recall "programming language"
dhee checkpoint "Fixed auth bug" --what-worked "checked logs"
```

```bash
docker compose up -d  # uses OPENAI_API_KEY from env
```

Every interface — MCP, Python, CLI, JS — exposes the same 4 operations.
Store a fact, preference, or observation.
Hot path: 0 LLM calls, 1 embedding (~$0.0002 typical). The memory is stored immediately. Echo enrichment (paraphrases, keywords, question-forms that make future recall dramatically better) is deferred to checkpoint.
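The store-now, enrich-later flow can be sketched in a few lines (an illustrative sketch, not Dhee's internals; `toy_embed` is a stand-in for a real embedding model):

```python
class DeferredStore:
    """Minimal sketch of deferred enrichment: store immediately,
    queue enrichment work for the next checkpoint."""

    def __init__(self, embed):
        self.embed = embed  # embedding function: the only hot-path cost
        self.memories = []  # (text, vector) pairs, searchable right away
        self.pending = []   # texts awaiting batched enrichment

    def remember(self, text):
        # Hot path: one embedding call, zero LLM calls.
        self.memories.append((text, self.embed(text)))
        self.pending.append(text)

    def checkpoint(self):
        # Deferred path: hand everything stored since the last
        # checkpoint to a single batched enrichment step.
        batch, self.pending = self.pending, []
        return batch

def toy_embed(text):
    # Character-frequency vector; a real system calls an embedding model.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - 97] += 1.0
    return vec

store = DeferredStore(toy_embed)
store.remember("User prefers dark mode")
store.remember("Project uses PostgreSQL")
print(len(store.checkpoint()))  # → 2
```

A search over `memories` works immediately; nothing on the hot path waits for enrichment.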
```python
d.remember("User prefers FastAPI over Flask")
d.remember("Project uses PostgreSQL 15 with pgvector")
```

Search memory. Returns top-K results ranked by relevance.
Hot path: 0 LLM calls, 1 embedding (~$0.0002 typical). Pure vector search with echo-boosted re-ranking.
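Echo-boosted ranking can be illustrated as scoring a query against both the memory text and its stored echoes, keeping the best match (a toy sketch with a bag-of-words embedding; the real system uses model embeddings and its own ranking):

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def toy_embed(text):
    # Bag-of-words over a tiny fixed vocabulary; a real system uses a model.
    vocab = ["theme", "dark", "mode", "prefers", "ui", "user", "what"]
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]

def echo_boosted_score(query_vec, memory):
    # Score against the memory text AND each echo; keep the best match.
    candidates = [memory["text"]] + memory["echoes"]
    return max(cosine(query_vec, toy_embed(c)) for c in candidates)

memory = {
    "text": "User prefers dark mode",
    "echoes": ["what theme does the user like", "ui theme preference dark"],
}
q = toy_embed("what theme")
direct = cosine(q, toy_embed(memory["text"]))
boosted = echo_boosted_score(q, memory)
assert boosted > direct  # an echo bridges the vocabulary gap
```

"what theme" shares no words with "User prefers dark mode", so the direct score is zero; the question-form echo is what makes the match.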
```python
results = d.recall("what database does the project use?")
# [{"memory": "Project uses PostgreSQL 15 with pgvector", "score": 0.94}]
```

HyperAgent session bootstrap. Call once at the start of a conversation.
Returns everything the agent needs to be effective immediately:
- Last session state — pick up where you left off, zero cold start
- Performance trends — improving or regressing on this task type
- Synthesized insights — "What worked for bug_fix: checking git blame first"
- Triggered intentions — "Remember to run auth tests after modifying login.py"
- Proactive warnings — "Performance on code_review is declining"
- Relevant memories — top matches for the task
```python
ctx = d.context("fixing the auth bug in login.py")
# ctx["warnings"]   → ["Performance on 'bug_fix' declining (trend: -0.05)"]
# ctx["insights"]   → [{"content": "What worked: git blame → found breaking commit"}]
# ctx["intentions"] → [{"description": "run auth tests after login.py changes"}]
```

Save session state before ending. This is where the cognition happens:
- Session digest — saved for cross-agent handoff (Claude Code crashes? Cursor picks up instantly)
- Batch enrichment — 1 LLM call per ~10 memories stored since the last checkpoint. Adds echo paraphrases and keywords that make `recall` work across phrasings
- Outcome recording — tracks score per task type, auto-detects regressions and breakthroughs
- Insight synthesis — "what worked" and "what failed" become transferable learnings
- Intention storage — "remember to X when Y" fires when the trigger matches
```python
d.checkpoint(
    "Fixed auth bug in login.py",
    task_type="bug_fix",
    outcome_score=1.0,
    what_worked="git blame showed the exact commit that broke auth",
    what_failed="grep was too slow on the monorepo",
    remember_to="run auth tests after any login.py change",
    trigger_keywords=["login", "auth"],
)
```

| Operation | LLM calls | Embed calls | Cost |
|---|---|---|---|
| `remember` | 0 | 1 | ~$0.0002 |
| `recall` | 0 | 1 | ~$0.0002 |
| `context` | 0 | 0-1 | ~$0.0002 |
| `checkpoint` | 1 per ~10 memories | 0 | ~$0.001 |
| Typical session | 1 | ~15 | ~$0.004 |

Costs assume OpenAI `text-embedding-3-small` at current pricing. Actual costs vary by provider, model, and configuration.
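The per-session figure follows from simple arithmetic over the table's estimates:

```python
embed_cost = 0.0002      # per embedding call (table estimate)
enrich_cost = 0.001      # per batched enrichment LLM call (table estimate)
embeds_per_session = 15  # typical session (table estimate)

session_cost = embeds_per_session * embed_cost + 1 * enrich_cost
print(f"~${round(session_cost, 4)}")  # → ~$0.004
```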
Dhee has two layers: the memory store and the cognition engine.
The memory store (Engram) keeps memories in SQLite + a vector index. On the hot path (remember/recall), zero LLM calls — just embedding. At checkpoint, unified enrichment runs in a single batched LLM call:
- Echo encoding — generates paraphrases, keywords, and question-forms so "User prefers dark mode" also matches queries like "what theme?" or "UI preferences"
- Category inference — auto-tags for filtering
- Fact decomposition — splits compound statements into atomic, searchable facts
- Entity + profile extraction — builds a knowledge graph of people, tools, projects
All of this happens in 1 LLM call per ~10 memories. Not 4 calls per memory. One batched call.
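Packing the batch into one call can be sketched as building a single structured prompt over all pending memories (a hypothetical prompt shape for illustration; Dhee's actual prompt and response schema are internal):

```python
def build_enrichment_prompt(memories):
    """Pack a batch of memories into ONE prompt requesting echoes,
    categories, and decomposed facts per item, instead of one
    LLM call per memory per enrichment type."""
    numbered = "\n".join(f"{i}: {m}" for i, m in enumerate(memories))
    return (
        "For each numbered memory, return JSON with keys "
        "'paraphrases', 'keywords', 'question_forms', 'category', 'facts'.\n"
        + numbered
    )

pending = [
    "User prefers dark mode",
    "Project uses PostgreSQL 15 with pgvector",
]
prompt = build_enrichment_prompt(pending)
assert "0: User prefers dark mode" in prompt
```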
Memory decays on an Ebbinghaus-style forgetting curve. Frequently accessed memories are promoted from short-term to long-term; unused ones fade. Storage therefore shrinks over time, unlike systems that keep everything indefinitely.
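The forgetting-curve idea reduces to retention R = exp(-t/S), where t is time since last access and strength S grows with use (a toy sketch; the actual decay parameters are internal):

```python
import math

def retention(hours_since_access, strength):
    # Ebbinghaus-style forgetting: R = exp(-t / S).
    # Higher strength S means slower decay; each access can raise S.
    return math.exp(-hours_since_access / strength)

fresh = retention(1, strength=24)        # accessed an hour ago
stale = retention(168, strength=24)      # untouched for a week
promoted = retention(168, strength=720)  # frequently accessed, long-term
assert fresh > promoted > stale
```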
The cognition engine (Buddhi) is a parallel intelligence layer that observes the memory pipeline and builds meta-knowledge:
- Performance tracking — records outcomes per task type, computes trends (moving average). Auto-generates regression warnings and breakthrough insights.
- Insight synthesis — stores causal hypotheses ("what worked", "what failed"), not raw data. Insights have confidence scores that update on validation/invalidation.
- Prospective memory — stores future triggers with keyword matching. "Remember to run tests after modifying auth" fires when the next query mentions "auth".
- Intention detection — auto-detects "remember to X when Y" patterns in stored memories.
Zero LLM calls on the hot path. Pure pattern matching + statistics. Persistence via JSONL files (~3 files total).
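Trend detection of this kind needs only plain statistics: for example, comparing a recent moving average with the one before it (an illustrative sketch, not Dhee's exact algorithm):

```python
def trend(scores, window=3):
    """Difference between the recent moving average and the previous one.
    Negative means regressing; positive means improving."""
    if len(scores) < 2 * window:
        return 0.0
    recent = sum(scores[-window:]) / window
    previous = sum(scores[-2 * window:-window]) / window
    return recent - previous

bug_fix_scores = [0.9, 0.85, 0.9, 0.8, 0.75, 0.7]
t = trend(bug_fix_scores)
if t < -0.02:
    print(f"Performance on 'bug_fix' declining (trend: {t:+.2f})")
```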
Inspired by Meta's DGM-Hyperagents, where agents that emergently develop persistent memory and performance tracking achieve self-accelerating improvement that transfers across domains. Dhee provides these capabilities as infrastructure.
Beyond the core cognition engine, Dhee includes experimental subsystems that are functional but still evolving:
- Belief store — confidence-tracked facts with Bayesian updates and contradiction detection
- Policy store — outcome-linked condition→action rules extracted from task completions
- Episodic indexing — structured event extraction for temporal and aggregation queries
- Contrastive pairs & heuristic distillation — learning from what worked vs. what failed
These are surfaced through `context()` and `checkpoint()` automatically when enabled.
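The belief-store idea can be illustrated with an odds-form Bayesian update, where supporting evidence raises confidence and a contradiction lowers it (a sketch of the concept, not Dhee's implementation; `likelihood_ratio` is a hypothetical parameter):

```python
def update_confidence(p, likelihood_ratio):
    """Bayesian update in odds form: posterior odds = prior odds * LR.
    LR > 1 supports the belief; LR < 1 contradicts it."""
    odds = (p / (1.0 - p)) * likelihood_ratio
    return odds / (1.0 + odds)

p = 0.6                                 # prior: "user prefers FastAPI"
p_up = update_confidence(p, 3.0)        # supporting observation
p_down = update_confidence(p_up, 0.25)  # contradicting observation
assert p_up > p > p_down                # confidence rises, then drops below prior
```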
```
Agent (Claude, GPT, Cursor, custom)
│
├── remember(content)   → Engram: embed + store (0 LLM)
├── recall(query)       → Engram: embed + vector search (0 LLM)
├── context(task)       → Buddhi: performance + insights + intentions + memories
└── checkpoint(summary) → Engram: batch enrich (1 LLM/10 mems)
                        → Buddhi: outcome + reflect + intention
```
```
~/.dhee/
├── history.db            # SQLite: memories, history, entities
├── zvec/                 # Vector index (embeddings)
└── buddhi/
    ├── insights.jsonl    # Synthesized learnings
    ├── intentions.jsonl  # Future triggers
    └── performance.json  # Task type scores + trends
```
For power users who need granular control over skills, trajectories, structural search, and enrichment:
```bash
dhee-mcp-full  # exposes all 24 tools
```

```python
from dhee import FullMemory

m = FullMemory()
m.add("conversation content", user_id="u1", infer=True)
m.search("query", user_id="u1", limit=10)
m.think("complex question requiring reasoning across memories")
```

```bash
pip install dhee[openai,mcp]  # OpenAI (recommended, cheapest embeddings)
pip install dhee[gemini,mcp]  # Google Gemini
pip install dhee[ollama,mcp]  # Ollama (local inference, no API costs)
```

```bash
git clone https://github.com/Sankhya-AI/Dhee.git
cd Dhee
./scripts/bootstrap_dev_env.sh
source .venv-dhee/bin/activate

# optional if you prefer manual bootstrap:
# python3 -m venv .venv-dhee
# .venv-dhee/bin/python -m pip install -e ./dhee-accel -e ./engram-bus -e ".[dev]"

pytest
# live vendor-backed suites are explicit opt-in:
# DHEE_RUN_LIVE_TESTS=1 pytest -q tests/test_e2e_all_features.py tests/test_power_packages.py
# manual smoke scripts live under scripts/manual/
```
4 operations. Deferred enrichment. Your agent remembers, learns, and predicts.
GitHub · PyPI · Issues
MIT License — Sankhya AI
