Brain-centric cognitive architecture for a long-running AI assistant.
Lethe is built around a simple premise: a useful personal assistant should not be a single chat loop with tools bolted on. It should have memory, attention, background thought, supervision, and delegation as separate runtime systems with clear responsibilities.
Lethe runs 24/7, communicates through Telegram or an HTTP/SSE API, remembers your preferences and projects across sessions, thinks in the background, and delegates focused work to subagents. The system is brain-inspired, but pragmatic: each cognitive module is just software with an explicit interface, tests, and logs.
Most assistants are reactive. They wait for a message, stuff recent chat into a prompt, call tools, and forget the shape of the work once the turn ends.
Lethe is designed as a persistent cognitive system:
- Executive control: the cortex handles user-facing decisions and delegates long work.
- Associative memory: the hippocampus recalls notes, prior conversations, and archival memories when they are relevant.
- Background cognition: the DMN runs between user turns, reflecting on goals and surfacing useful signals.
- Autonomic supervision: the brainstem monitors health, resources, releases, and runtime state.
- Attention gating: background notifications pass through scoring, gating, and review before reaching the user.
- Focused delegation: subagents work on bounded tasks with their own tools, state, progress updates, and terminal results.
The result is an assistant that can keep continuity over days, do long-running work without blocking the main conversation, and avoid turning every internal thought into a user notification.
Telegram / HTTP API
|
v
Cortex: conscious executive
user turns, tool use, delegation, replies
|
+----------------+----------------+
| | |
v v v
Hippocampus Actor System Notification Pipeline
associative subagents, scoring, gating,
recall + registry, LLM review,
salience bias event bus transport send
| |
v +----------------+
Memory Stack |
blocks, notes, v
archival memory, Brainstem + DMN
message history supervision + background thought
|
v
Tool Policy + Tools
CLI, files, browser, web, Telegram
This is not a metaphor pasted on top of a monolith. The modules map to code boundaries:
| Cognitive System | Code | Responsibility |
|---|---|---|
| Cortex | agent/, principal actor |
User-facing executive control, local tools, delegation, final replies. |
| Hippocampus | memory/hippocampus.py |
Associative recall over notes, archive, and conversation history. |
| Salience tracker | memory/salience.py |
Emotional salience tagging and recall bias from active patterns. |
| Brainstem | actor/brainstem.py |
Startup/runtime supervision, resource checks, update checks, health signals. |
| DMN | actor/dmn.py |
Background cognition, goal scanning, reflection, and useful signal generation. |
| Notification gate | notification_*.py |
Turns background signals into reviewed, user-safe notifications. |
| Actor registry | actor/__init__.py |
Actor lifecycle, event bus, spawn hooks, message routing. |
| Tool policy | tools/policy.py |
Centralized tool surfaces for cortex, subagents, memory, and Telegram. |
- A user message enters through Telegram or the API.
- Hippocampus decides whether recall is useful and injects relevant memory.
- Cortex acts directly for quick work or delegates bounded tasks to subagents.
- Subagents work independently, report progress, and return terminal results to cortex.
- DMN and brainstem run in the background and emit notification candidates.
- Notification scoring/gating/review decides what, if anything, should reach the user.
- Curator and salience systems update long-term memory and recall bias over time.
Lethe has two transports with the same core runtime:
- Telegram mode polls Telegram and sends direct bot messages.
- API mode exposes
/chat,/events,/model,/cancel, and/file; chat and proactive output are streamed over SSE.
Shared runtime helpers in src/lethe/runtime.py own heartbeat routing, active reminder formatting, and proactive message rate limiting. API route handlers receive their dependencies through an ApiRuntime container instead of module-level service globals.
| Component | Role |
|---|---|
| Cortex | Principal actor. The only actor that speaks to the user directly. |
| Subagents | Spawned on demand for focused work. They report progress every two minutes and return a structured terminal result. |
| Actor registry | Owns actors, lifecycle events, and spawn hooks. Ordinary child actors are auto-started through explicit registry hooks. |
| Tool policy | Centralized tool-name sets define cortex tools, subagent defaults, private/free tools, recall skip lists, and Telegram exclusions. |
Actors use public lifecycle and mailbox APIs (messages, drain_inbox(), set_task_handle(), recent_messages()) rather than reaching into each other's private state.
| Component | Role |
|---|---|
| Memory blocks | Always-in-context markdown state: identity, human, project. |
| Notes | Tagged durable procedures, conventions, and facts. |
| Archival memory | Long-term semantic storage with hybrid vector/full-text search. |
| Message history | Full local conversation history in LanceDB. |
| Hippocampus | LLM-guided associative recall over notes, archive, and conversation history. |
| Salience tracker | Emotional salience tagging, rolling tag compaction, active-pattern tracking, and emotional recall bias. |
Hippocampus no longer owns salience tagging directly; it calls into SalienceTracker and uses active salience patterns only as a recall-bias signal.
Background actors cannot talk to the user directly. They emit typed notification candidates that pass through a deterministic and LLM-reviewed path:
actor user_notify event
-> NotificationRouter
-> NotificationScoring
-> NotificationGate
-> NotificationReviewer
-> transport send callback
The notification modules are named for their runtime job (notification_router.py, notification_gate.py, notification_scoring.py, notification_reviewer.py) and own the implementation directly.
curl -fsSL https://lethe.gg/install | bashThe installer sets up ~/.lethe as the runtime root, builds an isolated container (podman on Linux, apple/container on macOS), and walks you through provider selection and Telegram bot setup. If an existing workspace is detected, you can reuse it without reconfiguring.
Prerequisites: A Telegram bot token and an LLM provider (Anthropic subscription, OpenRouter API key, OpenAI, or a local server). The installer handles all other dependencies.
For native (non-containerized) install: curl -fsSL https://lethe.gg/install | bash -s -- --yolo
git clone https://github.com/atemerev/lethe.git
cd lethe
uv sync
cp .env.example .env # edit with your credentials
uv run lethecurl -fsSL https://lethe.gg/update | bash
curl -fsSL https://lethe.gg/uninstall | bashSystem prompt content is split by update lifecycle:
| Content | Location | Updates |
|---|---|---|
| Persona (identity, character) | workspace/memory/identity.md |
User-editable. Never overwritten. |
| System instructions | config/prompts/agent_instructions.md |
Current after git pull. |
| Tools documentation | config/prompts/agent_tools.md |
Current after git pull. |
| Actor rules | config/prompts/actor_*.md |
Current after git pull. |
| Notification review | config/prompts/notification_review.md |
Current after git pull. |
Lethe runs in an isolated container by default:
- Linux: Podman rootless container with volume-mounted access only to
~/.letheand directories you choose during install. - macOS: apple/container (macOS 26+), or Podman as fallback for Intel Macs / older macOS.
Native mode (--yolo) runs without isolation — use at your own risk.
The API server binds to 127.0.0.1 by default. Use a reverse proxy for remote access.
| Provider | Auth | Example model |
|---|---|---|
| Anthropic (subscription) | ANTHROPIC_AUTH_TOKEN |
claude-opus-4-6 |
| Anthropic (API key) | ANTHROPIC_API_KEY |
claude-opus-4-6 |
| OpenRouter | OPENROUTER_API_KEY |
openrouter/moonshotai/kimi-k2.6 |
| OpenAI (API key) | OPENAI_API_KEY |
gpt-5.4 |
| OpenAI (subscription) | OPENAI_AUTH_TOKEN |
gpt-5.4 |
| Local (llama.cpp) | LLM_API_BASE + OPENAI_API_KEY=local |
openai/gemma-4-31B-it-Q8_0.gguf |
Set LLM_MODEL explicitly. The installer writes a default for the chosen provider; manual installs must set it in .env.
Multi-model support:
LLM_MODEL_AUX-- summarization, hippocampus, lightweight background workLLM_MODEL_DMN-- DMN model override (defaults to main model)
Tagged markdown files in ~/.lethe/workspace/notes/:
notes/
unige_email_via_graph_api.md # tags: [skill, email, graph-api]
use_uv_not_pip.md # tags: [convention, python]
phd_defense_requirements.md # tags: [education, PhD]
Skills, conventions, and durable procedures. Searched by hippocampus on each message. Auto-extracted from archival memory by the curator.
Always in context. Stored in workspace/memory/:
identity.md-- agent persona (user-customizable, never overwritten)human.md-- what the agent knows about youproject.md-- current project context (agent-maintained)
Long-term semantic storage with hybrid search (vector + full-text). The curator runs on startup to extract valuable entries into notes.
Emotional salience tags are maintained separately from recall in workspace/emotional_tags.md. They are compacted to a rolling window and used to bias future memory search when recent high-arousal patterns are active.
Full conversation history, stored locally in LanceDB. Searchable via conversation_search tool.
Lethe runs well with Gemma 4 31B on consumer GPUs via llama.cpp.
# Build llama.cpp with CUDA
git clone https://github.com/ggml-org/llama.cpp.git && cd llama.cpp
cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_FA_ALL_QUANTS=ON
cmake --build build --target llama-server -j$(nproc)
# Start the server (4x RTX 4090 example)
./build/bin/llama-server \
--model /path/to/gemma-4-31B-it-Q8_0.gguf \
--host 0.0.0.0 --port 8090 \
--n-gpu-layers 999 --split-mode tensor \
--ctx-size 98304 --flash-attn on \
--parallel 2 --cache-ram 32768 \
--jinja --reasoning-budget 4096 \
--spec-type ngram-mod --spec-ngram-size-n 24 --draft-min 48 --draft-max 64 \
-fit offConfigure Lethe:
# .env
LLM_PROVIDER=openai
LLM_MODEL=openai/gemma-4-31B-it-Q8_0.gguf
LLM_API_BASE=http://localhost:8090/v1
LLM_CONTEXT_LIMIT=96000
OPENAI_API_KEY=localKey flags: --split-mode tensor for true tensor parallelism across GPUs, --jinja for native tool calling, --reasoning-budget 4096 for thinking mode.
| Variable | Description | Default |
|---|---|---|
TELEGRAM_BOT_TOKEN |
Bot token from BotFather | required |
TELEGRAM_ALLOWED_USER_IDS |
Comma-separated user IDs | all |
TELEGRAM_TRANSCRIPTION_ENABLED |
Transcribe Telegram voice/audio messages | true |
TRANSCRIPTION_PROVIDER |
auto, openrouter, openai, or local |
auto |
TRANSCRIPTION_MODEL |
Whisper/STT model override | provider default |
TRANSCRIPTION_LANGUAGE |
Optional language hint, e.g. en |
auto-detect |
TRANSCRIPTION_LOCAL_COMMAND |
Local Whisper CLI command | whisper |
LLM_PROVIDER |
Force provider | auto-detect |
LLM_MODEL |
Main model | required |
LLM_MODEL_AUX |
Aux model | same as main |
LLM_MODEL_DMN |
DMN model override | same as main |
LLM_API_BASE |
Custom API URL | -- |
LLM_CONTEXT_LIMIT |
Context window size | 100000 |
EXA_API_KEY |
Exa web search | optional |
ACTORS_ENABLED |
Enable actor model | true |
HIPPOCAMPUS_ENABLED |
Enable associative recall | true |
CURATOR_ENABLED |
Enable startup memory curator | true |
HEARTBEAT_INTERVAL |
Heartbeat interval (seconds) | 3600 |
HEARTBEAT_ENABLED |
Enable heartbeat loop | true |
PROACTIVE_MAX_PER_DAY |
Proactive message limit | 4 |
PROACTIVE_COOLDOWN_MINUTES |
Min spacing between proactive msgs | 60 |
LETHE_HOME |
Runtime root directory | ~/.lethe |
LETHE_MODE |
Runtime mode: api or Telegram polling |
Telegram polling |
LETHE_API_TOKEN |
Bearer token required in API mode | required for API |
LETHE_API_HOST |
API server bind address | 127.0.0.1 |
LETHE_CONSOLE |
Enable local runtime console | false |
LETHE_CONSOLE_HOST |
Console bind address | 127.0.0.1 |
LETHE_CONSOLE_PORT |
Console port | 8777 |
Edit workspace/memory/identity.md to customize personality, purpose, and background. This file is never overwritten by updates.
System instructions (communication style, output format) are in config/prompts/agent_instructions.md.
The installer creates the container and service automatically. Useful commands:
# Linux (podman)
systemctl --user start lethe-container
systemctl --user stop lethe-container
journalctl --user -u lethe-container -f
podman exec -it lethe /bin/bash # shell into container
podman exec -u 0 -it lethe /bin/bash # root shell (install packages)
# macOS (apple/container)
launchctl load ~/Library/LaunchAgents/com.lethe.container.plist
launchctl unload ~/Library/LaunchAgents/com.lethe.container.plist
tail -f ~/.lethe/logs/container.logMount additional directories by editing ~/.lethe/config/mounts.conf and re-running scripts/container-setup.sh.
uv run pytest
uv run pytest tests/test_notes.py -vsrc/lethe/
actor/ -- actor model (cortex, dmn, brainstem, subagents)
agent/ -- runtime orchestration
memory/ -- blocks, LanceDB, notes, hippocampus, salience, curator, LLM client
context/ -- provider-specific prompt/context assembly
tools/ -- CLI, files, web, browser, Telegram
tools/policy.py -- centralized tool policy sets
telegram/ -- Telegram bot interface
conversation/ -- conversation management
notification_*.py -- background user-notification routing, gating, review
runtime.py -- shared runtime helpers
api.py -- HTTP API routes and API runtime container
main.py -- service entry point and transport wiring
paths.py -- centralized path derivation from LETHE_HOME
config/
blocks/ -- seed memory blocks
prompts/ -- system, actor, heartbeat, and notification prompts
workspace/ -- workspace seed files (copied once on first run)
MIT