Lethe

Brain-centric cognitive architecture for a long-running AI assistant.

Lethe is built around a simple premise: a useful personal assistant should not be a single chat loop with tools bolted on. It should have memory, attention, background thought, supervision, and delegation as separate runtime systems with clear responsibilities.

Lethe runs 24/7, communicates through Telegram or an HTTP/SSE API, remembers your preferences and projects across sessions, thinks in the background, and delegates focused work to subagents. The system is brain-inspired, but pragmatic: each cognitive module is just software with an explicit interface, tests, and logs.

Why This Architecture

Most assistants are reactive. They wait for a message, stuff recent chat into a prompt, call tools, and forget the shape of the work once the turn ends.

Lethe is designed as a persistent cognitive system:

Executive control: the cortex handles user-facing decisions and delegates long work.
Associative memory: the hippocampus recalls notes, prior conversations, and archival memories when they are relevant.
Background cognition: the DMN runs between user turns, reflecting on goals and surfacing useful signals.
Autonomic supervision: the brainstem monitors health, resources, releases, and runtime state.
Attention gating: background notifications pass through scoring, gating, and review before reaching the user.
Focused delegation: subagents work on bounded tasks with their own tools, state, progress updates, and terminal results.

The result is an assistant that can keep continuity over days, do long-running work without blocking the main conversation, and avoid turning every internal thought into a user notification.

Architecture

                 Telegram / HTTP API
                        |
                        v
              Cortex: conscious executive
        user turns, tool use, delegation, replies
                        |
       +----------------+----------------+
       |                |                |
       v                v                v
 Hippocampus       Actor System     Notification Pipeline
 associative       subagents,       scoring, gating,
 recall +          registry,        LLM review,
 salience bias     event bus        transport send
       |                |
       v                +----------------+
 Memory Stack                            |
 blocks, notes,                         v
 archival memory,             Brainstem + DMN
 message history              supervision + background thought
                        |
                        v
                 Tool Policy + Tools
            CLI, files, browser, web, Telegram

This is not a metaphor pasted on top of a monolith. The modules map to code boundaries:

Cognitive System	Code	Responsibility
Cortex	`agent/`, principal actor	User-facing executive control, local tools, delegation, final replies.
Hippocampus	`memory/hippocampus.py`	Associative recall over notes, archive, and conversation history.
Salience tracker	`memory/salience.py`	Emotional salience tagging and recall bias from active patterns.
Brainstem	`actor/brainstem.py`	Startup/runtime supervision, resource checks, update checks, health signals.
DMN	`actor/dmn.py`	Background cognition, goal scanning, reflection, and useful signal generation.
Notification gate	`notification_*.py`	Turns background signals into reviewed, user-safe notifications.
Actor registry	`actor/__init__.py`	Actor lifecycle, event bus, spawn hooks, message routing.
Tool policy	`tools/policy.py`	Centralized tool surfaces for cortex, subagents, memory, and Telegram.

Cognitive Loop

A user message enters through Telegram or the API.
Hippocampus decides whether recall is useful and injects relevant memory.
Cortex acts directly for quick work or delegates bounded tasks to subagents.
Subagents work independently, report progress, and return terminal results to cortex.
DMN and brainstem run in the background and emit notification candidates.
Notification scoring/gating/review decides what, if anything, should reach the user.
Curator and salience systems update long-term memory and recall bias over time.

Transport Runtime

Lethe has two transports with the same core runtime:

Telegram mode polls Telegram and sends direct bot messages.
API mode exposes /chat, /events, /model, /cancel, and /file; chat and proactive output are streamed over SSE.

Shared runtime helpers in src/lethe/runtime.py own heartbeat routing, active reminder formatting, and proactive message rate limiting. API route handlers receive their dependencies through an ApiRuntime container instead of module-level service globals.

Actors

Component	Role
Cortex	Principal actor. The only actor that speaks to the user directly.
Subagents	Spawned on demand for focused work. They report progress every two minutes and return a structured terminal result.
Actor registry	Owns actors, lifecycle events, and spawn hooks. Ordinary child actors are auto-started through explicit registry hooks.
Tool policy	Centralized tool-name sets define cortex tools, subagent defaults, private/free tools, recall skip lists, and Telegram exclusions.

Actors use public lifecycle and mailbox APIs (messages, drain_inbox(), set_task_handle(), recent_messages()) rather than reaching into each other's private state.

Memory and recall

Component	Role
Memory blocks	Always-in-context markdown state: identity, human, project.
Notes	Tagged durable procedures, conventions, and facts.
Archival memory	Long-term semantic storage with hybrid vector/full-text search.
Message history	Full local conversation history in LanceDB.
Hippocampus	LLM-guided associative recall over notes, archive, and conversation history.
Salience tracker	Emotional salience tagging, rolling tag compaction, active-pattern tracking, and emotional recall bias.

Hippocampus no longer owns salience tagging directly; it calls into SalienceTracker and uses active salience patterns only as a recall-bias signal.

Notification pipeline

Background actors cannot talk to the user directly. They emit typed notification candidates that pass through a deterministic and LLM-reviewed path:

actor user_notify event
    -> NotificationRouter
    -> NotificationScoring
    -> NotificationGate
    -> NotificationReviewer
    -> transport send callback

The notification modules are named for their runtime job (notification_router.py, notification_gate.py, notification_scoring.py, notification_reviewer.py) and own the implementation directly.

Install

curl -fsSL https://lethe.gg/install | bash

The installer sets up ~/.lethe as the runtime root, builds an isolated container (podman on Linux, apple/container on macOS), and walks you through provider selection and Telegram bot setup. If an existing workspace is detected, you can reuse it without reconfiguring.

Prerequisites: A Telegram bot token and an LLM provider (Anthropic subscription, OpenRouter API key, OpenAI, or a local server). The installer handles all other dependencies.

For native (non-containerized) install: curl -fsSL https://lethe.gg/install | bash -s -- --yolo

Manual install

git clone https://github.com/atemerev/lethe.git
cd lethe
uv sync
cp .env.example .env   # edit with your credentials
uv run lethe

Update / Uninstall

curl -fsSL https://lethe.gg/update | bash
curl -fsSL https://lethe.gg/uninstall | bash

Prompt architecture

System prompt content is split by update lifecycle:

Content	Location	Updates
Persona (identity, character)	`workspace/memory/identity.md`	User-editable. Never overwritten.
System instructions	`config/prompts/agent_instructions.md`	Current after `git pull`.
Tools documentation	`config/prompts/agent_tools.md`	Current after `git pull`.
Actor rules	`config/prompts/actor_*.md`	Current after `git pull`.
Notification review	`config/prompts/notification_review.md`	Current after `git pull`.

Security

Lethe runs in an isolated container by default:

Linux: Podman rootless container with volume-mounted access only to ~/.lethe and directories you choose during install.
macOS: apple/container (macOS 26+), or Podman as fallback for Intel Macs / older macOS.

Native mode (--yolo) runs without isolation — use at your own risk.

The API server binds to 127.0.0.1 by default. Use a reverse proxy for remote access.

LLM Providers

Provider	Auth	Example model
Anthropic (subscription)	`ANTHROPIC_AUTH_TOKEN`	`claude-opus-4-6`
Anthropic (API key)	`ANTHROPIC_API_KEY`	`claude-opus-4-6`
OpenRouter	`OPENROUTER_API_KEY`	`openrouter/moonshotai/kimi-k2.6`
OpenAI (API key)	`OPENAI_API_KEY`	`gpt-5.4`
OpenAI (subscription)	`OPENAI_AUTH_TOKEN`	`gpt-5.4`
Local (llama.cpp)	`LLM_API_BASE` + `OPENAI_API_KEY=local`	`openai/gemma-4-31B-it-Q8_0.gguf`

Set LLM_MODEL explicitly. The installer writes a default for the chosen provider; manual installs must set it in .env.

Multi-model support:

LLM_MODEL_AUX -- summarization, hippocampus, lightweight background work
LLM_MODEL_DMN -- DMN model override (defaults to main model)

Memory

Notes (persistent knowledge)

Tagged markdown files in ~/.lethe/workspace/notes/:

notes/
  unige_email_via_graph_api.md   # tags: [skill, email, graph-api]
  use_uv_not_pip.md              # tags: [convention, python]
  phd_defense_requirements.md    # tags: [education, PhD]

Skills, conventions, and durable procedures. Searched by hippocampus on each message. Auto-extracted from archival memory by the curator.

Memory blocks (core memory)

Always in context. Stored in workspace/memory/:

identity.md -- agent persona (user-customizable, never overwritten)
human.md -- what the agent knows about you
project.md -- current project context (agent-maintained)

Archival memory

Long-term semantic storage with hybrid search (vector + full-text). The curator runs on startup to extract valuable entries into notes.

Salience tags

Emotional salience tags are maintained separately from recall in workspace/emotional_tags.md. They are compacted to a rolling window and used to bias future memory search when recent high-arousal patterns are active.

Message history

Full conversation history, stored locally in LanceDB. Searchable via conversation_search tool.

Running locally with Gemma 4

Lethe runs well with Gemma 4 31B on consumer GPUs via llama.cpp.

# Build llama.cpp with CUDA
git clone https://github.com/ggml-org/llama.cpp.git && cd llama.cpp
cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_FA_ALL_QUANTS=ON
cmake --build build --target llama-server -j$(nproc)

# Start the server (4x RTX 4090 example)
./build/bin/llama-server \
    --model /path/to/gemma-4-31B-it-Q8_0.gguf \
    --host 0.0.0.0 --port 8090 \
    --n-gpu-layers 999 --split-mode tensor \
    --ctx-size 98304 --flash-attn on \
    --parallel 2 --cache-ram 32768 \
    --jinja --reasoning-budget 4096 \
    --spec-type ngram-mod --spec-ngram-size-n 24 --draft-min 48 --draft-max 64 \
    -fit off

Configure Lethe:

# .env
LLM_PROVIDER=openai
LLM_MODEL=openai/gemma-4-31B-it-Q8_0.gguf
LLM_API_BASE=http://localhost:8090/v1
LLM_CONTEXT_LIMIT=96000
OPENAI_API_KEY=local

Key flags: --split-mode tensor for true tensor parallelism across GPUs, --jinja for native tool calling, --reasoning-budget 4096 for thinking mode.

Configuration

Environment variables

Variable	Description	Default
`TELEGRAM_BOT_TOKEN`	Bot token from BotFather	required
`TELEGRAM_ALLOWED_USER_IDS`	Comma-separated user IDs	all
`TELEGRAM_TRANSCRIPTION_ENABLED`	Transcribe Telegram voice/audio messages	`true`
`TRANSCRIPTION_PROVIDER`	`auto`, `openrouter`, `openai`, or `local`	`auto`
`TRANSCRIPTION_MODEL`	Whisper/STT model override	provider default
`TRANSCRIPTION_LANGUAGE`	Optional language hint, e.g. `en`	auto-detect
`TRANSCRIPTION_LOCAL_COMMAND`	Local Whisper CLI command	`whisper`
`LLM_PROVIDER`	Force provider	auto-detect
`LLM_MODEL`	Main model	required
`LLM_MODEL_AUX`	Aux model	same as main
`LLM_MODEL_DMN`	DMN model override	same as main
`LLM_API_BASE`	Custom API URL	--
`LLM_CONTEXT_LIMIT`	Context window size	`100000`
`EXA_API_KEY`	Exa web search	optional
`ACTORS_ENABLED`	Enable actor model	`true`
`HIPPOCAMPUS_ENABLED`	Enable associative recall	`true`
`CURATOR_ENABLED`	Enable startup memory curator	`true`
`HEARTBEAT_INTERVAL`	Heartbeat interval (seconds)	`3600`
`HEARTBEAT_ENABLED`	Enable heartbeat loop	`true`
`PROACTIVE_MAX_PER_DAY`	Proactive message limit	`4`
`PROACTIVE_COOLDOWN_MINUTES`	Min spacing between proactive msgs	`60`
`LETHE_HOME`	Runtime root directory	`~/.lethe`
`LETHE_MODE`	Runtime mode: `api` or Telegram polling	Telegram polling
`LETHE_API_TOKEN`	Bearer token required in API mode	required for API
`LETHE_API_HOST`	API server bind address	`127.0.0.1`
`LETHE_CONSOLE`	Enable local runtime console	`false`
`LETHE_CONSOLE_HOST`	Console bind address	`127.0.0.1`
`LETHE_CONSOLE_PORT`	Console port	`8777`

Persona

Edit workspace/memory/identity.md to customize personality, purpose, and background. This file is never overwritten by updates.

System instructions (communication style, output format) are in config/prompts/agent_instructions.md.

Container management

The installer creates the container and service automatically. Useful commands:

# Linux (podman)
systemctl --user start lethe-container
systemctl --user stop lethe-container
journalctl --user -u lethe-container -f
podman exec -it lethe /bin/bash                        # shell into container
podman exec -u 0 -it lethe /bin/bash                   # root shell (install packages)

# macOS (apple/container)
launchctl load ~/Library/LaunchAgents/com.lethe.container.plist
launchctl unload ~/Library/LaunchAgents/com.lethe.container.plist
tail -f ~/.lethe/logs/container.log

Mount additional directories by editing ~/.lethe/config/mounts.conf and re-running scripts/container-setup.sh.

Development

uv run pytest
uv run pytest tests/test_notes.py -v

Project structure

src/lethe/
  actor/       -- actor model (cortex, dmn, brainstem, subagents)
  agent/       -- runtime orchestration
  memory/      -- blocks, LanceDB, notes, hippocampus, salience, curator, LLM client
  context/     -- provider-specific prompt/context assembly
  tools/       -- CLI, files, web, browser, Telegram
  tools/policy.py -- centralized tool policy sets
  telegram/    -- Telegram bot interface
  conversation/ -- conversation management
  notification_*.py -- background user-notification routing, gating, review
  runtime.py   -- shared runtime helpers
  api.py       -- HTTP API routes and API runtime container
  main.py      -- service entry point and transport wiring
  paths.py     -- centralized path derivation from LETHE_HOME

config/
  blocks/      -- seed memory blocks
  prompts/     -- system, actor, heartbeat, and notification prompts
  workspace/   -- workspace seed files (copied once on first run)

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 708 Commits
config		config
examples		examples
scripts		scripts
src/lethe		src/lethe
tests		tests
.env.example		.env.example
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
Containerfile		Containerfile
README.md		README.md
VISION.md		VISION.md
install.sh		install.sh
pyproject.toml		pyproject.toml
uninstall.sh		uninstall.sh
update.sh		update.sh
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Lethe

Why This Architecture

Architecture

Cognitive Loop

Transport Runtime

Actors

Memory and recall

Notification pipeline

Install

Manual install

Update / Uninstall

Prompt architecture

Security

LLM Providers

Memory

Notes (persistent knowledge)

Memory blocks (core memory)

Archival memory

Salience tags

Message history

Running locally with Gemma 4

Configuration

Environment variables

Persona

Container management

Development

Project structure

License

About

Uh oh!

Releases 63

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Lethe

Why This Architecture

Architecture

Cognitive Loop

Transport Runtime

Actors

Memory and recall

Notification pipeline

Install

Manual install

Update / Uninstall

Prompt architecture

Security

LLM Providers

Memory

Notes (persistent knowledge)

Memory blocks (core memory)

Archival memory

Salience tags

Message history

Running locally with Gemma 4

Configuration

Environment variables

Persona

Container management

Development

Project structure

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 63

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages