Skip to content

atemerev/lethe

Repository files navigation

Lethe

Release License Python

Brain-centric cognitive architecture for a long-running AI assistant.

Lethe is built around a simple premise: a useful personal assistant should not be a single chat loop with tools bolted on. It should have memory, attention, background thought, supervision, and delegation as separate runtime systems with clear responsibilities.

Lethe runs 24/7, communicates through Telegram or an HTTP/SSE API, remembers your preferences and projects across sessions, thinks in the background, and delegates focused work to subagents. The system is brain-inspired, but pragmatic: each cognitive module is just software with an explicit interface, tests, and logs.

Why This Architecture

Most assistants are reactive. They wait for a message, stuff recent chat into a prompt, call tools, and forget the shape of the work once the turn ends.

Lethe is designed as a persistent cognitive system:

  • Executive control: the cortex handles user-facing decisions and delegates long work.
  • Associative memory: the hippocampus recalls notes, prior conversations, and archival memories when they are relevant.
  • Background cognition: the DMN runs between user turns, reflecting on goals and surfacing useful signals.
  • Autonomic supervision: the brainstem monitors health, resources, releases, and runtime state.
  • Attention gating: background notifications pass through scoring, gating, and review before reaching the user.
  • Focused delegation: subagents work on bounded tasks with their own tools, state, progress updates, and terminal results.

The result is an assistant that can keep continuity over days, do long-running work without blocking the main conversation, and avoid turning every internal thought into a user notification.

Architecture

                 Telegram / HTTP API
                        |
                        v
              Cortex: conscious executive
        user turns, tool use, delegation, replies
                        |
       +----------------+----------------+
       |                |                |
       v                v                v
 Hippocampus       Actor System     Notification Pipeline
 associative       subagents,       scoring, gating,
 recall +          registry,        LLM review,
 salience bias     event bus        transport send
       |                |
       v                +----------------+
 Memory Stack                            |
 blocks, notes,                         v
 archival memory,             Brainstem + DMN
 message history              supervision + background thought
                        |
                        v
                 Tool Policy + Tools
            CLI, files, browser, web, Telegram

This is not a metaphor pasted on top of a monolith. The modules map to code boundaries:

Cognitive System Code Responsibility
Cortex agent/, principal actor User-facing executive control, local tools, delegation, final replies.
Hippocampus memory/hippocampus.py Associative recall over notes, archive, and conversation history.
Salience tracker memory/salience.py Emotional salience tagging and recall bias from active patterns.
Brainstem actor/brainstem.py Startup/runtime supervision, resource checks, update checks, health signals.
DMN actor/dmn.py Background cognition, goal scanning, reflection, and useful signal generation.
Notification gate notification_*.py Turns background signals into reviewed, user-safe notifications.
Actor registry actor/__init__.py Actor lifecycle, event bus, spawn hooks, message routing.
Tool policy tools/policy.py Centralized tool surfaces for cortex, subagents, memory, and Telegram.

Cognitive Loop

  1. A user message enters through Telegram or the API.
  2. Hippocampus decides whether recall is useful and injects relevant memory.
  3. Cortex acts directly for quick work or delegates bounded tasks to subagents.
  4. Subagents work independently, report progress, and return terminal results to cortex.
  5. DMN and brainstem run in the background and emit notification candidates.
  6. Notification scoring/gating/review decides what, if anything, should reach the user.
  7. Curator and salience systems update long-term memory and recall bias over time.

Transport Runtime

Lethe has two transports with the same core runtime:

  • Telegram mode polls Telegram and sends direct bot messages.
  • API mode exposes /chat, /events, /model, /cancel, and /file; chat and proactive output are streamed over SSE.

Shared runtime helpers in src/lethe/runtime.py own heartbeat routing, active reminder formatting, and proactive message rate limiting. API route handlers receive their dependencies through an ApiRuntime container instead of module-level service globals.

Actors

Component Role
Cortex Principal actor. The only actor that speaks to the user directly.
Subagents Spawned on demand for focused work. They report progress every two minutes and return a structured terminal result.
Actor registry Owns actors, lifecycle events, and spawn hooks. Ordinary child actors are auto-started through explicit registry hooks.
Tool policy Centralized tool-name sets define cortex tools, subagent defaults, private/free tools, recall skip lists, and Telegram exclusions.

Actors use public lifecycle and mailbox APIs (messages, drain_inbox(), set_task_handle(), recent_messages()) rather than reaching into each other's private state.

Memory and recall

Component Role
Memory blocks Always-in-context markdown state: identity, human, project.
Notes Tagged durable procedures, conventions, and facts.
Archival memory Long-term semantic storage with hybrid vector/full-text search.
Message history Full local conversation history in LanceDB.
Hippocampus LLM-guided associative recall over notes, archive, and conversation history.
Salience tracker Emotional salience tagging, rolling tag compaction, active-pattern tracking, and emotional recall bias.

Hippocampus no longer owns salience tagging directly; it calls into SalienceTracker and uses active salience patterns only as a recall-bias signal.

Notification pipeline

Background actors cannot talk to the user directly. They emit typed notification candidates that pass through a deterministic and LLM-reviewed path:

actor user_notify event
    -> NotificationRouter
    -> NotificationScoring
    -> NotificationGate
    -> NotificationReviewer
    -> transport send callback

The notification modules are named for their runtime job (notification_router.py, notification_gate.py, notification_scoring.py, notification_reviewer.py) and own the implementation directly.

Install

curl -fsSL https://lethe.gg/install | bash

The installer sets up ~/.lethe as the runtime root, builds an isolated container (podman on Linux, apple/container on macOS), and walks you through provider selection and Telegram bot setup. If an existing workspace is detected, you can reuse it without reconfiguring.

Prerequisites: A Telegram bot token and an LLM provider (Anthropic subscription, OpenRouter API key, OpenAI, or a local server). The installer handles all other dependencies.

For native (non-containerized) install: curl -fsSL https://lethe.gg/install | bash -s -- --yolo

Manual install

git clone https://github.com/atemerev/lethe.git
cd lethe
uv sync
cp .env.example .env   # edit with your credentials
uv run lethe

Update / Uninstall

curl -fsSL https://lethe.gg/update | bash
curl -fsSL https://lethe.gg/uninstall | bash

Prompt architecture

System prompt content is split by update lifecycle:

Content Location Updates
Persona (identity, character) workspace/memory/identity.md User-editable. Never overwritten.
System instructions config/prompts/agent_instructions.md Current after git pull.
Tools documentation config/prompts/agent_tools.md Current after git pull.
Actor rules config/prompts/actor_*.md Current after git pull.
Notification review config/prompts/notification_review.md Current after git pull.

Security

Lethe runs in an isolated container by default:

  • Linux: Podman rootless container with volume-mounted access only to ~/.lethe and directories you choose during install.
  • macOS: apple/container (macOS 26+), or Podman as fallback for Intel Macs / older macOS.

Native mode (--yolo) runs without isolation — use at your own risk.

The API server binds to 127.0.0.1 by default. Use a reverse proxy for remote access.

LLM Providers

Provider Auth Example model
Anthropic (subscription) ANTHROPIC_AUTH_TOKEN claude-opus-4-6
Anthropic (API key) ANTHROPIC_API_KEY claude-opus-4-6
OpenRouter OPENROUTER_API_KEY openrouter/moonshotai/kimi-k2.6
OpenAI (API key) OPENAI_API_KEY gpt-5.4
OpenAI (subscription) OPENAI_AUTH_TOKEN gpt-5.4
Local (llama.cpp) LLM_API_BASE + OPENAI_API_KEY=local openai/gemma-4-31B-it-Q8_0.gguf

Set LLM_MODEL explicitly. The installer writes a default for the chosen provider; manual installs must set it in .env.

Multi-model support:

  • LLM_MODEL_AUX -- summarization, hippocampus, lightweight background work
  • LLM_MODEL_DMN -- DMN model override (defaults to main model)

Memory

Notes (persistent knowledge)

Tagged markdown files in ~/.lethe/workspace/notes/:

notes/
  unige_email_via_graph_api.md   # tags: [skill, email, graph-api]
  use_uv_not_pip.md              # tags: [convention, python]
  phd_defense_requirements.md    # tags: [education, PhD]

Skills, conventions, and durable procedures. Searched by hippocampus on each message. Auto-extracted from archival memory by the curator.

Memory blocks (core memory)

Always in context. Stored in workspace/memory/:

  • identity.md -- agent persona (user-customizable, never overwritten)
  • human.md -- what the agent knows about you
  • project.md -- current project context (agent-maintained)

Archival memory

Long-term semantic storage with hybrid search (vector + full-text). The curator runs on startup to extract valuable entries into notes.

Salience tags

Emotional salience tags are maintained separately from recall in workspace/emotional_tags.md. They are compacted to a rolling window and used to bias future memory search when recent high-arousal patterns are active.

Message history

Full conversation history, stored locally in LanceDB. Searchable via conversation_search tool.

Running locally with Gemma 4

Lethe runs well with Gemma 4 31B on consumer GPUs via llama.cpp.

# Build llama.cpp with CUDA
git clone https://github.com/ggml-org/llama.cpp.git && cd llama.cpp
cmake -B build -DGGML_CUDA=ON -DGGML_CUDA_FA_ALL_QUANTS=ON
cmake --build build --target llama-server -j$(nproc)

# Start the server (4x RTX 4090 example)
./build/bin/llama-server \
    --model /path/to/gemma-4-31B-it-Q8_0.gguf \
    --host 0.0.0.0 --port 8090 \
    --n-gpu-layers 999 --split-mode tensor \
    --ctx-size 98304 --flash-attn on \
    --parallel 2 --cache-ram 32768 \
    --jinja --reasoning-budget 4096 \
    --spec-type ngram-mod --spec-ngram-size-n 24 --draft-min 48 --draft-max 64 \
    -fit off

Configure Lethe:

# .env
LLM_PROVIDER=openai
LLM_MODEL=openai/gemma-4-31B-it-Q8_0.gguf
LLM_API_BASE=http://localhost:8090/v1
LLM_CONTEXT_LIMIT=96000
OPENAI_API_KEY=local

Key flags: --split-mode tensor for true tensor parallelism across GPUs, --jinja for native tool calling, --reasoning-budget 4096 for thinking mode.

Configuration

Environment variables

Variable Description Default
TELEGRAM_BOT_TOKEN Bot token from BotFather required
TELEGRAM_ALLOWED_USER_IDS Comma-separated user IDs all
TELEGRAM_TRANSCRIPTION_ENABLED Transcribe Telegram voice/audio messages true
TRANSCRIPTION_PROVIDER auto, openrouter, openai, or local auto
TRANSCRIPTION_MODEL Whisper/STT model override provider default
TRANSCRIPTION_LANGUAGE Optional language hint, e.g. en auto-detect
TRANSCRIPTION_LOCAL_COMMAND Local Whisper CLI command whisper
LLM_PROVIDER Force provider auto-detect
LLM_MODEL Main model required
LLM_MODEL_AUX Aux model same as main
LLM_MODEL_DMN DMN model override same as main
LLM_API_BASE Custom API URL --
LLM_CONTEXT_LIMIT Context window size 100000
EXA_API_KEY Exa web search optional
ACTORS_ENABLED Enable actor model true
HIPPOCAMPUS_ENABLED Enable associative recall true
CURATOR_ENABLED Enable startup memory curator true
HEARTBEAT_INTERVAL Heartbeat interval (seconds) 3600
HEARTBEAT_ENABLED Enable heartbeat loop true
PROACTIVE_MAX_PER_DAY Proactive message limit 4
PROACTIVE_COOLDOWN_MINUTES Min spacing between proactive msgs 60
LETHE_HOME Runtime root directory ~/.lethe
LETHE_MODE Runtime mode: api or Telegram polling Telegram polling
LETHE_API_TOKEN Bearer token required in API mode required for API
LETHE_API_HOST API server bind address 127.0.0.1
LETHE_CONSOLE Enable local runtime console false
LETHE_CONSOLE_HOST Console bind address 127.0.0.1
LETHE_CONSOLE_PORT Console port 8777

Persona

Edit workspace/memory/identity.md to customize personality, purpose, and background. This file is never overwritten by updates.

System instructions (communication style, output format) are in config/prompts/agent_instructions.md.

Container management

The installer creates the container and service automatically. Useful commands:

# Linux (podman)
systemctl --user start lethe-container
systemctl --user stop lethe-container
journalctl --user -u lethe-container -f
podman exec -it lethe /bin/bash                        # shell into container
podman exec -u 0 -it lethe /bin/bash                   # root shell (install packages)

# macOS (apple/container)
launchctl load ~/Library/LaunchAgents/com.lethe.container.plist
launchctl unload ~/Library/LaunchAgents/com.lethe.container.plist
tail -f ~/.lethe/logs/container.log

Mount additional directories by editing ~/.lethe/config/mounts.conf and re-running scripts/container-setup.sh.

Development

uv run pytest
uv run pytest tests/test_notes.py -v

Project structure

src/lethe/
  actor/       -- actor model (cortex, dmn, brainstem, subagents)
  agent/       -- runtime orchestration
  memory/      -- blocks, LanceDB, notes, hippocampus, salience, curator, LLM client
  context/     -- provider-specific prompt/context assembly
  tools/       -- CLI, files, web, browser, Telegram
  tools/policy.py -- centralized tool policy sets
  telegram/    -- Telegram bot interface
  conversation/ -- conversation management
  notification_*.py -- background user-notification routing, gating, review
  runtime.py   -- shared runtime helpers
  api.py       -- HTTP API routes and API runtime container
  main.py      -- service entry point and transport wiring
  paths.py     -- centralized path derivation from LETHE_HOME

config/
  blocks/      -- seed memory blocks
  prompts/     -- system, actor, heartbeat, and notification prompts
  workspace/   -- workspace seed files (copied once on first run)

License

MIT

About

Autonomous executive assistant with persistent memory and a multi-agent architecture

Resources

Stars

Watchers

Forks

Packages

 
 
 

Contributors