agent-cli is a collection of local-first, AI-powered command-line agents that run entirely on your machine.
It provides a suite of powerful tools for voice and text interaction, designed for privacy, offline capability, and seamless integration with system-wide hotkeys and workflows.
Tip
Short aliases available: You can use agent or ag instead of agent-cli for convenience.
Important
Local and Private by Design: All agents in this tool are designed to run 100% locally. Your data, whether from your clipboard, microphone, or files, is never sent to any cloud API. This ensures your privacy and allows the tools to work completely offline. Optionally, you can configure the agents to use cloud services (OpenAI/Gemini) instead.
I got tired of typing long prompts to LLMs. Speaking is faster, so I built this tool to transcribe my voice directly to the clipboard with a hotkey.
What it does:
- Voice transcription to clipboard with system-wide hotkeys (Cmd+Shift+R on macOS)
- Autocorrect any text from your clipboard
- Edit clipboard content with voice commands ("make this more formal")
- Runs locally - no internet required, your audio stays on your machine
- Works with any app that can copy/paste
I use it mostly for the transcribe command when working with LLMs. Being able to speak naturally means I can provide more context without the typing fatigue.
Since then I have expanded the tool with many more features, all focused on local-first AI agents that integrate seamlessly with your system.
See agent-cli in action: Watch the demo
- autocorrect: Correct grammar and spelling in your text using a local LLM.
- transcribe: Transcribe audio from your microphone to clipboard.
- speak: Convert text to speech using a local TTS engine.
- voice-edit: Edit clipboard text with voice commands.
- assistant: Wake word-based voice assistant.
- chat: Conversational AI with tool-calling capabilities.
- memory: Long-term memory system with memory proxy and memory add.
- rag-proxy: RAG proxy server for chatting with your documents.
- dev: Parallel development with git worktrees and AI coding agents.
- server: Local ASR and TTS servers with dual-protocol support (Wyoming & OpenAI-compatible APIs), TTL-based memory management, and multi-platform acceleration. Whisper uses MLX on Apple Silicon or Faster Whisper on Linux/CUDA. TTS supports Kokoro (GPU) or Piper (CPU).
- transcribe-live: Continuous background transcription with VAD. Install with uv tool install "agent-cli[vad]" -p 3.13.
If you already have AI services running (or plan to use OpenAI), simply install:
# Using uv (recommended)
uv tool install agent-cli -p 3.13
# Using pip
pip install agent-cli
Note
The -p 3.13 flag is required because some dependencies (like onnxruntime) don't support Python 3.14 yet.
See uv issue #8206 for details.
Then use it:
agent-cli autocorrect "this has an eror"We offer two ways to set up agent-cli with all services:
# 1. Clone the repository
git clone https://github.com/basnijholt/agent-cli.git
cd agent-cli
# 2. Run setup (installs all services + agent-cli)
./scripts/setup-macos.sh # or setup-linux.sh
# 3. Start services
./scripts/start-all-services.sh
# 4. (Optional) Set up system-wide hotkeys
./scripts/setup-macos-hotkeys.sh # or setup-linux-hotkeys.sh
# 5. Use it!
agent-cli autocorrect "this has an eror"Note
agent-cli uses sounddevice for real-time microphone/voice features.
On Linux only, you need to install the system-level PortAudio library (sudo apt install portaudio19-dev, or your distro's equivalent) before running uv tool install agent-cli -p 3.13.
On Windows and macOS, this is handled automatically.
# 1. Install agent-cli
uv tool install agent-cli -p 3.13
# 2. Install all required services
agent-cli install-services
# 3. Start all services
agent-cli start-services
# 4. (Optional) Set up system-wide hotkeys
agent-cli install-hotkeys
# 5. Use it!
agent-cli autocorrect "this has an eror"The setup scripts automatically install:
- ✅ Package managers (Homebrew/uv) if needed
- ✅ All AI services (Ollama, Whisper, TTS, etc.)
- ✅ The agent-cli tool
- ✅ System dependencies
- ✅ Hotkey managers (if using hotkey scripts)
If you already have AI services set up or plan to use cloud services (OpenAI/Gemini):
# Using uv (recommended)
uv tool install agent-cli -p 3.13
# Using pip
pip install agent-cli
For a complete local setup with all AI services:
git clone https://github.com/basnijholt/agent-cli.git
cd agent-cli
| Platform | Setup Command | What It Does | Detailed Guide |
|---|---|---|---|
| 🍎 macOS | ./scripts/setup-macos.sh | Installs Homebrew (if needed), uv, Ollama, all services, and agent-cli | macOS Guide |
| 🐧 Linux | ./scripts/setup-linux.sh | Installs uv, Ollama, all services, and agent-cli | Linux Guide |
| ❄️ NixOS | See guide → | Special instructions for NixOS | NixOS Guide |
| 🐳 Docker | See guide → | Container-based setup (slower) | Docker Guide |
./scripts/start-all-services.sh
This launches all AI services in a single terminal session using Zellij.
agent-cli autocorrect "this has an eror"
# Output: this has an error
Note
The setup scripts handle everything automatically. For platform-specific details or troubleshooting, see the installation guides.
Development Installation
For contributing or development:
git clone https://github.com/basnijholt/agent-cli.git
cd agent-cli
uv sync
source .venv/bin/activate # On Windows: .venv\Scripts\activate
Want system-wide hotkeys? You'll need the repository for the setup scripts:
# If you haven't already cloned it
git clone https://github.com/basnijholt/agent-cli.git
cd agent-cli
./scripts/setup-macos-hotkeys.sh
This script automatically:
- ✅ Installs Homebrew if not present
- ✅ Installs skhd (hotkey daemon) and terminal-notifier
- ✅ Configures these system-wide hotkeys:
- Cmd+Shift+R - Toggle voice transcription
- Cmd+Shift+A - Autocorrect clipboard text
- Cmd+Shift+V - Voice edit clipboard text
Note
After setup, you may need to grant Accessibility permissions to skhd in System Settings → Privacy & Security → Accessibility
Tip
To keep the “Listening…” indicator visible for the whole recording, open System Settings → Notifications → terminal-notifier and set the Alert style to Persistent (or choose Alerts on older macOS versions). Also enable "Allow notification when mirroring or sharing the display". The hotkey scripts keep only the recording notification pinned; status and result toasts auto-dismiss.
./scripts/setup-linux-hotkeys.sh
This script automatically:
- ✅ Installs notification tools if needed
- ✅ Provides configuration for your desktop environment
- ✅ Sets up these hotkeys:
- Super+Shift+R - Toggle voice transcription
- Super+Shift+A - Autocorrect clipboard text
- Super+Shift+V - Voice edit clipboard text
The script supports Hyprland, GNOME, KDE, Sway, i3, XFCE, and provides instructions for manual configuration on other environments.
The dev command is also available as a Claude Code plugin, enabling Claude to automatically spawn parallel AI agents in isolated git worktrees when you ask it to work on multiple features.
# Option 1: Install skill directly in your project (recommended)
agent-cli dev install-skill
# Option 2: Install via Claude Code plugin marketplace
claude plugin marketplace add basnijholt/agent-cli
claude plugin install agent-cli-dev@agent-cli
Once installed, Claude Code can automatically use this skill when you ask to:
- "Work on these 3 features in parallel"
- "Spawn agents for auth and payments"
- "Delegate this refactoring to a separate agent"
See the plugin documentation for more details.
The only thing you need to have installed is Git to clone this repository. Everything else is handled automatically!
Our installation scripts automatically handle all dependencies:
- 🍺 Homebrew (macOS) - Installed if not present
- 🐍 uv - Python package manager - Installed automatically
- 📋 Clipboard Tools - Pre-installed on macOS, handled on Linux
| Service | Purpose | Auto-installed? |
|---|---|---|
| Ollama | Local LLM for text processing | ✅ Yes, with default model |
| Wyoming Faster Whisper | Speech-to-text | ✅ Yes, via uvx |
| agent-cli server whisper | Speech-to-text (alternative) | ✅ Built-in, pip install "agent-cli[faster-whisper]" |
| Wyoming Piper | Text-to-speech | ✅ Yes, via uvx |
| Kokoro-FastAPI | Premium TTS (optional) | ⚙️ Can be added later |
| Wyoming openWakeWord | Wake word detection | ✅ Yes, for assistant |
Why agent-cli server whisper? The built-in Whisper server offers an OpenAI-compatible API (drop-in replacement), Wyoming protocol support for Home Assistant, TTL-based VRAM management (auto-unloads idle models), and auto-selects the optimal backend (MLX on Apple Silicon, faster-whisper on Linux/CUDA). Docker images are available at ghcr.io/basnijholt/agent-cli-whisper.
If you prefer cloud services over local ones:
| Service | Purpose | Setup Required |
|---|---|---|
| OpenAI | LLM, Speech-to-text, TTS | API key in config |
| Gemini | LLM alternative | API key in config |
You can also use other OpenAI-compatible local servers:
| Server | Purpose | Setup Required |
|---|---|---|
| llama.cpp | Local LLM inference | Use --openai-base-url http://localhost:8080/v1 |
| vLLM | High-performance LLM serving | Use --openai-base-url with server endpoint |
| Ollama | Default local LLM | Already configured as default |
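For example, to route a command's LLM calls to a llama.cpp server, combine the flags shown above (both documented in the help output further below); llama.cpp typically accepts any placeholder API key:
# Use llama.cpp's OpenAI-compatible endpoint instead of the Ollama default
agent-cli autocorrect "this has an eror" --llm-provider openai --openai-base-url http://localhost:8080/v1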
This package provides multiple command-line tools, each designed for a specific purpose.
These commands help you set up agent-cli and its required services:
- install-services: Install all required AI services (Ollama, Whisper, Piper, OpenWakeWord)
- install-hotkeys: Set up system-wide hotkeys for quick access to agent-cli features
- install-extras: Install optional Python dependencies (rag, memory, vad, etc.) with pinned versions
- start-services: Start all services in a Zellij terminal session
All necessary scripts are bundled with the package, so you can run these commands immediately after installing agent-cli.
Some features require additional Python dependencies. By default, agent-cli will auto-install missing extras when you run a command that needs them. To disable this, set AGENT_CLI_NO_AUTO_INSTALL=1 or add to your config file:
[settings]
auto_install_extras = false
You can also manually install extras with install-extras:
# List available extras
agent-cli install-extras --list
# Install specific extras
agent-cli install-extras rag memory vad
See the output of agent-cli install-extras --help
Usage: agent-cli install-extras [OPTIONS] [EXTRAS]...
Install optional dependencies with pinned, compatible versions.
Many agent-cli features require optional dependencies. This command installs them with
version pinning to ensure compatibility. Dependencies persist across uv tool upgrade
when installed via uv tool.
Available extras:
• audio - Audio recording/playback
• faster-whisper - Whisper ASR via CTranslate2
• kokoro - Kokoro neural TTS (GPU)
• llm - LLM framework (pydantic-ai)
• memory - Long-term memory proxy
• mlx-whisper - Whisper ASR for Apple Silicon
• piper - Piper TTS (CPU)
• rag - RAG proxy (ChromaDB, embeddings)
• server - FastAPI server components
• speed - Audio speed adjustment (audiostretchy)
• vad - Voice Activity Detection (Silero VAD via ONNX)
• whisper-transformers - Whisper ASR via HuggingFace transformers
• wyoming - Wyoming protocol support
Examples:
agent-cli install-extras rag # Install RAG dependencies
agent-cli install-extras memory vad # Install multiple extras
agent-cli install-extras --list # Show available extras
agent-cli install-extras --all # Install all extras
╭─ Arguments ────────────────────────────────────────────────────────────────────────────╮
│ extras [EXTRAS]... Extras to install: audio, faster-whisper, kokoro, llm, │
│ memory, mlx-whisper, piper, rag, server, speed, vad, │
│ whisper-transformers, wyoming │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ──────────────────────────────────────────────────────────────────────────────╮
│ --list -l Show available extras with descriptions (what each one enables) │
│ --all -a Install all available extras at once │
│ --help -h Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
All agent-cli commands can be configured using a TOML file. The configuration file is searched for in the following locations, in order:
- ./agent-cli-config.toml (in the current directory)
- ~/.config/agent-cli/config.toml
You can also specify a path to a configuration file using the --config option, e.g., agent-cli transcribe --config /path/to/your/config.toml.
Command-line options always take precedence over settings in the configuration file.
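Config keys appear to mirror the command-line flags (compare the [defaults] example further below). A minimal sketch with hypothetical values; run agent-cli config init for the authoritative, fully commented template:
[defaults]
llm_provider = "ollama"          # applies to all commands
llm_ollama_model = "gemma3:4b"   # assumed key name, mirroring --llm-ollama-model

[transcribe]
llm = true                       # per-command override: always clean up transcripts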
Use the config command to manage your configuration files:
# Create a new config file with all options (commented out as a template)
agent-cli config init
# View your current config (syntax highlighted)
agent-cli config show
# View config as raw text (for copy-paste)
agent-cli config show --raw
# Open config in your editor ($EDITOR, or nano/vim)
agent-cli config edit
See the output of agent-cli config --help
Usage: agent-cli config [OPTIONS] COMMAND [ARGS]...
Manage agent-cli configuration files.
Config files are TOML format and searched in order:
1 ./agent-cli-config.toml (project-local)
2 ~/.config/agent-cli/config.toml (user default)
Settings in [defaults] apply to all commands. Override per-command with sections like
[chat] or [transcribe]. CLI arguments override config file settings.
╭─ Options ──────────────────────────────────────────────────────────────────────────────╮
│ --help -h Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ─────────────────────────────────────────────────────────────────────────────╮
│ init Create a new config file with all options as commented-out examples. │
│ edit Open the config file in your default editor. │
│ show Display the active config file path and contents. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
An example configuration file is also provided in example.agent-cli-config.toml.
You can choose local or cloud services per capability by setting provider keys in
the [defaults] section of your configuration file.
[defaults]
# llm_provider = "ollama" # 'ollama', 'openai', or 'gemini'
# asr_provider = "wyoming" # 'wyoming', 'openai', or 'gemini'
# tts_provider = "wyoming" # 'wyoming', 'openai', 'kokoro', or 'gemini'
# openai_api_key = "sk-..."
# gemini_api_key = "..."
Purpose: Quickly fix spelling and grammar in any text you've copied.
Workflow: This is a simple, one-shot command.
- It reads text from your system clipboard (or from a direct argument).
- It sends the text to your configured LLM provider (default: Ollama) with a prompt to perform only technical corrections.
- The corrected text is copied back to your clipboard, replacing the original.
How to Use It: This tool is ideal for integrating with a system-wide hotkey.
- From Clipboard: agent-cli autocorrect
- From Argument: agent-cli autocorrect "this text has an eror"
See the output of agent-cli autocorrect --help
Usage: agent-cli autocorrect [OPTIONS] [TEXT]
Fix grammar, spelling, and punctuation using an LLM.
Reads text from clipboard (or argument), sends to LLM for correction, and copies the
result back to clipboard. Only makes technical corrections without changing meaning or
tone.
Workflow:
1 Read text from clipboard (or TEXT argument)
2 Send to LLM for grammar/spelling/punctuation fixes
3 Copy corrected text to clipboard (unless --json)
4 Display result
Examples:
# Correct text from clipboard (default)
agent-cli autocorrect
# Correct specific text
agent-cli autocorrect "this is incorect"
# Use OpenAI instead of local Ollama
agent-cli autocorrect --llm-provider openai
# Get JSON output for scripting (disables clipboard)
agent-cli autocorrect --json
╭─ General Options ──────────────────────────────────────────────────────────────────────╮
│ text [TEXT] Text to correct. If omitted, reads from system clipboard. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ──────────────────────────────────────────────────────────────────────────────╮
│ --help -h Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Provider Selection ───────────────────────────────────────────────────────────────────╮
│ --llm-provider TEXT The LLM provider to use ('ollama', 'openai', 'gemini'). │
│ [env var: LLM_PROVIDER] │
│ [default: ollama] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM: Ollama ──────────────────────────────────────────────────────────────────────────╮
│ --llm-ollama-model TEXT The Ollama model to use. Default is gemma3:4b. │
│ [env var: LLM_OLLAMA_MODEL] │
│ [default: gemma3:4b] │
│ --llm-ollama-host TEXT The Ollama server host. Default is │
│ http://localhost:11434. │
│ [env var: LLM_OLLAMA_HOST] │
│ [default: http://localhost:11434] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM: OpenAI-compatible ───────────────────────────────────────────────────────────────╮
│ --llm-openai-model TEXT The OpenAI model to use for LLM tasks. │
│ [env var: LLM_OPENAI_MODEL] │
│ [default: gpt-5-mini] │
│ --openai-api-key TEXT Your OpenAI API key. Can also be set with the │
│ OPENAI_API_KEY environment variable. │
│ [env var: OPENAI_API_KEY] │
│ --openai-base-url TEXT Custom base URL for OpenAI-compatible API (e.g., for │
│ llama-server: http://localhost:8080/v1). │
│ [env var: OPENAI_BASE_URL] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM: Gemini ──────────────────────────────────────────────────────────────────────────╮
│ --llm-gemini-model TEXT The Gemini model to use for LLM tasks. │
│ [env var: LLM_GEMINI_MODEL] │
│ [default: gemini-3-flash-preview] │
│ --gemini-api-key TEXT Your Gemini API key. Can also be set with the │
│ GEMINI_API_KEY environment variable. │
│ [env var: GEMINI_API_KEY] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ General Options ──────────────────────────────────────────────────────────────────────╮
│ --log-level [debug|info|warning|error] Set logging level. │
│ [env var: LOG_LEVEL] │
│ [default: warning] │
│ --log-file TEXT Path to a file to write logs to. │
│ --quiet -q Suppress console output from rich. │
│ --json Output result as JSON (implies │
│ --quiet and --no-clipboard). │
│ --config TEXT Path to a TOML configuration file. │
│ --print-args Print the command line arguments, │
│ including variables taken from the │
│ configuration file. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
Purpose: A simple tool to turn your speech into text.
Workflow: This agent listens to your microphone and converts your speech to text in real-time.
- Run the command. It will start listening immediately.
- Speak into your microphone.
- Press Ctrl+C to stop recording.
- The transcribed text is copied to your clipboard.
- Optionally, use the --llm flag to have an Ollama model clean up the raw transcript (fixing punctuation, etc.).
How to Use It:
- Simple Transcription: agent-cli transcribe --input-device-index 1
- With LLM Cleanup: agent-cli transcribe --input-device-index 1 --llm
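For hotkey-driven push-to-talk, the --toggle flag (documented in the help below) starts recording on the first invocation and stops and transcribes on the second, so a single binding suffices. A minimal sketch of the command a hotkey manager would run:
# First press starts recording; second press stops, transcribes, and copies to clipboard
agent-cli transcribe --toggle --llm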
See the output of agent-cli transcribe --help
Usage: agent-cli transcribe [OPTIONS]
Record audio from microphone and transcribe to text.
Records until you press Ctrl+C (or send SIGINT), then transcribes using your configured
ASR provider. The transcript is copied to the clipboard by default.
With --llm: Passes the raw transcript through an LLM to clean up speech recognition
errors, add punctuation, remove filler words, and improve readability.
With --toggle: Bind to a hotkey for push-to-talk. First call starts recording, second
call stops and transcribes.
Examples:
• Record and transcribe: agent-cli transcribe
• With LLM cleanup: agent-cli transcribe --llm
• Re-transcribe last recording: agent-cli transcribe --last-recording 1
╭─ Options ──────────────────────────────────────────────────────────────────────────────╮
│ --help -h Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM Configuration ────────────────────────────────────────────────────────────────────╮
│ --extra-instructions TEXT Extra instructions appended to the LLM │
│ cleanup prompt (requires --llm). │
│ --llm --no-llm Clean up transcript with LLM: fix errors, │
│ add punctuation, remove filler words. Uses │
│ --extra-instructions if set (via CLI or │
│ config file). │
│ [default: no-llm] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Recovery ───────────────────────────────────────────────────────────────────────╮
│ --from-file PATH Transcribe from audio file instead │
│ of microphone. Supports wav, mp3, │
│ m4a, ogg, flac, aac, webm. │
│ Requires ffmpeg for non-WAV │
│ formats with Wyoming. │
│ --last-recording INTEGER Re-transcribe a saved recording │
│ (1=most recent, 2=second-to-last, │
│ etc). Useful after connection │
│ failures or to retry with │
│ different options. │
│ [default: 0] │
│ --save-recording --no-save-recording Save recordings to │
│ ~/.cache/agent-cli/ for │
│ --last-recording recovery. │
│ [default: save-recording] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Provider Selection ───────────────────────────────────────────────────────────────────╮
│ --asr-provider TEXT The ASR provider to use ('wyoming', 'openai', 'gemini'). │
│ [env var: ASR_PROVIDER] │
│ [default: wyoming] │
│ --llm-provider TEXT The LLM provider to use ('ollama', 'openai', 'gemini'). │
│ [env var: LLM_PROVIDER] │
│ [default: ollama] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Input ──────────────────────────────────────────────────────────────────────────╮
│ --input-device-index INTEGER Audio input device index (see --list-devices). │
│ Uses system default if omitted. │
│ --input-device-name TEXT Select input device by name substring (e.g., │
│ MacBook or USB). │
│ --list-devices List available audio devices with their indices │
│ and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Input: Wyoming ─────────────────────────────────────────────────────────────────╮
│ --asr-wyoming-ip TEXT Wyoming ASR server IP address. │
│ [env var: ASR_WYOMING_IP] │
│ [default: localhost] │
│ --asr-wyoming-port INTEGER Wyoming ASR server port. │
│ [env var: ASR_WYOMING_PORT] │
│ [default: 10300] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Input: OpenAI-compatible ───────────────────────────────────────────────────────╮
│ --asr-openai-model TEXT The OpenAI model to use for ASR (transcription). │
│ [env var: ASR_OPENAI_MODEL] │
│ [default: whisper-1] │
│ --asr-openai-base-url TEXT Custom base URL for OpenAI-compatible ASR API │
│ (e.g., for custom Whisper server: │
│ http://localhost:9898). │
│ [env var: ASR_OPENAI_BASE_URL] │
│ --asr-openai-prompt TEXT Custom prompt to guide transcription (optional). │
│ [env var: ASR_OPENAI_PROMPT] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Input: Gemini ──────────────────────────────────────────────────────────────────╮
│ --asr-gemini-model TEXT The Gemini model to use for ASR (transcription). │
│ [env var: ASR_GEMINI_MODEL] │
│ [default: gemini-3-flash-preview] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM: Ollama ──────────────────────────────────────────────────────────────────────────╮
│ --llm-ollama-model TEXT The Ollama model to use. Default is gemma3:4b. │
│ [env var: LLM_OLLAMA_MODEL] │
│ [default: gemma3:4b] │
│ --llm-ollama-host TEXT The Ollama server host. Default is │
│ http://localhost:11434. │
│ [env var: LLM_OLLAMA_HOST] │
│ [default: http://localhost:11434] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM: OpenAI-compatible ───────────────────────────────────────────────────────────────╮
│ --llm-openai-model TEXT The OpenAI model to use for LLM tasks. │
│ [env var: LLM_OPENAI_MODEL] │
│ [default: gpt-5-mini] │
│ --openai-api-key TEXT Your OpenAI API key. Can also be set with the │
│ OPENAI_API_KEY environment variable. │
│ [env var: OPENAI_API_KEY] │
│ --openai-base-url TEXT Custom base URL for OpenAI-compatible API (e.g., for │
│ llama-server: http://localhost:8080/v1). │
│ [env var: OPENAI_BASE_URL] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM: Gemini ──────────────────────────────────────────────────────────────────────────╮
│ --llm-gemini-model TEXT The Gemini model to use for LLM tasks. │
│ [env var: LLM_GEMINI_MODEL] │
│ [default: gemini-3-flash-preview] │
│ --gemini-api-key TEXT Your Gemini API key. Can also be set with the │
│ GEMINI_API_KEY environment variable. │
│ [env var: GEMINI_API_KEY] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Process Management ───────────────────────────────────────────────────────────────────╮
│ --stop Stop any running instance of this command. │
│ --status Check if an instance is currently running. │
│ --toggle Start if not running, stop if running. Ideal for hotkey binding. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ General Options ──────────────────────────────────────────────────────────────────────╮
│ --clipboard --no-clipboard Copy result to │
│ clipboard. │
│ [default: clipboard] │
│ --log-level [debug|info|warning| Set logging level. │
│ error] [env var: LOG_LEVEL] │
│ [default: warning] │
│ --log-file TEXT Path to a file to │
│ write logs to. │
│ --quiet -q Suppress console │
│ output from rich. │
│ --json Output result as JSON │
│ (implies --quiet and │
│ --no-clipboard). │
│ --config TEXT Path to a TOML │
│ configuration file. │
│ --print-args Print the command │
│ line arguments, │
│ including variables │
│ taken from the │
│ configuration file. │
│ --transcription-log PATH Append transcripts to │
│ JSONL file │
│ (timestamp, hostname, │
│ model, raw/processed │
│ text). Recent entries │
│ provide context for │
│ LLM cleanup. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
Purpose: A continuous background transcription service that automatically detects and transcribes speech.
Workflow: Runs as a daemon, listening to your microphone and automatically segmenting speech using voice activity detection (VAD).
- Run the command. It starts listening immediately.
- Speak naturally - the daemon detects when you start and stop speaking.
- Each speech segment is automatically transcribed and logged.
- Optionally, audio is saved as MP3 files for later reference.
- Press Ctrl+C to stop the daemon.
Installation: Requires the vad extra:
uv tool install "agent-cli[vad]" -p 3.13How to Use It:
- Basic Daemon: agent-cli transcribe-live
- With Custom Role: agent-cli transcribe-live --role meeting
- With LLM Cleanup: agent-cli transcribe-live --llm
- Custom Silence Threshold: agent-cli transcribe-live --silence-threshold 1.5
Output Files:
- Transcription Log: ~/.config/agent-cli/transcriptions.jsonl (JSON Lines format)
- Audio Files: ~/.config/agent-cli/audio/YYYY/MM/DD/*.mp3
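Because the log is plain JSON Lines, it is easy to post-process. A hedged example with jq (the field names here are assumed from the help text below; inspect a log line to confirm the actual schema):
# Print the text of the five most recent segments (field names assumed)
tail -n 5 ~/.config/agent-cli/transcriptions.jsonl | jq -r '.processed // .raw'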
See the output of agent-cli transcribe-live --help
Usage: agent-cli transcribe-live [OPTIONS]
Continuous live transcription using Silero VAD for speech detection.
Unlike transcribe (single recording session), this runs indefinitely and automatically
detects speech segments using Voice Activity Detection (VAD). Each detected segment is
transcribed and logged with timestamps.
How it works:
1 Listens continuously to microphone input
2 Silero VAD detects when you start/stop speaking
3 After --silence-threshold seconds of silence, the segment is finalized
4 Segment is transcribed (and optionally cleaned by LLM with --llm)
5 Results are appended to the JSONL log file
6 Audio is saved as MP3 if --save-audio is enabled (requires ffmpeg)
Use cases: Meeting transcription, note-taking, voice journaling, accessibility.
Examples:
agent-cli transcribe-live
agent-cli transcribe-live --role meeting --silence-threshold 1.5
agent-cli transcribe-live --llm --clipboard --role notes
agent-cli transcribe-live --transcription-log ~/meeting.jsonl --no-save-audio
agent-cli transcribe-live --asr-provider openai --llm-provider gemini --llm
Tips:
• Use --role to tag entries (e.g., speaker1, meeting, personal)
• Adjust --vad-threshold if detection is too sensitive (increase) or missing speech
(decrease)
• Use --stop to cleanly terminate a running process
• With --llm, transcripts are cleaned up (punctuation, filler words removed)
╭─ Options ──────────────────────────────────────────────────────────────────────────────╮
│ --role -r TEXT Label for log entries. Use to │
│ distinguish speakers or contexts in │
│ logs. │
│ [default: user] │
│ --silence-threshold -s FLOAT Seconds of silence after speech to │
│ finalize a segment. Increase for │
│ slower speakers. │
│ [default: 1.0] │
│ --min-segment -m FLOAT Minimum seconds of speech required │
│ before a segment is processed. │
│ Filters brief sounds. │
│ [default: 0.25] │
│ --vad-threshold FLOAT Silero VAD confidence threshold │
│ (0.0-1.0). Higher values require │
│ clearer speech; lower values are │
│ more sensitive to quiet/distant │
│ voices. │
│ [default: 0.3] │
│ --save-audio --no-save-audio Save each speech segment as MP3. │
│ Requires ffmpeg to be installed. │
│ [default: save-audio] │
│ --audio-dir PATH Base directory for MP3 files. Files │
│ are organized by date: │
│ YYYY/MM/DD/HHMMSS_mmm.mp3. Default: │
│ ~/.config/agent-cli/audio. │
│ --transcription-log -t PATH JSONL file for transcript logging │
│ (one JSON object per line with │
│ timestamp, role, raw/processed │
│ text, audio path). Default: │
│ ~/.config/agent-cli/transcriptions… │
│ --clipboard --no-clipboard Copy each completed transcription │
│ to clipboard (overwrites previous). │
│ Useful with --llm to get cleaned │
│ text. │
│ [default: no-clipboard] │
│ --help -h Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Provider Selection ───────────────────────────────────────────────────────────────────╮
│ --asr-provider TEXT The ASR provider to use ('wyoming', 'openai', 'gemini'). │
│ [env var: ASR_PROVIDER] │
│ [default: wyoming] │
│ --llm-provider TEXT The LLM provider to use ('ollama', 'openai', 'gemini'). │
│ [env var: LLM_PROVIDER] │
│ [default: ollama] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Input ──────────────────────────────────────────────────────────────────────────╮
│ --input-device-index INTEGER Audio input device index (see --list-devices). │
│ Uses system default if omitted. │
│ --input-device-name TEXT Select input device by name substring (e.g., │
│ MacBook or USB). │
│ --list-devices List available audio devices with their indices │
│ and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Input: Wyoming ─────────────────────────────────────────────────────────────────╮
│ --asr-wyoming-ip TEXT Wyoming ASR server IP address. │
│ [env var: ASR_WYOMING_IP] │
│ [default: localhost] │
│ --asr-wyoming-port INTEGER Wyoming ASR server port. │
│ [env var: ASR_WYOMING_PORT] │
│ [default: 10300] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Input: OpenAI-compatible ───────────────────────────────────────────────────────╮
│ --asr-openai-model TEXT The OpenAI model to use for ASR (transcription). │
│ [env var: ASR_OPENAI_MODEL] │
│ [default: whisper-1] │
│ --asr-openai-base-url TEXT Custom base URL for OpenAI-compatible ASR API │
│ (e.g., for custom Whisper server: │
│ http://localhost:9898). │
│ [env var: ASR_OPENAI_BASE_URL] │
│ --asr-openai-prompt TEXT Custom prompt to guide transcription (optional). │
│ [env var: ASR_OPENAI_PROMPT] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Input: Gemini ──────────────────────────────────────────────────────────────────╮
│ --asr-gemini-model TEXT The Gemini model to use for ASR (transcription). │
│ [env var: ASR_GEMINI_MODEL] │
│ [default: gemini-3-flash-preview] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM: Ollama ──────────────────────────────────────────────────────────────────────────╮
│ --llm-ollama-model TEXT The Ollama model to use. Default is gemma3:4b. │
│ [env var: LLM_OLLAMA_MODEL] │
│ [default: gemma3:4b] │
│ --llm-ollama-host TEXT The Ollama server host. Default is │
│ http://localhost:11434. │
│ [env var: LLM_OLLAMA_HOST] │
│ [default: http://localhost:11434] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM: OpenAI-compatible ───────────────────────────────────────────────────────────────╮
│ --llm-openai-model TEXT The OpenAI model to use for LLM tasks. │
│ [env var: LLM_OPENAI_MODEL] │
│ [default: gpt-5-mini] │
│ --openai-api-key TEXT Your OpenAI API key. Can also be set with the │
│ OPENAI_API_KEY environment variable. │
│ [env var: OPENAI_API_KEY] │
│ --openai-base-url TEXT Custom base URL for OpenAI-compatible API (e.g., for │
│ llama-server: http://localhost:8080/v1). │
│ [env var: OPENAI_BASE_URL] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM: Gemini ──────────────────────────────────────────────────────────────────────────╮
│ --llm-gemini-model TEXT The Gemini model to use for LLM tasks. │
│ [env var: LLM_GEMINI_MODEL] │
│ [default: gemini-3-flash-preview] │
│ --gemini-api-key TEXT Your Gemini API key. Can also be set with the │
│ GEMINI_API_KEY environment variable. │
│ [env var: GEMINI_API_KEY] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM Configuration ────────────────────────────────────────────────────────────────────╮
│ --llm --no-llm Clean up transcript with LLM: fix errors, add punctuation, │
│ remove filler words. Uses --extra-instructions if set (via CLI │
│ or config file). │
│ [default: no-llm] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Process Management ───────────────────────────────────────────────────────────────────╮
│ --stop Stop any running instance of this command. │
│ --status Check if an instance is currently running. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ General Options ──────────────────────────────────────────────────────────────────────╮
│ --log-level [debug|info|warning|error] Set logging level. │
│ [env var: LOG_LEVEL] │
│ [default: warning] │
│ --log-file TEXT Path to a file to write logs to. │
│ --quiet -q Suppress console output from rich. │
│ --config TEXT Path to a TOML configuration file. │
│ --print-args Print the command line arguments, │
│ including variables taken from the │
│ configuration file. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
Purpose: Reads any text out loud.
Workflow: A straightforward text-to-speech utility.
- It takes text from a command-line argument or your clipboard.
- It sends the text to a Wyoming TTS server (like Piper).
- The generated audio is played through your default speakers.
How to Use It:
- Speak from Argument: agent-cli speak "Hello, world!"
- Speak from Clipboard: agent-cli speak
- Save to File: agent-cli speak "Hello" --save-file hello.wav
See the output of agent-cli speak --help
Usage: agent-cli speak [OPTIONS] [TEXT]
Convert text to speech and play audio through speakers.
By default, synthesized audio plays immediately. Use --save-file to save to a WAV file
instead (skips playback).
Text can be provided as an argument or read from clipboard automatically.
Examples:
Speak text directly: agent-cli speak "Hello, world!"
Speak clipboard contents: agent-cli speak
Save to file instead of playing: agent-cli speak "Hello" --save-file greeting.wav
Use OpenAI-compatible TTS: agent-cli speak "Hello" --tts-provider openai
╭─ General Options ──────────────────────────────────────────────────────────────────────╮
│ text [TEXT] Text to synthesize. If not provided, reads from clipboard. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ──────────────────────────────────────────────────────────────────────────────╮
│ --help -h Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Provider Selection ───────────────────────────────────────────────────────────────────╮
│ --tts-provider TEXT The TTS provider to use ('wyoming', 'openai', 'kokoro', │
│ 'gemini'). │
│ [env var: TTS_PROVIDER] │
│ [default: wyoming] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Output ─────────────────────────────────────────────────────────────────────────╮
│ --output-device-index INTEGER Audio output device index (see --list-devices │
│ for available devices). │
│ --output-device-name TEXT Partial match on device name (e.g., 'speakers', │
│ 'headphones'). │
│ --tts-speed FLOAT Speech speed multiplier (1.0 = normal, 2.0 = │
│ twice as fast, 0.5 = half speed). │
│ [default: 1.0] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Output: Wyoming ────────────────────────────────────────────────────────────────╮
│ --tts-wyoming-ip TEXT Wyoming TTS server IP address. │
│ [default: localhost] │
│ --tts-wyoming-port INTEGER Wyoming TTS server port. │
│ [default: 10200] │
│ --tts-wyoming-voice TEXT Voice name to use for Wyoming TTS (e.g., │
│ 'en_US-lessac-medium'). │
│ --tts-wyoming-language TEXT Language for Wyoming TTS (e.g., 'en_US'). │
│ --tts-wyoming-speaker TEXT Speaker name for Wyoming TTS voice. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Output: OpenAI-compatible ──────────────────────────────────────────────────────╮
│ --tts-openai-model TEXT The OpenAI model to use for TTS. │
│ [default: tts-1] │
│ --tts-openai-voice TEXT Voice for OpenAI TTS (alloy, echo, fable, onyx, │
│ nova, shimmer). │
│ [default: alloy] │
│ --tts-openai-base-url TEXT Custom base URL for OpenAI-compatible TTS API │
│ (e.g., http://localhost:8000/v1 for a proxy). │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Output: Kokoro ─────────────────────────────────────────────────────────────────╮
│ --tts-kokoro-model TEXT The Kokoro model to use for TTS. │
│ [default: kokoro] │
│ --tts-kokoro-voice TEXT The voice to use for Kokoro TTS. │
│ [default: af_sky] │
│ --tts-kokoro-host TEXT The base URL for the Kokoro API. │
│ [default: http://localhost:8880/v1] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Output: Gemini ─────────────────────────────────────────────────────────────────╮
│ --tts-gemini-model TEXT The Gemini model to use for TTS. │
│ [default: gemini-2.5-flash-preview-tts] │
│ --tts-gemini-voice TEXT The voice to use for Gemini TTS (e.g., 'Kore', 'Puck', │
│ 'Charon', 'Fenrir'). │
│ [default: Kore] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM: Gemini ──────────────────────────────────────────────────────────────────────────╮
│ --gemini-api-key TEXT Your Gemini API key. Can also be set with the │
│ GEMINI_API_KEY environment variable. │
│ [env var: GEMINI_API_KEY] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Input ──────────────────────────────────────────────────────────────────────────╮
│ --list-devices List available audio devices with their indices and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ General Options ──────────────────────────────────────────────────────────────────────╮
│ --save-file PATH Save audio to WAV file instead of │
│ playing through speakers. │
│ --log-level [debug|info|warning|error] Set logging level. │
│ [env var: LOG_LEVEL] │
│ [default: warning] │
│ --log-file TEXT Path to a file to write logs to. │
│ --quiet -q Suppress console output from rich. │
│ --json Output result as JSON (implies │
│ --quiet and --no-clipboard). │
│ --config TEXT Path to a TOML configuration file. │
│ --print-args Print the command line arguments, │
│ including variables taken from the │
│ configuration file. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Process Management ───────────────────────────────────────────────────────────────────╮
│ --stop Stop any running instance of this command. │
│ --status Check if an instance is currently running. │
│ --toggle Start if not running, stop if running. Ideal for hotkey binding. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
Purpose: A powerful clipboard assistant that you command with your voice.
Workflow: This agent is designed for a hotkey-driven workflow to act on text you've already copied.
- Copy a block of text to your clipboard (e.g., an email draft).
- Press a hotkey to run agent-cli voice-edit & in the background. The agent is now listening.
- Speak a command, such as "Make this more formal" or "Summarize the key points."
- Press the same hotkey again, which should trigger agent-cli voice-edit --stop.
- The agent transcribes your command, sends it along with the original clipboard text to the LLM, and the LLM performs the action.
- The result is copied back to your clipboard. If --tts is enabled, it will also speak the result.
How to Use It: The power of this tool is unlocked with a hotkey manager like Keyboard Maestro (macOS) or AutoHotkey (Windows). See the docstring in agent_cli/agents/voice_edit.py for a detailed Keyboard Maestro setup guide.
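If your hotkey manager can only run a single command, the --toggle flag (see the help below) collapses the start/stop pair into one binding; a minimal sketch:
# One hotkey for both steps: first press starts listening, second press
# stops, processes the voice command, and copies the result to the clipboard
agent-cli voice-edit --toggle --tts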
See the output of agent-cli voice-edit --help
Usage: agent-cli voice-edit [OPTIONS]
Edit or query clipboard text using voice commands.
Workflow: Captures clipboard text → records your voice command → transcribes it → sends
both to an LLM → copies result back to clipboard.
Use this for hands-free text editing (e.g., "make this more formal") or asking questions
about clipboard content (e.g., "summarize this").
Typical hotkey integration: Run voice-edit & on keypress to start recording, then send
SIGINT (via --stop) on second keypress to process.
Examples:
• Basic usage: agent-cli voice-edit
• With TTS response: agent-cli voice-edit --tts
• Toggle on/off: agent-cli voice-edit --toggle
• List audio devices: agent-cli voice-edit --list-devices
╭─ Options ──────────────────────────────────────────────────────────────────────────────╮
│ --help -h Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Provider Selection ───────────────────────────────────────────────────────────────────╮
│ --asr-provider TEXT The ASR provider to use ('wyoming', 'openai', 'gemini'). │
│ [env var: ASR_PROVIDER] │
│ [default: wyoming] │
│ --llm-provider TEXT The LLM provider to use ('ollama', 'openai', 'gemini'). │
│ [env var: LLM_PROVIDER] │
│ [default: ollama] │
│ --tts-provider TEXT The TTS provider to use ('wyoming', 'openai', 'kokoro', │
│ 'gemini'). │
│ [env var: TTS_PROVIDER] │
│ [default: wyoming] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Input ──────────────────────────────────────────────────────────────────────────╮
│ --input-device-index INTEGER Audio input device index (see --list-devices). │
│ Uses system default if omitted. │
│ --input-device-name TEXT Select input device by name substring (e.g., │
│ MacBook or USB). │
│ --list-devices List available audio devices with their indices │
│ and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Input: Wyoming ─────────────────────────────────────────────────────────────────╮
│ --asr-wyoming-ip TEXT Wyoming ASR server IP address. │
│ [env var: ASR_WYOMING_IP] │
│ [default: localhost] │
│ --asr-wyoming-port INTEGER Wyoming ASR server port. │
│ [env var: ASR_WYOMING_PORT] │
│ [default: 10300] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Input: OpenAI-compatible ───────────────────────────────────────────────────────╮
│ --asr-openai-model TEXT The OpenAI model to use for ASR (transcription). │
│ [env var: ASR_OPENAI_MODEL] │
│ [default: whisper-1] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Input: Gemini ──────────────────────────────────────────────────────────────────╮
│ --asr-gemini-model TEXT The Gemini model to use for ASR (transcription). │
│ [env var: ASR_GEMINI_MODEL] │
│ [default: gemini-3-flash-preview] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM: Ollama ──────────────────────────────────────────────────────────────────────────╮
│ --llm-ollama-model TEXT The Ollama model to use. Default is gemma3:4b. │
│ [env var: LLM_OLLAMA_MODEL] │
│ [default: gemma3:4b] │
│ --llm-ollama-host TEXT The Ollama server host. Default is │
│ http://localhost:11434. │
│ [env var: LLM_OLLAMA_HOST] │
│ [default: http://localhost:11434] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM: OpenAI-compatible ───────────────────────────────────────────────────────────────╮
│ --llm-openai-model TEXT The OpenAI model to use for LLM tasks. │
│ [env var: LLM_OPENAI_MODEL] │
│ [default: gpt-5-mini] │
│ --openai-api-key TEXT Your OpenAI API key. Can also be set with the │
│ OPENAI_API_KEY environment variable. │
│ [env var: OPENAI_API_KEY] │
│ --openai-base-url TEXT Custom base URL for OpenAI-compatible API (e.g., for │
│ llama-server: http://localhost:8080/v1). │
│ [env var: OPENAI_BASE_URL] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM: Gemini ──────────────────────────────────────────────────────────────────────────╮
│ --llm-gemini-model TEXT The Gemini model to use for LLM tasks. │
│ [env var: LLM_GEMINI_MODEL] │
│ [default: gemini-3-flash-preview] │
│ --gemini-api-key TEXT Your Gemini API key. Can also be set with the │
│ GEMINI_API_KEY environment variable. │
│ [env var: GEMINI_API_KEY] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Output ─────────────────────────────────────────────────────────────────────────╮
│ --tts --no-tts Enable text-to-speech for responses. │
│ [default: no-tts] │
│ --output-device-index INTEGER Audio output device index (see │
│ --list-devices for available devices). │
│ --output-device-name TEXT Partial match on device name (e.g., │
│ 'speakers', 'headphones'). │
│ --tts-speed FLOAT Speech speed multiplier (1.0 = normal, │
│ 2.0 = twice as fast, 0.5 = half speed). │
│ [default: 1.0] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Output: Wyoming ────────────────────────────────────────────────────────────────╮
│ --tts-wyoming-ip TEXT Wyoming TTS server IP address. │
│ [default: localhost] │
│ --tts-wyoming-port INTEGER Wyoming TTS server port. │
│ [default: 10200] │
│ --tts-wyoming-voice TEXT Voice name to use for Wyoming TTS (e.g., │
│ 'en_US-lessac-medium'). │
│ --tts-wyoming-language TEXT Language for Wyoming TTS (e.g., 'en_US'). │
│ --tts-wyoming-speaker TEXT Speaker name for Wyoming TTS voice. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Output: OpenAI-compatible ──────────────────────────────────────────────────────╮
│ --tts-openai-model TEXT The OpenAI model to use for TTS. │
│ [default: tts-1] │
│ --tts-openai-voice TEXT Voice for OpenAI TTS (alloy, echo, fable, onyx, │
│ nova, shimmer). │
│ [default: alloy] │
│ --tts-openai-base-url TEXT Custom base URL for OpenAI-compatible TTS API │
│ (e.g., http://localhost:8000/v1 for a proxy). │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Output: Kokoro ─────────────────────────────────────────────────────────────────╮
│ --tts-kokoro-model TEXT The Kokoro model to use for TTS. │
│ [default: kokoro] │
│ --tts-kokoro-voice TEXT The voice to use for Kokoro TTS. │
│ [default: af_sky] │
│ --tts-kokoro-host TEXT The base URL for the Kokoro API. │
│ [default: http://localhost:8880/v1] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Output: Gemini ─────────────────────────────────────────────────────────────────╮
│ --tts-gemini-model TEXT The Gemini model to use for TTS. │
│ [default: gemini-2.5-flash-preview-tts] │
│ --tts-gemini-voice TEXT The voice to use for Gemini TTS (e.g., 'Kore', 'Puck', │
│ 'Charon', 'Fenrir'). │
│ [default: Kore] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Process Management ───────────────────────────────────────────────────────────────────╮
│ --stop Stop any running instance of this command. │
│ --status Check if an instance is currently running. │
│ --toggle Start if not running, stop if running. Ideal for hotkey binding. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ General Options ──────────────────────────────────────────────────────────────────────╮
│ --save-file PATH Save audio to WAV file │
│ instead of playing │
│ through speakers. │
│ --clipboard --no-clipboard Copy result to │
│ clipboard. │
│ [default: clipboard] │
│ --log-level [debug|info|warning|erro Set logging level. │
│ r] [env var: LOG_LEVEL] │
│ [default: warning] │
│ --log-file TEXT Path to a file to write │
│ logs to. │
│ --quiet -q Suppress console output │
│ from rich. │
│ --json Output result as JSON │
│ (implies --quiet and │
│ --no-clipboard). │
│ --config TEXT Path to a TOML │
│ configuration file. │
│ --print-args Print the command line │
│ arguments, including │
│ variables taken from the │
│ configuration file. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
Purpose: A hands-free voice assistant that starts and stops recording based on a wake word.
Workflow: This agent continuously listens for a wake word (e.g., "Hey Nabu").
- Run the assistant command. It will start listening for the wake word.
- Say the wake word to start recording.
- Speak your command or question.
- Say the wake word again to stop recording.
- The agent transcribes your speech, sends it to the LLM, and gets a response.
- The agent speaks the response back to you and then immediately starts listening for the wake word again.
How to Use It:
- Start the agent: agent-cli assistant --wake-word "ok_nabu" --input-device-index 1
- With TTS: agent-cli assistant --wake-word "ok_nabu" --tts --tts-wyoming-voice "en_US-lessac-medium"
See the output of agent-cli assistant --help
Usage: agent-cli assistant [OPTIONS]
Hands-free voice assistant using wake word detection.
Continuously listens for a wake word, then records your speech until you say the wake
word again. The recording is transcribed and sent to an LLM for a conversational
response, optionally spoken back via TTS.
Conversation flow:
1 Say wake word → starts recording
2 Speak your question/command
3 Say wake word again → stops recording and processes
The assistant runs in a loop, ready for the next command after each response. Stop with
Ctrl+C or --stop.
Requirements:
• Wyoming wake word server (e.g., wyoming-openwakeword on port 10400)
• Wyoming ASR server (e.g., wyoming-whisper on port 10300)
• Optional: TTS server for spoken responses (enable with --tts)
Example: assistant --wake-word ok_nabu --tts --input-device-name USB
╭─ Options ──────────────────────────────────────────────────────────────────────────────╮
│ --help -h Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Provider Selection ───────────────────────────────────────────────────────────────────╮
│ --asr-provider TEXT The ASR provider to use ('wyoming', 'openai', 'gemini'). │
│ [env var: ASR_PROVIDER] │
│ [default: wyoming] │
│ --llm-provider TEXT The LLM provider to use ('ollama', 'openai', 'gemini'). │
│ [env var: LLM_PROVIDER] │
│ [default: ollama] │
│ --tts-provider TEXT The TTS provider to use ('wyoming', 'openai', 'kokoro', │
│ 'gemini'). │
│ [env var: TTS_PROVIDER] │
│ [default: wyoming] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Wake Word ────────────────────────────────────────────────────────────────────────────╮
│ --wake-server-ip TEXT Wyoming wake word server IP (requires │
│ wyoming-openwakeword or similar). │
│ [default: localhost] │
│ --wake-server-port INTEGER Wyoming wake word server port. │
│ [default: 10400] │
│ --wake-word TEXT Wake word to detect. Common options: ok_nabu, │
│ hey_jarvis, alexa. Must match a model loaded in │
│ your wake word server. │
│ [default: ok_nabu] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Input ──────────────────────────────────────────────────────────────────────────╮
│ --input-device-index INTEGER Audio input device index (see --list-devices). │
│ Uses system default if omitted. │
│ --input-device-name TEXT Select input device by name substring (e.g., │
│ MacBook or USB). │
│ --list-devices List available audio devices with their indices │
│ and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Input: Wyoming ─────────────────────────────────────────────────────────────────╮
│ --asr-wyoming-ip TEXT Wyoming ASR server IP address. │
│ [env var: ASR_WYOMING_IP] │
│ [default: localhost] │
│ --asr-wyoming-port INTEGER Wyoming ASR server port. │
│ [env var: ASR_WYOMING_PORT] │
│ [default: 10300] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Input: OpenAI-compatible ───────────────────────────────────────────────────────╮
│ --asr-openai-model TEXT The OpenAI model to use for ASR (transcription). │
│ [env var: ASR_OPENAI_MODEL] │
│ [default: whisper-1] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Input: Gemini ──────────────────────────────────────────────────────────────────╮
│ --asr-gemini-model TEXT The Gemini model to use for ASR (transcription). │
│ [env var: ASR_GEMINI_MODEL] │
│ [default: gemini-3-flash-preview] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM: Ollama ──────────────────────────────────────────────────────────────────────────╮
│ --llm-ollama-model TEXT The Ollama model to use. Default is gemma3:4b. │
│ [env var: LLM_OLLAMA_MODEL] │
│ [default: gemma3:4b] │
│ --llm-ollama-host TEXT The Ollama server host. Default is │
│ http://localhost:11434. │
│ [env var: LLM_OLLAMA_HOST] │
│ [default: http://localhost:11434] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM: OpenAI-compatible ───────────────────────────────────────────────────────────────╮
│ --llm-openai-model TEXT The OpenAI model to use for LLM tasks. │
│ [env var: LLM_OPENAI_MODEL] │
│ [default: gpt-5-mini] │
│ --openai-api-key TEXT Your OpenAI API key. Can also be set with the │
│ OPENAI_API_KEY environment variable. │
│ [env var: OPENAI_API_KEY] │
│ --openai-base-url TEXT Custom base URL for OpenAI-compatible API (e.g., for │
│ llama-server: http://localhost:8080/v1). │
│ [env var: OPENAI_BASE_URL] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM: Gemini ──────────────────────────────────────────────────────────────────────────╮
│ --llm-gemini-model TEXT The Gemini model to use for LLM tasks. │
│ [env var: LLM_GEMINI_MODEL] │
│ [default: gemini-3-flash-preview] │
│ --gemini-api-key TEXT Your Gemini API key. Can also be set with the │
│ GEMINI_API_KEY environment variable. │
│ [env var: GEMINI_API_KEY] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Output ─────────────────────────────────────────────────────────────────────────╮
│ --tts --no-tts Enable text-to-speech for responses. │
│ [default: no-tts] │
│ --output-device-index INTEGER Audio output device index (see │
│ --list-devices for available devices). │
│ --output-device-name TEXT Partial match on device name (e.g., │
│ 'speakers', 'headphones'). │
│ --tts-speed FLOAT Speech speed multiplier (1.0 = normal, │
│ 2.0 = twice as fast, 0.5 = half speed). │
│ [default: 1.0] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Output: Wyoming ────────────────────────────────────────────────────────────────╮
│ --tts-wyoming-ip TEXT Wyoming TTS server IP address. │
│ [default: localhost] │
│ --tts-wyoming-port INTEGER Wyoming TTS server port. │
│ [default: 10200] │
│ --tts-wyoming-voice TEXT Voice name to use for Wyoming TTS (e.g., │
│ 'en_US-lessac-medium'). │
│ --tts-wyoming-language TEXT Language for Wyoming TTS (e.g., 'en_US'). │
│ --tts-wyoming-speaker TEXT Speaker name for Wyoming TTS voice. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Output: OpenAI-compatible ──────────────────────────────────────────────────────╮
│ --tts-openai-model TEXT The OpenAI model to use for TTS. │
│ [default: tts-1] │
│ --tts-openai-voice TEXT Voice for OpenAI TTS (alloy, echo, fable, onyx, │
│ nova, shimmer). │
│ [default: alloy] │
│ --tts-openai-base-url TEXT Custom base URL for OpenAI-compatible TTS API │
│ (e.g., http://localhost:8000/v1 for a proxy). │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Output: Kokoro ─────────────────────────────────────────────────────────────────╮
│ --tts-kokoro-model TEXT The Kokoro model to use for TTS. │
│ [default: kokoro] │
│ --tts-kokoro-voice TEXT The voice to use for Kokoro TTS. │
│ [default: af_sky] │
│ --tts-kokoro-host TEXT The base URL for the Kokoro API. │
│ [default: http://localhost:8880/v1] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Output: Gemini ─────────────────────────────────────────────────────────────────╮
│ --tts-gemini-model TEXT The Gemini model to use for TTS. │
│ [default: gemini-2.5-flash-preview-tts] │
│ --tts-gemini-voice TEXT The voice to use for Gemini TTS (e.g., 'Kore', 'Puck', │
│ 'Charon', 'Fenrir'). │
│ [default: Kore] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Process Management ───────────────────────────────────────────────────────────────────╮
│ --stop Stop any running instance of this command. │
│ --status Check if an instance is currently running. │
│ --toggle Start if not running, stop if running. Ideal for hotkey binding. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ General Options ──────────────────────────────────────────────────────────────────────╮
│ --save-file PATH Save audio to WAV file │
│ instead of playing │
│ through speakers. │
│ --clipboard --no-clipboard Copy result to │
│ clipboard. │
│ [default: clipboard] │
│ --log-level [debug|info|warning|erro Set logging level. │
│ r] [env var: LOG_LEVEL] │
│ [default: warning] │
│ --log-file TEXT Path to a file to write │
│ logs to. │
│ --quiet -q Suppress console output │
│ from rich. │
│ --config TEXT Path to a TOML │
│ configuration file. │
│ --print-args Print the command line │
│ arguments, including │
│ variables taken from the │
│ configuration file. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
Purpose: A full-featured, conversational AI assistant that can interact with your system.
Workflow: This is a persistent, conversational agent that runs in a continuous listen-and-respond loop.
- Run the chat command. It will start listening for your voice.
- Speak your command or question (e.g., "What's in my current directory?").
- The agent transcribes your speech, sends it to the LLM, and gets a response. The LLM can use tools like read_file or execute_code to answer your question.
- The agent speaks the response back to you and then immediately starts listening for your next command.
- The conversation continues in this loop. Conversation history is saved between sessions.
Interaction Model:
- To Interrupt: Press Ctrl+C once to stop the agent from either listening or speaking; it will immediately return to a listening state for a new command. This is useful if it misunderstands you or you want to speak again quickly.
- To Exit: Press Ctrl+C twice in a row to terminate the application.
How to Use It:
- Start the agent: agent-cli chat --input-device-index 1 --tts (a fuller startup sketch follows below)
- Have a conversation:
  - You: "Read the pyproject.toml file and tell me the project version."
  - AI: (Reads file) "The project version is 0.1.0."
  - You: "Thanks!"
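As a concrete starting point, here is a minimal session sketch using only flags documented in the help output below; the device name "MacBook" is an illustration and will differ on your machine:
# Find your microphone's index or name first.
agent-cli chat --list-devices
# Start the agent with TTS enabled, selecting the input device by a name substring.
agent-cli chat --input-device-name "MacBook" --tts
# Or bind this to a system-wide hotkey: --toggle starts the agent if it is
# stopped and stops it if it is running.
agent-cli chat --toggle --tts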
See the output of agent-cli chat --help
Usage: agent-cli chat [OPTIONS]
Voice-based conversational chat agent with memory and tools.
Runs an interactive loop: listen → transcribe → LLM → speak response. Conversation
history is persisted and included as context for continuity.
Built-in tools (LLM uses automatically when relevant):
• add_memory/search_memory/update_memory - persistent long-term memory
• duckduckgo_search - web search for current information
• read_file/execute_code - file access and shell commands
Process management: Use --toggle to start/stop via hotkey (bind to a keyboard shortcut),
--stop to terminate, or --status to check state.
Examples:
Use OpenAI-compatible providers for speech and LLM, with TTS enabled:
agent-cli chat --asr-provider openai --llm-provider openai --tts
Start in background mode (toggle on/off with hotkey):
agent-cli chat --toggle
Use local Ollama LLM with Wyoming ASR:
agent-cli chat --llm-provider ollama
╭─ Options ──────────────────────────────────────────────────────────────────────────────╮
│ --help -h Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Provider Selection ───────────────────────────────────────────────────────────────────╮
│ --asr-provider TEXT The ASR provider to use ('wyoming', 'openai', 'gemini'). │
│ [env var: ASR_PROVIDER] │
│ [default: wyoming] │
│ --llm-provider TEXT The LLM provider to use ('ollama', 'openai', 'gemini'). │
│ [env var: LLM_PROVIDER] │
│ [default: ollama] │
│ --tts-provider TEXT The TTS provider to use ('wyoming', 'openai', 'kokoro', │
│ 'gemini'). │
│ [env var: TTS_PROVIDER] │
│ [default: wyoming] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Input ──────────────────────────────────────────────────────────────────────────╮
│ --input-device-index INTEGER Audio input device index (see --list-devices). │
│ Uses system default if omitted. │
│ --input-device-name TEXT Select input device by name substring (e.g., │
│ MacBook or USB). │
│ --list-devices List available audio devices with their indices │
│ and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Input: Wyoming ─────────────────────────────────────────────────────────────────╮
│ --asr-wyoming-ip TEXT Wyoming ASR server IP address. │
│ [env var: ASR_WYOMING_IP] │
│ [default: localhost] │
│ --asr-wyoming-port INTEGER Wyoming ASR server port. │
│ [env var: ASR_WYOMING_PORT] │
│ [default: 10300] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Input: OpenAI-compatible ───────────────────────────────────────────────────────╮
│ --asr-openai-model TEXT The OpenAI model to use for ASR (transcription). │
│ [env var: ASR_OPENAI_MODEL] │
│ [default: whisper-1] │
│ --asr-openai-base-url TEXT Custom base URL for OpenAI-compatible ASR API │
│ (e.g., for custom Whisper server: │
│ http://localhost:9898). │
│ [env var: ASR_OPENAI_BASE_URL] │
│ --asr-openai-prompt TEXT Custom prompt to guide transcription (optional). │
│ [env var: ASR_OPENAI_PROMPT] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Input: Gemini ──────────────────────────────────────────────────────────────────╮
│ --asr-gemini-model TEXT The Gemini model to use for ASR (transcription). │
│ [env var: ASR_GEMINI_MODEL] │
│ [default: gemini-3-flash-preview] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM: Ollama ──────────────────────────────────────────────────────────────────────────╮
│ --llm-ollama-model TEXT The Ollama model to use. Default is gemma3:4b. │
│ [env var: LLM_OLLAMA_MODEL] │
│ [default: gemma3:4b] │
│ --llm-ollama-host TEXT The Ollama server host. Default is │
│ http://localhost:11434. │
│ [env var: LLM_OLLAMA_HOST] │
│ [default: http://localhost:11434] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM: OpenAI-compatible ───────────────────────────────────────────────────────────────╮
│ --llm-openai-model TEXT The OpenAI model to use for LLM tasks. │
│ [env var: LLM_OPENAI_MODEL] │
│ [default: gpt-5-mini] │
│ --openai-api-key TEXT Your OpenAI API key. Can also be set with the │
│ OPENAI_API_KEY environment variable. │
│ [env var: OPENAI_API_KEY] │
│ --openai-base-url TEXT Custom base URL for OpenAI-compatible API (e.g., for │
│ llama-server: http://localhost:8080/v1). │
│ [env var: OPENAI_BASE_URL] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM: Gemini ──────────────────────────────────────────────────────────────────────────╮
│ --llm-gemini-model TEXT The Gemini model to use for LLM tasks. │
│ [env var: LLM_GEMINI_MODEL] │
│ [default: gemini-3-flash-preview] │
│ --gemini-api-key TEXT Your Gemini API key. Can also be set with the │
│ GEMINI_API_KEY environment variable. │
│ [env var: GEMINI_API_KEY] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Output ─────────────────────────────────────────────────────────────────────────╮
│ --tts --no-tts Enable text-to-speech for responses. │
│ [default: no-tts] │
│ --output-device-index INTEGER Audio output device index (see │
│ --list-devices for available devices). │
│ --output-device-name TEXT Partial match on device name (e.g., │
│ 'speakers', 'headphones'). │
│ --tts-speed FLOAT Speech speed multiplier (1.0 = normal, │
│ 2.0 = twice as fast, 0.5 = half speed). │
│ [default: 1.0] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Output: Wyoming ────────────────────────────────────────────────────────────────╮
│ --tts-wyoming-ip TEXT Wyoming TTS server IP address. │
│ [default: localhost] │
│ --tts-wyoming-port INTEGER Wyoming TTS server port. │
│ [default: 10200] │
│ --tts-wyoming-voice TEXT Voice name to use for Wyoming TTS (e.g., │
│ 'en_US-lessac-medium'). │
│ --tts-wyoming-language TEXT Language for Wyoming TTS (e.g., 'en_US'). │
│ --tts-wyoming-speaker TEXT Speaker name for Wyoming TTS voice. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Output: OpenAI-compatible ──────────────────────────────────────────────────────╮
│ --tts-openai-model TEXT The OpenAI model to use for TTS. │
│ [default: tts-1] │
│ --tts-openai-voice TEXT Voice for OpenAI TTS (alloy, echo, fable, onyx, │
│ nova, shimmer). │
│ [default: alloy] │
│ --tts-openai-base-url TEXT Custom base URL for OpenAI-compatible TTS API │
│ (e.g., http://localhost:8000/v1 for a proxy). │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Output: Kokoro ─────────────────────────────────────────────────────────────────╮
│ --tts-kokoro-model TEXT The Kokoro model to use for TTS. │
│ [default: kokoro] │
│ --tts-kokoro-voice TEXT The voice to use for Kokoro TTS. │
│ [default: af_sky] │
│ --tts-kokoro-host TEXT The base URL for the Kokoro API. │
│ [default: http://localhost:8880/v1] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Audio Output: Gemini ─────────────────────────────────────────────────────────────────╮
│ --tts-gemini-model TEXT The Gemini model to use for TTS. │
│ [default: gemini-2.5-flash-preview-tts] │
│ --tts-gemini-voice TEXT The voice to use for Gemini TTS (e.g., 'Kore', 'Puck', │
│ 'Charon', 'Fenrir'). │
│ [default: Kore] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Process Management ───────────────────────────────────────────────────────────────────╮
│ --stop Stop any running instance of this command. │
│ --status Check if an instance is currently running. │
│ --toggle Start if not running, stop if running. Ideal for hotkey binding. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ History Options ──────────────────────────────────────────────────────────────────────╮
│ --history-dir PATH Directory for conversation history and long-term │
│ memory. Both conversation.json and │
│ long_term_memory.json are stored here. │
│ [default: ~/.config/agent-cli/history] │
│ --last-n-messages INTEGER Number of past messages to include as context for │
│ the LLM. Set to 0 to start fresh each session │
│ (memory tools still persist). │
│ [default: 50] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ General Options ──────────────────────────────────────────────────────────────────────╮
│ --save-file PATH Save audio to WAV file instead of │
│ playing through speakers. │
│ --log-level [debug|info|warning|error] Set logging level. │
│ [env var: LOG_LEVEL] │
│ [default: warning] │
│ --log-file TEXT Path to a file to write logs to. │
│ --quiet -q Suppress console output from rich. │
│ --config TEXT Path to a TOML configuration file. │
│ --print-args Print the command line arguments, │
│ including variables taken from the │
│ configuration file. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
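Every option above can also come from a TOML configuration file passed via --config, and --print-args shows which values were resolved from it. The section and key names in this sketch are an assumption (mirroring the command and flag names), not a documented schema, so verify the result with --print-args:
# Hypothetical config sketch; key names are assumed to mirror the CLI flags.
cat > chat-config.toml <<'EOF'
[chat]
llm-provider = "ollama"
llm-ollama-model = "gemma3:4b"
tts = true
tts-provider = "kokoro"
EOF
# Confirm which values were actually picked up from the file.
agent-cli chat --config chat-config.toml --print-args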
Purpose: Enables "Chat with your Data" by running a local proxy server that injects document context into LLM requests.
Workflow:
- Start the server, pointing it to your documents folder and your local LLM (e.g., Ollama or llama.cpp) or OpenAI.
- The server watches the folder and automatically indexes any text/markdown/PDF files into a local ChromaDB vector store.
- Point any OpenAI-compatible client (including agent-cli chat) to this server's URL.
- When you ask a question, the server retrieves relevant document chunks, adds them to the prompt, and forwards it to the LLM.
How to Use It:
- Install RAG deps first: pip install "agent-cli[rag]" (or, from the repo, uv sync --extra rag)
- Start Server (Local LLM): agent-cli rag-proxy --docs-folder ~/Documents/Notes --openai-base-url http://localhost:11434/v1 --port 8000
- Start Server (OpenAI): agent-cli rag-proxy --docs-folder ~/Documents/Notes --openai-api-key sk-...
- Use with Agent-CLI: agent-cli chat --openai-base-url http://localhost:8000/v1 --llm-provider openai (a curl sketch for querying the proxy directly follows below)
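Once the proxy is running, you can also query it directly with curl. This sketch uses the endpoints and the rag_top_k per-request override documented in the help output below; the model value is whatever your LLM backend expects (shown here with the Ollama default used elsewhere in this README):
# Health check and a listing of indexed files.
curl http://localhost:8000/health
curl http://localhost:8000/files
# Ask a question; "rag_top_k" overrides --limit for this request only.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma3:4b", "messages": [{"role": "user", "content": "Summarize my meeting notes."}], "rag_top_k": 5}'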
See the output of agent-cli rag-proxy --help
Usage: agent-cli rag-proxy [OPTIONS]
Start a RAG proxy server that enables "chat with your documents".
Watches a folder for documents, indexes them into a vector store, and provides an
OpenAI-compatible API at /v1/chat/completions. When you send a chat request, the server
retrieves relevant document chunks and injects them as context before forwarding to your
LLM backend.
Quick start:
• agent-cli rag-proxy — Start with defaults (./rag_docs, OpenAI-compatible API)
• agent-cli rag-proxy --docs-folder ~/notes — Index your notes folder
How it works:
1 Documents in --docs-folder are chunked, embedded, and stored in ChromaDB
2 A file watcher auto-reindexes when files change
3 Chat requests trigger a semantic search for relevant chunks
4 Retrieved context is injected into the prompt before forwarding to the LLM
5 Responses include a rag_sources field listing which documents were used
Supported file formats:
Text: .txt, .md, .json, .py, .js, .ts, .yaml, .toml, .rst, etc. Rich documents (via
MarkItDown): .pdf, .docx, .pptx, .xlsx, .html, .csv
API endpoints:
• POST /v1/chat/completions — Main chat endpoint (OpenAI-compatible)
• GET /health — Health check with configuration info
• GET /files — List indexed files with chunk counts
• POST /reindex — Trigger manual reindex
• All other paths are proxied to the LLM backend
Per-request overrides (in JSON body):
• rag_top_k: Override --limit for this request
• rag_enable_tools: Override --rag-tools for this request
╭─ Options ──────────────────────────────────────────────────────────────────────────────╮
│ --help -h Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ RAG Configuration ────────────────────────────────────────────────────────────────────╮
│ --docs-folder PATH Folder to watch for documents. Files are │
│ auto-indexed on startup and when changed. │
│ Must not overlap with --chroma-path. │
│ [default: ./rag_docs] │
│ --chroma-path PATH ChromaDB storage directory for vector │
│ embeddings. Must be separate from │
│ --docs-folder to avoid indexing database │
│ files. │
│ [default: ./rag_db] │
│ --limit INTEGER Number of document chunks to retrieve per │
│ query. Higher values provide more context │
│ but use more tokens. Can be overridden │
│ per-request via rag_top_k in the JSON │
│ body. │
│ [default: 3] │
│ --rag-tools --no-rag-tools Enable read_full_document() tool so the │
│ LLM can request full document content when │
│ retrieved snippets are insufficient. Can │
│ be overridden per-request via │
│ rag_enable_tools in the JSON body. │
│ [default: rag-tools] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM: OpenAI-compatible ───────────────────────────────────────────────────────────────╮
│ --openai-base-url TEXT Custom base URL for OpenAI-compatible API (e.g., for │
│ llama-server: http://localhost:8080/v1). │
│ [env var: OPENAI_BASE_URL] │
│ --openai-api-key TEXT Your OpenAI API key. Can also be set with the │
│ OPENAI_API_KEY environment variable. │
│ [env var: OPENAI_API_KEY] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM Configuration ────────────────────────────────────────────────────────────────────╮
│ --embedding-base-url TEXT Base URL for embedding API. Falls back to │
│ --openai-base-url if not set. Useful when using │
│ different providers for chat vs embeddings. │
│ [env var: EMBEDDING_BASE_URL] │
│ --embedding-model TEXT Embedding model to use for vectorization. │
│ [default: text-embedding-3-small] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Server Configuration ─────────────────────────────────────────────────────────────────╮
│ --host TEXT Host/IP to bind API servers to. │
│ [default: 0.0.0.0] │
│ --port INTEGER Port for the RAG proxy API (e.g., │
│ http://localhost:8000/v1/chat/completions). │
│ [default: 8000] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ General Options ──────────────────────────────────────────────────────────────────────╮
│ --log-level [debug|info|warning|error] Set logging level. │
│ [env var: LOG_LEVEL] │
│ [default: info] │
│ --config TEXT Path to a TOML configuration file. │
│ --print-args Print the command line arguments, │
│ including variables taken from the │
│ configuration file. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
The memory proxy command is the core feature—a middleware server that gives any OpenAI-compatible app long-term memory. Additional subcommands (memory add, etc.) help manage the memory store directly.
Purpose: Adds long-term conversational memory (self-hosted) to any OpenAI-compatible client.
Key Features:
- Simple Markdown Files: Your memories are stored as human-readable Markdown files, serving as the ultimate source of truth.
- Automatic Version Control: Built-in Git integration automatically commits changes, giving you a full history of your memory's evolution.
- Lightweight & Local: Minimal dependencies and runs entirely on your machine.
- Proxy Middleware: Works transparently with any OpenAI-compatible /chat/completions endpoint (OpenAI, Ollama, vLLM).
Workflow:
- Stores a per-conversation memory collection in Chroma with the same embedding settings as rag-proxy, reranked with a cross-encoder.
- For each turn, retrieves the top-k relevant memories (conversation + global) plus a rolling summary and augments the prompt.
- After each reply, extracts salient facts and refreshes the running summary (disable with --no-summarization).
- Enforces a per-conversation cap (--max-entries, default 500) and evicts the oldest memories first.
How to Use It:
- Install memory deps first: pip install "agent-cli[memory]" (or, from the repo, uv sync --extra memory)
- Start Server (Local LLM/OpenAI-compatible): agent-cli memory proxy --memory-path ./memory_db --openai-base-url http://localhost:11434/v1 --embedding-model embeddinggemma:300m
- Use with Agent-CLI: agent-cli chat --openai-base-url http://localhost:8100/v1 --llm-provider openai (a direct curl sketch follows below)
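You can also exercise the proxy directly over HTTP. A sketch using the per-request override fields listed in the help output below; the model name depends on your backend:
# Route a request through the memory proxy on its default port 8100.
# "memory_id" namespaces the conversation's memories; "memory_top_k"
# overrides --default-top-k for this request only.
curl http://localhost:8100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma3:4b", "messages": [{"role": "user", "content": "What do you remember about my projects?"}], "memory_id": "work", "memory_top_k": 8}'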
See the output of agent-cli memory proxy --help
Usage: agent-cli memory proxy [OPTIONS]
Start the memory-backed chat proxy server.
This server acts as a middleware between your chat client (e.g., a web UI, CLI, or IDE
plugin) and an OpenAI-compatible LLM provider (e.g., OpenAI, Ollama, vLLM).
Key Features:
• Simple Markdown Files: Memories are stored as human-readable Markdown files, serving
as the ultimate source of truth.
• Automatic Version Control: Built-in Git integration automatically commits changes,
providing a full history of memory evolution.
• Lightweight & Local: Minimal dependencies and runs entirely on your machine.
• Proxy Middleware: Works transparently with any OpenAI-compatible /chat/completions
endpoint.
How it works:
1 Intercepts POST /v1/chat/completions requests.
2 Retrieves relevant memories (facts, previous conversations) from a local vector
database (ChromaDB) based on the user's query.
3 Injects these memories into the system prompt.
4 Forwards the augmented request to the real LLM (--openai-base-url).
5 Extracts new facts from the conversation in the background and updates the long-term
memory store (including handling contradictions).
Example:
# Start proxy pointing to local Ollama
agent-cli memory proxy --openai-base-url http://localhost:11434/v1
# Then configure your chat client to use http://localhost:8100/v1
# as its OpenAI base URL. All requests flow through the memory proxy.
Per-request overrides: Clients can include these fields in the request body: memory_id
(conversation ID), memory_top_k, memory_recency_weight, memory_score_threshold.
╭─ Options ──────────────────────────────────────────────────────────────────────────────╮
│ --help -h Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Memory Configuration ─────────────────────────────────────────────────────────────────╮
│ --memory-path PATH Directory for memory storage. │
│ Contains entries/ (Markdown │
│ files) and chroma/ (vector │
│ index). Created automatically if │
│ it doesn't exist. │
│ [default: ./memory_db] │
│ --default-top-k INTEGER Number of relevant memories to │
│ inject into each request. Higher │
│ values provide more context but │
│ increase token usage. │
│ [default: 5] │
│ --max-entries INTEGER Maximum entries per conversation │
│ before oldest are evicted. │
│ Summaries are preserved │
│ separately. │
│ [default: 500] │
│ --mmr-lambda FLOAT MMR lambda (0-1): higher favors │
│ relevance, lower favors │
│ diversity. │
│ [default: 0.7] │
│ --recency-weight FLOAT Weight for recency vs semantic │
│ relevance (0.0-1.0). At 0.2: 20% │
│ recency, 80% semantic similarity. │
│ [default: 0.2] │
│ --score-threshold FLOAT Minimum semantic relevance │
│ threshold (0.0-1.0). Memories │
│ below this score are discarded to │
│ reduce noise. │
│ [default: 0.35] │
│ --summarization --no-summarization Extract facts and generate │
│ summaries after each turn using │
│ the LLM. Disable to only store │
│ raw conversation turns. │
│ [default: summarization] │
│ --git-versioning --no-git-versioning Auto-commit memory changes to │
│ git. Initializes a repo in │
│ --memory-path if needed. Provides │
│ full history of memory evolution. │
│ [default: git-versioning] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM: OpenAI-compatible ───────────────────────────────────────────────────────────────╮
│ --openai-base-url TEXT Custom base URL for OpenAI-compatible API (e.g., for │
│ llama-server: http://localhost:8080/v1). │
│ [env var: OPENAI_BASE_URL] │
│ --openai-api-key TEXT Your OpenAI API key. Can also be set with the │
│ OPENAI_API_KEY environment variable. │
│ [env var: OPENAI_API_KEY] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ LLM Configuration ────────────────────────────────────────────────────────────────────╮
│ --embedding-base-url TEXT Base URL for embedding API. Falls back to │
│ --openai-base-url if not set. Useful when using │
│ different providers for chat vs embeddings. │
│ [env var: EMBEDDING_BASE_URL] │
│ --embedding-model TEXT Embedding model to use for vectorization. │
│ [default: text-embedding-3-small] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Server Configuration ─────────────────────────────────────────────────────────────────╮
│ --host TEXT Host/IP to bind API servers to. │
│ [default: 0.0.0.0] │
│ --port INTEGER Port to bind to │
│ [default: 8100] │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ General Options ──────────────────────────────────────────────────────────────────────╮
│ --log-level [debug|info|warning|error] Set logging level. │
│ [env var: LOG_LEVEL] │
│ [default: info] │
│ --config TEXT Path to a TOML configuration file. │
│ --print-args Print the command line arguments, │
│ including variables taken from the │
│ configuration file. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
Purpose: Directly add memories to the store without LLM extraction. Useful for bulk imports or seeding memories.
How to Use It:
# Add single memories as arguments
agent-cli memory add "User likes coffee" "User lives in Amsterdam"
# Read from JSON file
agent-cli memory add -f memories.json
# Read from stdin (plain text, one per line)
echo "User prefers dark mode" | agent-cli memory add -f -
# Read JSON from stdin
echo '["Fact one", "Fact two"]' | agent-cli memory add -f -
# Specify conversation ID
agent-cli memory add -c work "Project deadline is Friday"
See the output of agent-cli memory add --help
Usage: agent-cli memory add [OPTIONS] [MEMORIES]...
Add memories directly without LLM extraction.
This writes facts directly to the memory store, bypassing the LLM-based fact extraction.
Useful for bulk imports or seeding memories.
The memory proxy file watcher (if running) will auto-index new files. Otherwise, they'll
be indexed on next memory proxy startup.
Examples:
# Add single memories as arguments
agent-cli memory add "User likes coffee" "User lives in Amsterdam"
# Read from JSON file
agent-cli memory add -f memories.json
# Read from stdin (plain text, one per line)
echo "User prefers dark mode" | agent-cli memory add -f -
# Read JSON from stdin
echo '["Fact one", "Fact two"]' | agent-cli memory add -f -
# Specify conversation ID
agent-cli memory add -c work "Project deadline is Friday"
╭─ Arguments ────────────────────────────────────────────────────────────────────────────╮
│ memories [MEMORIES]... Memories to add. Each argument becomes one fact. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ──────────────────────────────────────────────────────────────────────────────╮
│ --file -f PATH Read memories from file. Use '-' │
│ for stdin. Supports JSON array, │
│ JSON object with 'memories' key, │
│ or plain text (one per line). │
│ --conversation-id -c TEXT Conversation namespace for these │
│ memories. Memories are retrieved │
│ per-conversation unless shared │
│ globally. │
│ [default: default] │
│ --memory-path PATH Directory for memory storage (same │
│ as memory proxy --memory-path). │
│ [default: ./memory_db] │
│ --git-versioning --no-git-versioning Auto-commit changes to git for │
│ version history. │
│ [default: git-versioning] │
│ --help -h Show this message and exit. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
╭─ General Options ──────────────────────────────────────────────────────────────────────╮
│ --quiet -q Suppress console output from rich. │
│ --config TEXT Path to a TOML configuration file. │
│ --print-args Print the command line arguments, including variables │
│ taken from the configuration file. │
╰────────────────────────────────────────────────────────────────────────────────────────╯
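As a sketch of the --file input formats listed above (a JSON array, a JSON object with a "memories" key, or plain text with one fact per line), assuming the default ./memory_db store:
# Hypothetical example file using the object form with a "memories" key.
cat > memories.json <<'EOF'
{"memories": ["User likes coffee", "User lives in Amsterdam"]}
EOF
# Import the facts under the "personal" conversation namespace.
agent-cli memory add -f memories.json -c personal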
The project uses pytest for testing. To run the tests with uv:
uv run pytest
This project uses pre-commit hooks (ruff for linting and formatting, mypy for type checking) to maintain code quality. To set them up:
- Install pre-commit: pip install pre-commit
- Install the hooks: pre-commit install
Now, the hooks will run automatically before each commit.
Contributions are welcome! If you find a bug or have a feature request, please open an issue. If you'd like to contribute code, please fork the repository and submit a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.
