Voice-to-markdown transcription pipeline with AI enrichment. Start talking, get structured markdown notes with titles, keywords, and topic categorization.
- Captures microphone audio in real-time
- Transcribes locally using faster-whisper (CTranslate2 port of OpenAI Whisper — runs on CPU, no API keys; minimal sketch below)
- Writes markdown incrementally with automatic paragraph/section breaks based on speech pauses
- Post-processes via the Anthropic API to generate title, keywords, topic, and summary
- Adds YAML frontmatter and renames file with a descriptive slug
- Optionally indexes into a knowledge base directory for organization
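The local transcription step builds on faster-whisper's Python API. A minimal sketch of that call, assuming a pre-recorded file for simplicity (whisp itself streams microphone chunks rather than reading a file):

```python
from faster_whisper import WhisperModel

# int8 quantization keeps memory low and runs well on CPU
model = WhisperModel("base.en", device="cpu", compute_type="int8")

# transcribe() accepts a path or a 16 kHz float32 NumPy array
segments, info = model.transcribe("recording.wav")
for segment in segments:
    print(f"[{segment.start:.1f}s -> {segment.end:.1f}s] {segment.text}")
```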
```bash
git clone git@github.com:bdfinst/whisp.git && cd whisp
chmod +x install.sh && ./install.sh
```

The installer will:
- Verify Python 3.9+
- Install system audio dependencies (PortAudio)
- Create a Python virtual environment (`.venv`)
- Install pip dependencies
- Create a `whisp` symlink in `/usr/local/bin`
Or manually:
```bash
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Linux (Ubuntu/Debian): sudo apt-get install portaudio19-dev python3-dev
# Linux (Fedora/RHEL): sudo dnf install portaudio-devel python3-devel
# macOS: brew install portaudio (if needed)
```

Create a `.env` file in the project root:
```
ANTHROPIC_API_KEY=sk-ant-...
HF_TOKEN=hf_your_token_here
WHISP_OUTPUT_DIR=~/notes/voice
WHISP_INDEXED_NOTES_DIR=/path/to/knowledge-base/source-material
```
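A sketch of how a `.env` file like this is typically consumed in Python, assuming the python-dotenv package (whisp's actual loading code may differ):

```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory into the environment

api_key = os.environ.get("ANTHROPIC_API_KEY")  # None -> enrichment is skipped
output_dir = os.path.expanduser(os.environ.get("WHISP_OUTPUT_DIR", "./notes"))
```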
| Variable | Purpose | Default |
|---|---|---|
| `ANTHROPIC_API_KEY` | Anthropic API key for AI enrichment | (enrichment skipped if unset) |
| `HF_TOKEN` | HuggingFace token for model downloads | |
| `WHISP_OUTPUT_DIR` | Directory to write voice notes | `./notes` |
| `WHISP_INDEXED_NOTES_DIR` | Knowledge base directory for indexing | (disabled if unset) |
- Python 3.9+
- PortAudio system library for microphone access
- Anthropic API key — optional; needed only for AI enrichment (title, keywords, topic extraction)
```bash
# Default: record, transcribe, enrich, and index
whisp

# Custom output directory
whisp -o ~/repos/my-notes/voice

# Better accuracy (slower, ~460MB model download on first run)
whisp --model small.en

# Custom title (skips AI-generated title)
whisp --title "Sprint Retro Notes"

# List available microphones
whisp --list-devices

# Use a specific mic (index from --list-devices)
whisp --device 3

# Disable inline timestamps
whisp --no-timestamps

# Skip AI enrichment (raw transcript only)
whisp --no-post-process

# Skip knowledge base indexing (record now, index later)
whisp --no-index

# Batch index: index all previously recorded notes at once
whisp --index                 # index enriched notes from default output dir
whisp --index ~/notes/voice   # index from a specific directory
```

| Flag | Description | Default |
|---|---|---|
| `-o, --output-dir` | Directory to write markdown files | from config or `./notes` |
| `-m, --model` | Whisper model size | `base.en` |
| `-t, --title` | Custom markdown title | auto-generated |
| `--device` | Audio input device index | system default |
| `--list-devices` | List available audio input devices | |
| `--no-timestamps` | Disable inline timestamps in output | `false` |
| `--compute-type` | Model quantization (`float32`, `float16`, `int8`) | `int8` |
| `--no-post-process` | Skip Claude AI enrichment | `false` |
| `--no-index` | Skip copying to knowledge base | `false` |
| `--index [DIR]` | Batch-index enriched files from DIR (skips recording) | output dir |
| `--config` | Path to a specific `whisp.yaml` config file | auto-detected |
CLI arguments override config file values.
Whisp loads configuration from whisp.yaml, searching these locations in order:
1. Path passed via `--config`
2. `./whisp.yaml` (current working directory)
3. `~/.config/whisp/whisp.yaml`
4. `whisp.yaml` in the project directory
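A sketch of that resolution order in Python (the function and candidate-list names are illustrative, not whisp's actual internals):

```python
from pathlib import Path
from typing import Optional

def find_config(cli_path: Optional[str] = None) -> Optional[Path]:
    """Return the first whisp.yaml found, honoring the documented search order."""
    candidates = [
        Path(cli_path) if cli_path else None,      # 1. --config argument
        Path.cwd() / "whisp.yaml",                 # 2. current working directory
        Path.home() / ".config/whisp/whisp.yaml",  # 3. user config directory
        Path(__file__).parent / "whisp.yaml",      # 4. project directory
    ]
    return next((p for p in candidates if p and p.is_file()), None)
```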
Example whisp.yaml:
```yaml
model: base.en
compute_type: int8
device: null
no_timestamps: false
post_process:
  enabled: true
  claude_model: sonnet
index:
  enabled: true
```

Local paths (`output_dir`, `source_material_dir`) are configured via environment variables in `.env` rather than the config file, so `whisp.yaml` can be committed without machine-specific paths.
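How the split between `whisp.yaml` and `.env` might come together at runtime; a hedged sketch (key names beyond those documented above are assumptions):

```python
import os

import yaml  # PyYAML

def load_settings(config_path):
    with open(config_path) as f:
        cfg = yaml.safe_load(f) or {}
    # Machine-specific paths come from the environment, never the YAML,
    # so the config file stays safe to commit.
    cfg["output_dir"] = os.path.expanduser(
        os.environ.get("WHISP_OUTPUT_DIR", "./notes"))
    cfg["source_material_dir"] = os.environ.get("WHISP_INDEXED_NOTES_DIR")
    return cfg
```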
These constants are defined in `voice_to_md.py`:

| Constant | Default | Purpose |
|---|---|---|
| `CHUNK_DURATION_S` | `5` | Seconds of audio per transcription batch |
| `SILENCE_THRESHOLD` | `0.01` | RMS level below which audio is "silence" |
| `PAUSE_BREAK_S` | `2.0` | Silence duration that triggers a paragraph break |
| `LONG_PAUSE_S` | `8.0` | Silence duration that triggers a section break |
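A sketch of how those thresholds interact, assuming audio arrives as float32 NumPy chunks (the actual logic in `voice_to_md.py` may differ):

```python
import numpy as np

SILENCE_THRESHOLD = 0.01  # RMS floor below which a chunk counts as silence
PAUSE_BREAK_S = 2.0       # silence length that ends a paragraph
LONG_PAUSE_S = 8.0        # silence length that starts a new section

def rms(chunk: np.ndarray) -> float:
    """Root-mean-square level of one audio chunk."""
    return float(np.sqrt(np.mean(chunk ** 2)))

def break_for(silence_s: float) -> str:
    """Map accumulated silence to the markdown break it should produce."""
    if silence_s >= LONG_PAUSE_S:
        return "section"    # horizontal rule plus timestamp
    if silence_s >= PAUSE_BREAK_S:
        return "paragraph"  # blank line
    return "none"
```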
Files are created as `YYYY-MM-DD_HHMMSS.md` (or `YYYY-MM-DD_HHMMSS_title-slug.md` after AI enrichment).
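A minimal sketch of that naming scheme (the slug rules are an assumption based on the example filenames):

```python
import re
from datetime import datetime

def note_filename(title=None):
    stamp = datetime.now().strftime("%Y-%m-%d_%H%M%S")
    if not title:
        return f"{stamp}.md"
    # Lowercase, replace non-alphanumerics with hyphens, trim the ends
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{stamp}_{slug}.md"

print(note_filename("Improving CI Pipeline Feedback Loops"))
# -> e.g. 2026-02-06_143000_improving-ci-pipeline-feedback-loops.md
```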
With post-processing enabled, files include YAML frontmatter:
```markdown
---
title: Improving CI Pipeline Feedback Loops
date: 2026-02-06 14:30
type: voice-note
word_count: 482
duration: 5m 23s
keywords:
  - continuous delivery
  - feedback loops
  - pipeline optimization
primary_topic: continuous-delivery
summary: Discussion of strategies to reduce CI pipeline feedback time.
---

# Voice Notes — 2026-02-06 14:30

> **Recorded**: Thursday, February 06, 2026 at 02:30 PM
> **Model**: base.en

---

First chunk of transcribed speech flows naturally here as continuous text.

Second paragraph appears automatically after a ~2 second pause in speech.

---

*[3m]*

A section break with timestamp appears after longer pauses (~8 seconds),
useful for topic changes.

---

> **Duration**: 5m 23s | **Words**: ~482
```

| Model | Size | Speed | Accuracy | Use case |
|---|---|---|---|---|
| `tiny.en` | 75MB | Fastest | Lower | Quick drafts |
| `base.en` | 140MB | Fast | Good | Default — best balance |
| `small.en` | 460MB | Medium | Better | Important meetings |
| `medium.en` | 1.5GB | Slower | High | When accuracy matters most |
| `large-v3` | 3GB | Slowest | Highest | Multi-language support |
Models download automatically on first use from the HuggingFace Hub. The `.en` variants are English-only and faster.
The full whisp pipeline runs four steps:
1. **Record & Transcribe** — Captures audio, transcribes with faster-whisper, writes markdown with smart pause detection
2. **AI Enrichment** — Sends transcript to the Anthropic API to extract title, keywords, topic, and summary; adds YAML frontmatter; renames the file with a title slug
3. **Index** — Copies the note to a knowledge base directory organized by topic category and appends a row to `voice-notes-index.md`
4. **Finalize** — Writes a footer with duration and word count
Steps 2 and 3 can be disabled with `--no-post-process` and `--no-index` (or via config).
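Step 3 is essentially a copy plus an index append; a minimal sketch (the row format and helper name are illustrative, not whisp's actual code):

```python
import shutil
from pathlib import Path

def index_note(note: Path, kb_dir: Path, topic: str, title: str) -> None:
    """Copy an enriched note into the knowledge base and log it in the index."""
    dest_dir = kb_dir / topic          # e.g. .../continuous-delivery-ci
    dest_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(note, dest_dir / note.name)

    index_file = kb_dir / "voice-notes-index.md"
    with index_file.open("a") as f:
        # note.stem starts with YYYY-MM-DD, so the first 10 chars are the date
        f.write(f"| {note.stem[:10]} | {title} | {topic} | {note.name} |\n")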
AI enrichment maps voice notes to these topic categories for indexing:
`agile-practices`, `ai-future-development`, `continuous-delivery-ci`, `devops-sre`, `engineering-practices`, `general`, `metrics-measurement`, `security`, `team-organization-leadership`, `testing-quality`
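A hedged sketch of how step 2's enrichment call might look with the Anthropic Python SDK (the prompt, model name, and response handling are illustrative; whisp's actual prompt and parsing will differ):

```python
import json

import anthropic

TOPICS = ["agile-practices", "ai-future-development", "continuous-delivery-ci",
          "devops-sre", "engineering-practices", "general", "metrics-measurement",
          "security", "team-organization-leadership", "testing-quality"]

def enrich(transcript: str) -> dict:
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    message = client.messages.create(
        model="claude-sonnet-4-5",  # illustrative; whisp configures this via claude_model
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": (
                "Return JSON with keys title, keywords, primary_topic, summary. "
                f"primary_topic must be one of {TOPICS}.\n\nTranscript:\n{transcript}"
            ),
        }],
    )
    return json.loads(message.content[0].text)
```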
Point the output at a repo for version-controlled notes:
```bash
whisp -o ~/repos/notes/voice

# After a session:
cd ~/repos/notes
git add voice/ && git commit -m "Voice notes $(date +%Y-%m-%d)"
```

Or wrap it in a shell function:
```bash
# ~/.bashrc or ~/.zshrc
vnote() {
  local dir="${1:-$HOME/repos/notes/voice}"
  whisp -o "$dir" "${@:2}"   # extra args pass through to whisp
  # Subshell keeps your shell in its current directory;
  # 2>/dev/null silences git when there is nothing to commit
  (cd "$dir" && git add -A && git commit -m "voice: $(date +%Y-%m-%d_%H%M)" 2>/dev/null)
}
```

- Python 3.9+
- PortAudio (system library for mic access)
- ~140MB disk for the default model (up to 3GB for large models)
- Anthropic API key (optional, for AI enrichment)