Voice-to-markdown transcription pipeline with AI enrichment. Start talking, get structured markdown notes with titles, keywords, and topic categorization.
- Captures microphone audio in real-time
- Transcribes locally using faster-whisper (CTranslate2 port of OpenAI Whisper — runs on CPU, no API keys; minimal sketch below)
- Writes markdown incrementally with automatic paragraph/section breaks based on speech pauses
- Post-processes via the Anthropic API to generate title, keywords, topic, and summary
- Adds YAML frontmatter and renames file with a descriptive slug
- Optionally indexes into a knowledge base directory for organization
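The local transcription step builds on faster-whisper's Python API. A minimal sketch of that call, assuming a pre-recorded file for simplicity (whisp itself streams microphone chunks rather than reading a file):

```python
from faster_whisper import WhisperModel

# int8 quantization keeps memory low and runs well on CPU
model = WhisperModel("base.en", device="cpu", compute_type="int8")

# transcribe() accepts a path or a 16 kHz float32 NumPy array
segments, info = model.transcribe("recording.wav")
for segment in segments:
    print(f"[{segment.start:.1f}s -> {segment.end:.1f}s] {segment.text}")
```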
```bash
git clone git@github.com:bdfinst/whisp.git && cd whisp
chmod +x install.sh && ./install.sh
```

The installer will:
- Verify Python 3.9+
- Install system audio dependencies (PortAudio)
- Create a Python virtual environment (`.venv`)
- Install pip dependencies
- Create a `whisp` symlink in `/usr/local/bin`
Or manually:
```bash
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Linux (Ubuntu/Debian): sudo apt-get install portaudio19-dev python3-dev
# Linux (Fedora/RHEL): sudo dnf install portaudio-devel python3-devel
# macOS: brew install portaudio (if needed)
```

Create a `.env` file in the project root:
```
ANTHROPIC_API_KEY=sk-ant-...
HF_TOKEN=hf_your_token_here
WHISP_OUTPUT_DIR=~/notes/voice
WHISP_INDEXED_NOTES_DIR=/path/to/knowledge-base/source-material
```
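A sketch of how a `.env` file like this is typically consumed in Python, assuming the python-dotenv package (whisp's actual loading code may differ):

```python
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory into the environment

api_key = os.environ.get("ANTHROPIC_API_KEY")  # None -> enrichment is skipped
output_dir = os.path.expanduser(os.environ.get("WHISP_OUTPUT_DIR", "./notes"))
```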
| Variable | Purpose | Default |
|---|---|---|
| `ANTHROPIC_API_KEY` | Anthropic API key for AI enrichment | (enrichment skipped if unset) |
| `HF_TOKEN` | HuggingFace token for model downloads | |
| `WHISP_OUTPUT_DIR` | Directory to write voice notes | `./notes` |
| `WHISP_INDEXED_NOTES_DIR` | Knowledge base directory for indexing | (disabled if unset) |
- Python 3.9+
- PortAudio system library for microphone access
- Anthropic API key — optional; needed only for AI enrichment (title, keywords, topic extraction)
```bash
# Default: record, transcribe, enrich, and index
whisp

# Custom output directory
whisp -o ~/repos/my-notes/voice

# Better accuracy (slower, ~460MB model download on first run)
whisp --model small.en

# Custom title (skips AI-generated title)
whisp --title "Sprint Retro Notes"

# List available microphones
whisp --list-devices

# Use a specific mic (index from --list-devices)
whisp --device 3

# Disable inline timestamps
whisp --no-timestamps

# Skip AI enrichment (raw transcript only)
whisp --no-post-process

# Skip knowledge base indexing (record now, index later)
whisp --no-index

# Batch index: index all previously recorded notes at once
whisp --index                 # index enriched notes from default output dir
whisp --index ~/notes/voice   # index from a specific directory
```

| Flag | Description | Default |
|---|---|---|
| `-o, --output-dir` | Directory to write markdown files | from config or `./notes` |
| `-m, --model` | Whisper model size | `base.en` |
| `-t, --title` | Custom markdown title | auto-generated |
| `--device` | Audio input device index | system default |
| `--list-devices` | List available audio input devices | |
| `--no-timestamps` | Disable inline timestamps in output | `false` |
| `--compute-type` | Model quantization (`float32`, `float16`, `int8`) | `int8` |
| `--no-post-process` | Skip Claude AI enrichment | `false` |
| `--no-index` | Skip copying to knowledge base | `false` |
| `--index [DIR]` | Batch-index enriched files from DIR (skips recording) | output dir |
| `--config` | Path to a specific `whisp.yaml` config file | auto-detected |
CLI arguments override config file values.
Whisp loads configuration from whisp.yaml, searching these locations in order:
1. Path passed via `--config`
2. `./whisp.yaml` (current working directory)
3. `~/.config/whisp/whisp.yaml`
4. `whisp.yaml` in the project directory
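A sketch of that resolution order in Python (the function and candidate-list names are illustrative, not whisp's actual internals):

```python
from pathlib import Path
from typing import Optional

def find_config(cli_path: Optional[str] = None) -> Optional[Path]:
    """Return the first whisp.yaml found, honoring the documented search order."""
    candidates = [
        Path(cli_path) if cli_path else None,      # 1. --config argument
        Path.cwd() / "whisp.yaml",                 # 2. current working directory
        Path.home() / ".config/whisp/whisp.yaml",  # 3. user config directory
        Path(__file__).parent / "whisp.yaml",      # 4. project directory
    ]
    return next((p for p in candidates if p and p.is_file()), None)
```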
Example whisp.yaml:
```yaml
model: base.en
compute_type: int8
device: null
no_timestamps: false
post_process:
  enabled: true
  claude_model: sonnet
index:
  enabled: true
```

Local paths (`output_dir`, `source_material_dir`) are configured via environment variables in `.env` rather than the config file, so `whisp.yaml` can be committed without machine-specific paths.
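How the split between `whisp.yaml` and `.env` might come together at runtime; a hedged sketch (key names beyond those documented above are assumptions):

```python
import os

import yaml  # PyYAML

def load_settings(config_path):
    with open(config_path) as f:
        cfg = yaml.safe_load(f) or {}
    # Machine-specific paths come from the environment, never the YAML,
    # so the config file stays safe to commit.
    cfg["output_dir"] = os.path.expanduser(
        os.environ.get("WHISP_OUTPUT_DIR", "./notes"))
    cfg["source_material_dir"] = os.environ.get("WHISP_INDEXED_NOTES_DIR")
    return cfg
```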
These constants are defined in `voice_to_md.py`:

| Constant | Default | Purpose |
|---|---|---|
| `CHUNK_DURATION_S` | `5` | Seconds of audio per transcription batch |
| `SILENCE_THRESHOLD` | `0.01` | RMS level below which audio is "silence" |
| `PAUSE_BREAK_S` | `2.0` | Silence duration that triggers a paragraph break |
| `LONG_PAUSE_S` | `8.0` | Silence duration that triggers a section break |
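A sketch of how those thresholds interact, assuming audio arrives as float32 NumPy chunks (the actual logic in `voice_to_md.py` may differ):

```python
import numpy as np

SILENCE_THRESHOLD = 0.01  # RMS floor below which a chunk counts as silence
PAUSE_BREAK_S = 2.0       # silence length that ends a paragraph
LONG_PAUSE_S = 8.0        # silence length that starts a new section

def rms(chunk: np.ndarray) -> float:
    """Root-mean-square level of one audio chunk."""
    return float(np.sqrt(np.mean(chunk ** 2)))

def break_for(silence_s: float) -> str:
    """Map accumulated silence to the markdown break it should produce."""
    if silence_s >= LONG_PAUSE_S:
        return "section"    # horizontal rule plus timestamp
    if silence_s >= PAUSE_BREAK_S:
        return "paragraph"  # blank line
    return "none"
```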
Files are created as `YYYY-MM-DD_HHMMSS.md` (or `YYYY-MM-DD_HHMMSS_title-slug.md` after AI enrichment).
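A minimal sketch of that naming scheme (the slug rules are an assumption based on the example filenames):

```python
import re
from datetime import datetime

def note_filename(title=None):
    stamp = datetime.now().strftime("%Y-%m-%d_%H%M%S")
    if not title:
        return f"{stamp}.md"
    # Lowercase, replace non-alphanumerics with hyphens, trim the ends
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{stamp}_{slug}.md"

print(note_filename("Improving CI Pipeline Feedback Loops"))
# -> e.g. 2026-02-06_143000_improving-ci-pipeline-feedback-loops.md
```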
With post-processing enabled, files include YAML frontmatter:
```markdown
---
title: Improving CI Pipeline Feedback Loops
date: 2026-02-06 14:30
type: voice-note
word_count: 482
duration: 5m 23s
keywords:
  - continuous delivery
  - feedback loops
  - pipeline optimization
primary_topic: continuous-delivery
summary: Discussion of strategies to reduce CI pipeline feedback time.
---

# Voice Notes — 2026-02-06 14:30

> **Recorded**: Thursday, February 06, 2026 at 02:30 PM
> **Model**: base.en

---

First chunk of transcribed speech flows naturally here as continuous text.

Second paragraph appears automatically after a ~2 second pause in speech.

---

*[3m]*

A section break with timestamp appears after longer pauses (~8 seconds),
useful for topic changes.

---

> **Duration**: 5m 23s | **Words**: ~482
```

| Model | Size | Speed | Accuracy | Use case |
|---|---|---|---|---|
| `tiny.en` | 75MB | Fastest | Lower | Quick drafts |
| `base.en` | 140MB | Fast | Good | Default — best balance |
| `small.en` | 460MB | Medium | Better | Important meetings |
| `medium.en` | 1.5GB | Slower | High | When accuracy matters most |
| `large-v3` | 3GB | Slowest | Highest | Multi-language support |
Models download automatically on first use from the HuggingFace Hub. The `.en` variants are English-only and faster.
The full whisp pipeline runs four steps:
1. **Record & Transcribe** — Captures audio, transcribes with faster-whisper, writes markdown with smart pause detection
2. **AI Enrichment** — Sends transcript to the Anthropic API to extract title, keywords, topic, and summary; adds YAML frontmatter; renames the file with a title slug
3. **Index** — Copies the note to a knowledge base directory organized by topic category and appends a row to `voice-notes-index.md`
4. **Finalize** — Writes a footer with duration and word count
Steps 2 and 3 can be disabled with `--no-post-process` and `--no-index` (or via config).
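Step 3 is essentially a copy plus an index append; a minimal sketch (the row format and helper name are illustrative, not whisp's actual code):

```python
import shutil
from pathlib import Path

def index_note(note: Path, kb_dir: Path, topic: str, title: str) -> None:
    """Copy an enriched note into the knowledge base and log it in the index."""
    dest_dir = kb_dir / topic          # e.g. .../continuous-delivery-ci
    dest_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(note, dest_dir / note.name)

    index_file = kb_dir / "voice-notes-index.md"
    with index_file.open("a") as f:
        # note.stem starts with YYYY-MM-DD, so the first 10 chars are the date
        f.write(f"| {note.stem[:10]} | {title} | {topic} | {note.name} |\n")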
AI enrichment maps voice notes to these topic categories for indexing:
`agile-practices`, `ai-future-development`, `continuous-delivery-ci`, `devops-sre`, `engineering-practices`, `general`, `metrics-measurement`, `security`, `team-organization-leadership`, `testing-quality`
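A hedged sketch of how step 2's enrichment call might look with the Anthropic Python SDK (the prompt, model name, and response handling are illustrative; whisp's actual prompt and parsing will differ):

```python
import json

import anthropic

TOPICS = ["agile-practices", "ai-future-development", "continuous-delivery-ci",
          "devops-sre", "engineering-practices", "general", "metrics-measurement",
          "security", "team-organization-leadership", "testing-quality"]

def enrich(transcript: str) -> dict:
    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    message = client.messages.create(
        model="claude-sonnet-4-5",  # illustrative; whisp configures this via claude_model
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": (
                "Return JSON with keys title, keywords, primary_topic, summary. "
                f"primary_topic must be one of {TOPICS}.\n\nTranscript:\n{transcript}"
            ),
        }],
    )
    return json.loads(message.content[0].text)
```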
Point the output at a repo for version-controlled notes:
```bash
whisp -o ~/repos/notes/voice

# After a session:
cd ~/repos/notes
git add voice/ && git commit -m "Voice notes $(date +%Y-%m-%d)"
```

Or wrap it in a shell function:
```bash
# ~/.bashrc or ~/.zshrc
vnote() {
  local dir="${1:-$HOME/repos/notes/voice}"
  whisp -o "$dir" "${@:2}"   # extra args pass through to whisp
  # Subshell keeps your shell in its current directory;
  # 2>/dev/null silences git when there is nothing to commit
  (cd "$dir" && git add -A && git commit -m "voice: $(date +%Y-%m-%d_%H%M)" 2>/dev/null)
}
```

- Python 3.9+
- PortAudio (system library for mic access)
- ~140MB disk for the default model (up to 3GB for large models)
- Anthropic API key (optional, for AI enrichment)