
# whisp

Voice-to-markdown transcription pipeline with AI enrichment. Start talking, get structured markdown notes with titles, keywords, and topic categorization.

## How it works

  1. Captures microphone audio in real-time
  2. Transcribes locally using faster-whisper (CTranslate2 port of OpenAI Whisper — runs on CPU, no API keys)
  3. Writes markdown incrementally with automatic paragraph/section breaks based on speech pauses
  4. Post-processes via the Anthropic API to generate title, keywords, topic, and summary
  5. Adds YAML frontmatter and renames file with a descriptive slug
  6. Optionally indexes into a knowledge base directory for organization

## Install

```bash
git clone git@github.com:bdfinst/whisp.git && cd whisp
chmod +x install.sh && ./install.sh
```

The installer will:

- Verify Python 3.9+
- Install system audio dependencies (PortAudio)
- Create a Python virtual environment (`.venv`)
- Install pip dependencies
- Create a `whisp` symlink in `/usr/local/bin`

Or manually:

```bash
# Install the PortAudio headers first (needed to build the audio bindings):
# Linux (Ubuntu/Debian): sudo apt-get install portaudio19-dev python3-dev
# Linux (Fedora/RHEL):   sudo dnf install portaudio-devel python3-devel
# macOS:                 brew install portaudio (if needed)
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
```
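To sanity-check the audio stack after a manual install, you can query input devices from Python. This assumes the `sounddevice` package is among the dependencies; the authoritative list is in requirements.txt:

```python
# Smoke test: verify PortAudio is installed and microphones are visible.
# Assumes the sounddevice package; check requirements.txt for the real list.
import sounddevice as sd

print(sd.query_devices())  # lists all audio devices, inputs included
```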

## Environment

Create a `.env` file in the project root:

```bash
ANTHROPIC_API_KEY=sk-ant-...
HF_TOKEN=hf_your_token_here
WHISP_OUTPUT_DIR=~/notes/voice
WHISP_INDEXED_NOTES_DIR=/path/to/knowledge-base/source-material
```
| Variable | Purpose | Default |
| --- | --- | --- |
| `ANTHROPIC_API_KEY` | Anthropic API key for AI enrichment | unset (enrichment skipped) |
| `HF_TOKEN` | HuggingFace token for model downloads | unset |
| `WHISP_OUTPUT_DIR` | Directory to write voice notes | `./notes` |
| `WHISP_INDEXED_NOTES_DIR` | Knowledge base directory for indexing | unset (indexing disabled) |
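whisp's settings code isn't shown here, but the fallbacks in the table could be read with python-dotenv along these lines (a sketch, not the project's real code):

```python
import os
from pathlib import Path

from dotenv import load_dotenv

load_dotenv()  # pulls .env from the project root into os.environ

api_key = os.environ.get("ANTHROPIC_API_KEY")        # None -> skip enrichment
output_dir = Path(os.environ.get("WHISP_OUTPUT_DIR", "./notes")).expanduser()
indexed_dir = os.environ.get("WHISP_INDEXED_NOTES_DIR")  # None -> indexing disabled
```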

## Prerequisites

- Python 3.9+
- PortAudio system library for microphone access
- Anthropic API key (optional; required only for AI enrichment: title, keywords, topic extraction)

## Usage

```bash
# Default: record, transcribe, enrich, and index
whisp

# Custom output directory
whisp -o ~/repos/my-notes/voice

# Better accuracy (slower, ~460MB model download on first run)
whisp --model small.en

# Custom title (skips AI-generated title)
whisp --title "Sprint Retro Notes"

# List available microphones
whisp --list-devices

# Use a specific mic (index from --list-devices)
whisp --device 3

# Disable inline timestamps
whisp --no-timestamps

# Skip AI enrichment (raw transcript only)
whisp --no-post-process

# Skip knowledge base indexing (record now, index later)
whisp --no-index

# Batch index: index all previously recorded notes at once
whisp --index                     # index enriched notes from default output dir
whisp --index ~/notes/voice       # index from a specific directory
```

## All CLI options

| Flag | Description | Default |
| --- | --- | --- |
| `-o, --output-dir` | Directory to write markdown files | from config or `./notes` |
| `-m, --model` | Whisper model size | `base.en` |
| `-t, --title` | Custom markdown title | auto-generated |
| `--device` | Audio input device index | system default |
| `--list-devices` | List available audio input devices | |
| `--no-timestamps` | Disable inline timestamps in output | `false` |
| `--compute-type` | Model quantization (`float32`, `float16`, `int8`) | `int8` |
| `--no-post-process` | Skip Claude AI enrichment | `false` |
| `--no-index` | Skip copying to knowledge base | `false` |
| `--index [DIR]` | Batch-index enriched files from DIR (skips recording) | output dir |
| `--config` | Path to a specific whisp.yaml config file | auto-detected |

CLI arguments override config file values.

## Configuration

Whisp loads configuration from `whisp.yaml`, searching these locations in order:

  1. Path passed via `--config`
  2. `./whisp.yaml` (current working directory)
  3. `~/.config/whisp/whisp.yaml`
  4. `whisp.yaml` in the project directory
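As an illustration of that search order (a sketch; `find_config` is a hypothetical helper, not whisp's actual function):

```python
from pathlib import Path
from typing import Optional

def find_config(cli_path: Optional[str] = None) -> Optional[Path]:
    """Return the first whisp.yaml found in the documented search order."""
    candidates = [
        Path(cli_path).expanduser() if cli_path else None,  # 1. --config
        Path.cwd() / "whisp.yaml",                          # 2. working directory
        Path.home() / ".config" / "whisp" / "whisp.yaml",   # 3. user config dir
        Path(__file__).resolve().parent / "whisp.yaml",     # 4. project directory
    ]
    return next((p for p in candidates if p and p.is_file()), None)
```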

Example `whisp.yaml`:

```yaml
model: base.en
compute_type: int8
device: null
no_timestamps: false

post_process:
  enabled: true
  claude_model: sonnet

index:
  enabled: true
```
Local paths (`output_dir`, `source_material_dir`) are configured via environment variables in `.env` rather than the config file, so `whisp.yaml` can be committed without machine-specific paths.

## Tunable audio parameters

These constants are defined in `voice_to_md.py`:

| Constant | Default | Purpose |
| --- | --- | --- |
| `CHUNK_DURATION_S` | 5 | Seconds of audio per transcription batch |
| `SILENCE_THRESHOLD` | 0.01 | RMS level below which audio is "silence" |
| `PAUSE_BREAK_S` | 2.0 | Silence duration that triggers a paragraph break |
| `LONG_PAUSE_S` | 8.0 | Silence duration that triggers a section break |
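As an illustration of how these constants interact, here is a sketch of chunk-level pause classification; `classify_pause` is a hypothetical helper, and the real logic in `voice_to_md.py` may differ:

```python
import numpy as np

SILENCE_THRESHOLD = 0.01  # RMS floor for "silence"
PAUSE_BREAK_S = 2.0       # paragraph break
LONG_PAUSE_S = 8.0        # section break

def classify_pause(chunk: np.ndarray, sample_rate: int, silent_s: float):
    """Track accumulated silence across float32 audio chunks.

    Returns ("section" | "paragraph" | None, updated_silence). The caller
    would emit a break only when the returned label changes.
    """
    rms = float(np.sqrt(np.mean(np.square(chunk))))
    silent_s = silent_s + len(chunk) / sample_rate if rms < SILENCE_THRESHOLD else 0.0
    if silent_s >= LONG_PAUSE_S:
        return "section", silent_s
    if silent_s >= PAUSE_BREAK_S:
        return "paragraph", silent_s
    return None, silent_s
```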

## Output format

Files are created as `YYYY-MM-DD_HHMMSS.md` (or `YYYY-MM-DD_HHMMSS_title-slug.md` after AI enrichment).
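The README doesn't spell out the rename step; slug generation plausibly looks something like this (`slugify` is a hypothetical stand-in, not whisp's actual function):

```python
import re

def slugify(title: str, max_words: int = 6) -> str:
    """Reduce an AI-generated title to a filename-safe slug (assumed behavior)."""
    words = re.sub(r"[^a-z0-9\s-]", "", title.lower()).split()
    return "-".join(words[:max_words])

# "Improving CI Pipeline Feedback Loops" -> "improving-ci-pipeline-feedback-loops"
```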

With post-processing enabled, files include YAML frontmatter:

```markdown
---
title: Improving CI Pipeline Feedback Loops
date: 2026-02-06 14:30
type: voice-note
word_count: 482
duration: 5m 23s
keywords:
- continuous delivery
- feedback loops
- pipeline optimization
primary_topic: continuous-delivery
summary: Discussion of strategies to reduce CI pipeline feedback time.
---

# Voice Notes — 2026-02-06 14:30

> **Recorded**: Thursday, February 06, 2026 at 02:30 PM
> **Model**: base.en

---

First chunk of transcribed speech flows naturally here as continuous text.

Second paragraph appears automatically after a ~2 second pause in speech.

---

*[3m]*

A section break with timestamp appears after longer pauses (~8 seconds),
useful for topic changes.

---

> **Duration**: 5m 23s | **Words**: ~482
```

## Models

| Model | Size | Speed | Accuracy | Use case |
| --- | --- | --- | --- | --- |
| `tiny.en` | 75MB | Fastest | Lower | Quick drafts |
| `base.en` | 140MB | Fast | Good | Default — best balance |
| `small.en` | 460MB | Medium | Better | Important meetings |
| `medium.en` | 1.5GB | Slower | High | When accuracy matters most |
| `large-v3` | 3GB | Slowest | Highest | Multi-language support |

Models download automatically on first use from HuggingFace Hub. `.en` variants are English-optimized and faster.
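whisp wraps faster-whisper, so the defaults above map onto the library's standard API; a minimal standalone example using the same model and compute type (the file path is a placeholder):

```python
from faster_whisper import WhisperModel

# First call downloads base.en from HuggingFace Hub and caches it locally.
model = WhisperModel("base.en", device="cpu", compute_type="int8")

segments, info = model.transcribe("sample.wav")  # placeholder audio file
for segment in segments:
    print(f"[{segment.start:6.1f}s] {segment.text.strip()}")
```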

## Pipeline

The full whisp pipeline runs four steps:

  1. Record & Transcribe — Captures audio, transcribes with faster-whisper, writes markdown with smart pause detection
  2. AI Enrichment — Sends the transcript to the Anthropic API to extract title, keywords, topic, and summary; adds YAML frontmatter; renames the file with a title slug (see the sketch below)
  3. Index — Copies the note to a knowledge base directory organized by topic category and appends a row to `voice-notes-index.md`
  4. Finalize — Writes a footer with duration and word count

Steps 2 and 3 can be disabled with `--no-post-process` and `--no-index` (or via config).
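For step 2, the Anthropic call presumably resembles standard Messages API usage; the prompt shape and model id below are illustrative, not taken from whisp's source:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def enrich(transcript: str) -> str:
    """Ask Claude for title, keywords, topic, and summary (assumed prompt shape)."""
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative; config just says "sonnet"
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": ("Return YAML with title, keywords, primary_topic, and "
                        f"summary for this transcript:\n\n{transcript}"),
        }],
    )
    return response.content[0].text
```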

## Topic categories

AI enrichment maps voice notes to these topic categories for indexing:

`agile-practices`, `ai-future-development`, `continuous-delivery-ci`, `devops-sre`, `engineering-practices`, `general`, `metrics-measurement`, `security`, `team-organization-leadership`, `testing-quality`
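Filing a note under its category is straightforward to sketch; the directory-per-topic layout and index row format below are assumptions, not taken from whisp's source (`index_note` is hypothetical):

```python
import shutil
from pathlib import Path

def index_note(note: Path, kb_dir: Path, topic: str) -> None:
    """Copy an enriched note into its topic folder and append an index row."""
    dest = kb_dir / topic
    dest.mkdir(parents=True, exist_ok=True)
    shutil.copy2(note, dest / note.name)
    with open(kb_dir / "voice-notes-index.md", "a", encoding="utf-8") as index:
        index.write(f"| {note.stem} | {topic} |\n")
```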

## Git workflow

Point the output at a repo for version-controlled notes:

```bash
whisp -o ~/repos/notes/voice

# After a session:
cd ~/repos/notes
git add voice/ && git commit -m "Voice notes $(date +%Y-%m-%d)"
```

Or wrap it in a shell alias:

```bash
# ~/.bashrc or ~/.zshrc
vnote() {
    local dir="${1:-$HOME/repos/notes/voice}"
    whisp -o "$dir" "${@:2}"   # extra arguments pass straight through to whisp
    # Note: this leaves you in the notes directory after committing
    cd "$dir" && git add -A && git commit -m "voice: $(date +%Y-%m-%d_%H%M)" 2>/dev/null
}
```

## Requirements

- Python 3.9+
- PortAudio (system library for mic access)
- ~140MB disk for the default model (up to 3GB for large models)
- Anthropic API key (optional, for AI enrichment)
