Record system audio and automatically transcribe to text using ✨AI✨.
sys2txt is a command-line tool that records your system audio (via PulseAudio/PipeWire monitor sources) with ffmpeg and transcribes it locally using Whisper. It supports both:
- On-demand: Record until you stop, then transcribe once
- Live-ish: Segment the recording every N seconds and transcribe each segment as it’s created (prints continuously)
You can use either the openai-whisper (Python) reference implementation or the faster-whisper engine if installed. The tool auto-selects faster-whisper when available for better speed on CPU and especially GPU.
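The auto-selection described above could be sketched as follows. This is a minimal illustration, not the tool's actual code: the function names `available_engines` and `pick_engine` are hypothetical, and the sketch assumes the engines are detected by their importable module names (`faster_whisper` for faster-whisper, `whisper` for openai-whisper).

```python
import importlib.util


def available_engines():
    """Return the set of engines whose Python modules can be imported."""
    found = set()
    # faster-whisper is imported as `faster_whisper`
    if importlib.util.find_spec("faster_whisper") is not None:
        found.add("faster")
    # openai-whisper is imported as `whisper`
    if importlib.util.find_spec("whisper") is not None:
        found.add("whisper")
    return found


def pick_engine(preference="auto", installed=None):
    """Resolve an engine preference (auto, faster, whisper) to a concrete engine.

    `installed` can be passed explicitly for testing; by default it is detected.
    """
    if installed is None:
        installed = available_engines()
    if preference == "auto":
        # Prefer faster-whisper, fall back to the reference implementation.
        for name in ("faster", "whisper"):
            if name in installed:
                return name
        raise RuntimeError("neither faster-whisper nor openai-whisper is installed")
    if preference not in ("faster", "whisper"):
        raise ValueError(f"unknown engine: {preference}")
    if preference not in installed:
        raise RuntimeError(f"requested engine is not installed: {preference}")
    return preference
```

Passing `installed` explicitly keeps the selection logic testable on machines where neither engine is present.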
- Ubuntu with PulseAudio or PipeWire (default on modern Ubuntu)
- ffmpeg
- Python 3.9+ (recommended)
- System packages
sudo apt update
sudo apt install -y ffmpeg python3-venv python3-pip
- Create a virtual environment and install sys2txt
cd sys2txt
python3 -m venv .venv
source .venv/bin/activate
pip install sys2txt
This installs both faster-whisper (for speed) and openai-whisper (reference implementation). The tool auto-selects faster-whisper when available or falls back to openai-whisper.
Record and transcribe once (press Ctrl-C to stop recording):
sys2txt once --model small.en
Live segmented transcription (prints ongoing transcript every 8s by default; Ctrl-C to stop):
sys2txt live --model small.en --segment-seconds 8
- --source <pulse_source_name>: Explicit PulseAudio/PipeWire source (e.g., alsa_output.pci-0000_00_1f.3.analog-stereo.monitor)
- --list-sources: List available Pulse sources and exit
- --model <size>: tiny|base|small|medium|large-v2 (default: small)
- --engine <auto|faster|whisper>: Force a specific engine (default: auto)
- --language <code>: Force language code (e.g., en). Omit to auto-detect
- --output <path>: Write final transcript to a file (in live mode, appends)
- --duration <seconds>: (once mode) Record fixed duration instead of waiting for Ctrl-C
- --segment-seconds <n>: (live mode) Segment length in seconds (default: 8)
- --timestamps: Print timestamps alongside text
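To illustrate how segmented ("live-ish") recording can work, here is a rough sketch built on ffmpeg's segment muxer. It is an assumption about one reasonable approach, not a description of sys2txt internals; the helper names (`build_segment_cmd`, `finished_segments`, `offset_label`) and the file pattern `segment_%05d.wav` are hypothetical.

```python
def build_segment_cmd(source, segment_seconds=8, pattern="segment_%05d.wav"):
    """Build an ffmpeg command that records `source` into fixed-length WAV segments."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-f", "pulse", "-i", source,      # PulseAudio/PipeWire monitor source
        "-ac", "1", "-ar", "16000",       # mono, 16 kHz (good for Whisper)
        "-f", "segment",                  # ffmpeg segment muxer
        "-segment_time", str(segment_seconds),
        "-reset_timestamps", "1",         # each segment starts at t=0
        pattern,
    ]


def finished_segments(paths):
    """Return segments that are safe to transcribe.

    The newest file is assumed to still be open for writing, so it is skipped.
    """
    return sorted(paths)[:-1]


def offset_label(index, segment_seconds):
    """Format the start offset of segment `index` as HH:MM:SS."""
    total = index * segment_seconds
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"
```

A loop could start the command with `subprocess.Popen(build_segment_cmd(source))`, poll the output directory, and pass each finished segment to the transcription engine. Shorter segments reduce latency but give the model less context per chunk.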
Record 30s of system audio from the default monitor and transcribe:
sys2txt once --duration 30 --model small --output transcript.txt
Use a specific PulseAudio source:
sys2txt once --source alsa_output.usb-Focusrite_Scarlett.monitor --model base
Live mode with shorter latency and timestamps:
sys2txt live --segment-seconds 5 --timestamps
Force the reference openai-whisper engine:
sys2txt once --engine whisper --model base
Transcribe an existing audio file:
sys2txt once --input recording.wav --model small
Find the default sink and its monitor source:
pactl get-default-sink
pactl list short sources | grep monitor
Record 30s of system audio from the default monitor to a WAV at 16 kHz mono (good for Whisper):
ffmpeg -hide_banner -loglevel error -f pulse -i "$(pactl get-default-sink).monitor" -ac 1 -ar 16000 -t 30 out.wav
Transcribe with openai-whisper CLI:
whisper out.wav --model small --task transcribe --language en
- If you get silence, make sure you are using the monitor source for your output device (the name ends with .monitor). Use --list-sources to view the options.
- Make sure the application you want to capture is playing through your default output sink. You can manage routes with pavucontrol.
- PipeWire systems expose PulseAudio-compatible sources, so -f pulse in ffmpeg still works.
- For better performance on CPU, use faster-whisper with model base or small. For the best accuracy, use medium or large-v2 (these are heavier).
- GPU acceleration for faster-whisper requires a compatible ctranslate2 CUDA wheel. Set SYS2TXT_DEVICE=cuda to enable it. If CUDA is not available, transcription runs on the CPU.
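The manual workflow (find a monitor source, then record it with ffmpeg) can also be scripted. The sketch below is illustrative only: the helper names are hypothetical, and it assumes `pactl list short sources` prints tab-separated fields with the source name in the second column.

```python
import subprocess


def monitor_sources(pactl_output):
    """Extract monitor source names from `pactl list short sources` output."""
    names = []
    for line in pactl_output.splitlines():
        fields = line.split("\t")
        # Assumed layout: index, name, driver, sample spec, state
        if len(fields) >= 2 and fields[1].endswith(".monitor"):
            names.append(fields[1])
    return names


def default_sink():
    """Ask PulseAudio/PipeWire for the default sink name (requires pactl)."""
    result = subprocess.run(
        ["pactl", "get-default-sink"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def build_record_cmd(sink_name, seconds, out_path="out.wav"):
    """Build the ffmpeg command that records a sink's monitor to 16 kHz mono WAV."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-f", "pulse", "-i", f"{sink_name}.monitor",
        "-ac", "1", "-ar", "16000",
        "-t", str(seconds),
        out_path,
    ]
```

For example, `subprocess.run(build_record_cmd(default_sink(), 30), check=True)` would mirror the 30-second ffmpeg command shown earlier. If the recording is silent, `monitor_sources(...)` can help confirm that the expected `.monitor` source actually exists.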
Contributions are welcome! Please see CONTRIBUTING.md for:
- Development setup and workflow
- Running tests and code quality checks
- Release process and CI/CD workflows
- Pull request guidelines
For security issues, please see SECURITY.md.