TypeScript Audio Encoder Blueprint
Cameron Brooks edited this page May 30, 2025 · 1 revision
This guide shows how to replicate the Python librosa_encoder.py functionality using TypeScript and ffmpeg.
- Trim or loop audio to exactly 12 seconds
- Resample to 44.1 kHz mono
- Normalize loudness via ffmpeg's loudnorm filter
- Export as WAV
- Generate a simple mel-spectrogram array for frontend visuals
- Preprocess with FFmpeg
  - Convert to mono, resample, trim and normalize:
    ffmpeg -y -i input.wav -ac 1 -ar 44100 -t 12 -filter:a loudnorm=I=-16:LRA=11:TP=-1 output.wav
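The command above can be assembled programmatically before it is handed to a process runner. The following is a minimal sketch, not the code in backend/audio_encoder.ts: the `PreprocessOptions` type and `buildFfmpegArgs` function are hypothetical names. Note that `-t 12` on its own only trims, so an input shorter than 12 seconds stays short. The sketch adds ffmpeg's `-stream_loop -1` input option to cover the looping case, which is an assumption about how looping could be handled, not something this page specifies.

```typescript
// Hypothetical options type; field names are illustrative only.
interface PreprocessOptions {
  seconds: number;    // target clip length in seconds
  sampleRate: number; // output sample rate in Hz
  loop: boolean;      // repeat the input so short clips can reach `seconds`
}

const defaults: PreprocessOptions = { seconds: 12, sampleRate: 44100, loop: true };

// Build the argument list for the ffmpeg invocation shown above.
function buildFfmpegArgs(
  input: string,
  output: string,
  opts: PreprocessOptions = defaults,
): string[] {
  const args = ["-y"]; // overwrite the output file without prompting
  // -stream_loop is an input option, so it has to appear before -i.
  // This flag is an assumption added to handle inputs shorter than the target.
  if (opts.loop) args.push("-stream_loop", "-1");
  args.push(
    "-i", input,
    "-ac", "1",                     // downmix to mono
    "-ar", String(opts.sampleRate), // resample
    "-t", String(opts.seconds),     // cap the output duration
    "-filter:a", "loudnorm=I=-16:LRA=11:TP=-1",
    output,
  );
  return args;
}

console.log(["ffmpeg", ...buildFfmpegArgs("input.wav", "output.wav")].join(" "));
```

The resulting array can be passed to `spawn("ffmpeg", args)` from `node:child_process`, which Bun also provides. Passing arguments as an array avoids shell quoting problems with file names that contain spaces.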
- Parse WAV
  - Read the PCM samples from the processed file.
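Reading the samples could look roughly like the sketch below. It assumes the processed file holds 16-bit integer PCM, which is ffmpeg's default encoding for .wav output, and it walks the RIFF chunks rather than assuming a fixed 44-byte header, since ffmpeg may write extra chunks before the audio data. `WavData`, `fourCC` and `parseWav` are hypothetical names, not taken from backend/audio_encoder.ts.

```typescript
// Hypothetical result shape for illustration.
interface WavData {
  sampleRate: number;
  channels: number;
  samples: Float32Array; // interleaved if channels > 1, scaled to [-1, 1)
}

// Read a four-character chunk ID at the given byte offset.
function fourCC(view: DataView, offset: number): string {
  return String.fromCharCode(
    view.getUint8(offset), view.getUint8(offset + 1),
    view.getUint8(offset + 2), view.getUint8(offset + 3),
  );
}

// Parse a RIFF/WAVE buffer containing 16-bit integer PCM.
function parseWav(bytes: Uint8Array): WavData {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (fourCC(view, 0) !== "RIFF" || fourCC(view, 8) !== "WAVE") {
    throw new Error("not a RIFF/WAVE file");
  }
  let sampleRate = 0;
  let channels = 0;
  let offset = 12; // the first chunk follows the 12-byte RIFF header
  while (offset + 8 <= view.byteLength) {
    const id = fourCC(view, offset);
    const size = view.getUint32(offset + 4, true);
    const body = offset + 8;
    if (id === "fmt ") {
      const format = view.getUint16(body, true);
      channels = view.getUint16(body + 2, true);
      sampleRate = view.getUint32(body + 4, true);
      const bits = view.getUint16(body + 14, true);
      if (format !== 1 || bits !== 16) {
        throw new Error("only 16-bit PCM is handled in this sketch");
      }
    } else if (id === "data") {
      if (channels === 0) throw new Error("data chunk appeared before fmt chunk");
      // Clamp in case the header overstates the chunk size.
      const byteCount = Math.min(size, view.byteLength - body);
      const count = Math.floor(byteCount / 2);
      const samples = new Float32Array(count);
      for (let i = 0; i < count; i++) {
        samples[i] = view.getInt16(body + 2 * i, true) / 32768;
      }
      return { sampleRate, channels, samples };
    }
    offset = body + size + (size % 2); // chunks are padded to an even length
  }
  throw new Error("no data chunk found");
}
```

Under Bun, the input bytes could be obtained with `new Uint8Array(await Bun.file("processed.wav").arrayBuffer())`. Scaling to floats in [-1, 1) keeps later magnitude calculations independent of the bit depth.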
- Compute Spectral Features
  - Use a naive DFT to calculate magnitudes for each frame.
  - Average the magnitudes into nMels buckets to approximate a mel-spectrogram.
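The two spectral steps can be sketched as follows. This is an illustration under assumptions, not the code in backend/audio_encoder.ts: the function names, the non-overlapping framing, and the equal-width grouping of bins are all hypothetical choices. A true mel filterbank spaces its bands on the mel scale and uses overlapping triangular weights, so equal-width averaging is only a rough stand-in.

```typescript
// Magnitudes of the first N/2 DFT bins of one frame, computed directly in O(N^2).
function dftMagnitudes(frame: Float32Array): Float32Array {
  const n = frame.length;
  const half = Math.floor(n / 2);
  const mags = new Float32Array(half);
  for (let k = 0; k < half; k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (2 * Math.PI * k * t) / n;
      re += frame[t] * Math.cos(angle);
      im -= frame[t] * Math.sin(angle);
    }
    mags[k] = Math.hypot(re, im);
  }
  return mags;
}

// Average contiguous, equal-width groups of bins into nMels values.
// Equal-width grouping is an assumption made for this sketch.
function bucketAverage(mags: Float32Array, nMels: number): number[] {
  const out: number[] = [];
  for (let b = 0; b < nMels; b++) {
    const start = Math.floor((b * mags.length) / nMels);
    // Guarantee at least one bin per bucket when nMels exceeds the bin count.
    const end = Math.max(start + 1, Math.floor(((b + 1) * mags.length) / nMels));
    let sum = 0;
    for (let i = start; i < end; i++) sum += mags[i];
    out.push(sum / (end - start));
  }
  return out;
}

// Split samples into non-overlapping frames and average the buckets over time.
// Averaging across frames is one possible reading of "averaged mel-spectrogram".
function averagedSpectrogram(
  samples: Float32Array,
  frameSize: number,
  nMels: number,
): number[] {
  const acc = new Array<number>(nMels).fill(0);
  const frames = Math.floor(samples.length / frameSize);
  for (let f = 0; f < frames; f++) {
    const frame = samples.subarray(f * frameSize, (f + 1) * frameSize);
    const buckets = bucketAverage(dftMagnitudes(frame), nMels);
    for (let b = 0; b < nMels; b++) acc[b] += buckets[b];
  }
  return frames > 0 ? acc.map((v) => v / frames) : acc;
}
```

The naive DFT costs on the order of N² operations per frame, so small frame sizes keep a 12-second clip tractable. If the visuals need finer resolution, an FFT would be the usual replacement.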
backend/audio_encoder.ts implements this flow. Run it with Bun:
bun backend/audio_encoder.ts input.wav processed.wav metadata.json
It writes the normalized clip and a JSON metadata file containing duration and an averaged mel-spectrogram array.