A real-time audio spectrogram and DSP workbench that runs entirely in the
browser. Drop in an .mp3 or .wav and explore it through a dozen
time–frequency representations, a five-band parametric EQ, a spectral
"paint brush" editor with ISTFT resynthesis, harmonic–percussive separation,
loudness/true-peak metering, and automatic anomaly scanning — all with a
synchronized playhead and no build step.
Zero dependencies. Zero telemetry. Zero server. Open
index.html behind any static HTTP server and you have a miniature DSP lab in your browser.
Screenshots (images not shown here): scrolling spectrogram synchronized to playback; 11 time-frequency views that switch live; spectral brush + ISTFT paint to edit audio; LUFS, true peak, key, BPM, and anomaly scan.
- Run it & Quick tour
- Screenshot gallery
- Features at a glance
- Pro features
- Keyboard shortcuts
- Architecture
- Action graph — what happens at each step
- Expected behaviour per feature
- Known limitations
- Tests
- Contributing
- License
There is no build step — but the analysis runs inside a classic Web
Worker that uses importScripts, which browsers refuse to load over
file://. Serve the folder with any static HTTP server:
# Clone
git clone https://github.com/victordov/music-spectrogram.git
cd music-spectrogram
# Serve (pick one)
npm start # alias for python3 -m http.server 5173
python3 -m http.server 5173 # stdlib; no deps
npx http-server -p 5173 # Node, no install
php -S localhost:5173 # also works

Then open http://localhost:5173 in a recent browser (Chrome 90+, Firefox 90+, or Safari 15+). Drag a file onto the page, or click Play demo for a zero-click tour.
① Start from the empty state. Drop an .mp3/.wav onto the page, click
Open audio file…, or hit Play demo track for a 12-second synthetic
reference (tones, arpeggios, broadband crackle, ultrasonic pilot — enough to
exercise every view and detector).
② Press Space to play. The waveform overview, static spectrogram, live
frequency bars, and playhead cursor all stay synchronized with the audio
clock (sample-accurate — playback time is derived from
AudioContext.currentTime, not rAF).
③ Switch visualisation live. Use the top toolbar to jump between Spectrogram / Mel / Chroma / Scalogram / Cochleagram / Reassigned / Cepstrum / Waterfall 3D / Features / Harmonic / Percussive — no reload, caches are reused.
④ Edit audio by painting. Enable Brush, drag across the spectrogram to attenuate (red), amplify (green), or smooth (cyan), then hit Apply to resynthesise via ISTFT — the edited audio replaces the playing buffer.
⑤ Shape the sound with the 5-band EQ. Flip on the cyan response curve to see what the filter chain is doing to your spectrum in real time.
⑥ Run the Pro analysis pass. One click produces integrated LUFS, true-peak dBTP, stereo correlation, detected key, tempo, and a nine-target compliance report (Spotify, Apple Music, YouTube, Tidal, EBU R128, ATSC A/85, ACX, …).
⑦ Mark regions and paint exact boxes. Shift-drag on the spectrogram to define a time/frequency rectangle, push it into the mask with Paint box, or store it as an annotation marker/region. Annotation state is saved per track in the browser and can be exported/imported as JSON.
⑧ Hunt anomalies automatically. The scanner reports ultrasonic content, sustained pilot tones, broadband bursts, LSB steganalysis, and geometric hotspots. Press 🎯 Play hunt to auto-seek to each hotspot in turn.
Hover the spectrogram any time to see the time, frequency, nearest musical note (with cents deviation), and dB at the cursor.
See §6 Keyboard shortcuts for the full hotkey list.
A focused tour lives in §2 Quick tour above. The full set of
reference screenshots is in docs/screenshots/. The
grids below show the alternative visualisation modes side-by-side so you can
compare at a glance.
| Scrolling spectrogram | Static full-track | Waterfall / 3D |
|---|---|---|
| Mel spectrogram | Chromagram | Wavelet scalogram |
| Cochleagram | Spectral features | Hover readout |
Drag-and-drop or file-picker import (.mp3, .wav; also m4a/ogg/flac if the
browser can decode them), play/pause/resume/stop, accurate seek, scrub,
waveform overview with click-to-seek, timeline with current/total duration,
volume, playback-rate (0.25×–2×), and A–B loop.
Scrolling spectrogram, static full-track spectrogram with moving cursor, waterfall/pseudo-3D spectrogram, mel spectrogram, chromagram (12 pitch classes), complex-Morlet wavelet scalogram, gammatone cochleagram, reassigned-style sharpened spectrogram, cepstrum (quefrency) plot, spectral feature panel (centroid / bandwidth / rolloff / flux / RMS), live frequency bars, HPS-harmonic-only view, HPS-percussive-only view.
FFT size (256 → 8192), overlap (25/50/75/87.5 %), window function (Hann, Hamming, Blackman, Bartlett, Rectangular, Flat-Top), smoothing, min/max displayed frequency, channel mode (mixed/L/R/stereo), linear/log freq axis, eight colour maps, min/max dB, gamma, time and frequency zoom/pan.
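The FFT size and overlap controls trade time resolution against frequency resolution, and the arithmetic is easy to sketch. `stftResolution` below is a hypothetical helper written for illustration; it is not taken from the codebase.

```javascript
// Hypothetical helper: derive hop size and resolution from the STFT controls.
function stftResolution(fftSize, overlapPercent, sampleRate) {
  // Samples between successive spectrogram columns.
  const hop = Math.round(fftSize * (1 - overlapPercent / 100));
  return {
    hop,
    binHz: sampleRate / fftSize,         // spacing between frequency bins
    columnMs: (hop / sampleRate) * 1000, // spacing between columns in time
    nBins: fftSize / 2 + 1,              // one-sided spectrum, DC through Nyquist
  };
}

console.log(stftResolution(2048, 75, 48000));
```

Doubling the FFT size halves `binHz` but doubles the window length, which is why the transient-oriented settings favour small FFTs with high overlap.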
The min/max dB controls remap the spectrogram and every dB-backed alternative heatmap (mel / cochleagram / reassigned / cepstrum / scalogram) through the current display window. Views whose vertical axis is not frequency (chroma, cepstrum, features) do not draw Hz-labelled gridlines on top.
The min / max displayed frequency controls are also a hard band-limit on
the audio path: a 4th-order Linkwitz-Riley filter (cascaded highpass +
lowpass biquads at Q = 0.707, 24 dB/oct) sits between the gain stage and the
EQ chain, so frequencies outside the window are silenced in playback as well
as on screen. The max ceiling is set to the file's Nyquist frequency on
load — so a 96 kHz file exposes up to 48 kHz, and the ultrasonic bands stay
both visible and audible if your output device can reach them.
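To see why two cascaded Q = 0.707 sections give the Linkwitz-Riley shape, the sketch below computes the magnitude response of a cookbook low-pass biquad and squares it. It is a standalone illustration using the RBJ cookbook formulas; the function names are hypothetical and the app itself builds the filter from Web Audio `BiquadFilterNode`s rather than from raw coefficients.

```javascript
// RBJ cookbook low-pass biquad coefficients (left un-normalised by a0).
function lowpassCoeffs(f0, q, sampleRate) {
  const w0 = (2 * Math.PI * f0) / sampleRate;
  const alpha = Math.sin(w0) / (2 * q);
  const c = Math.cos(w0);
  return { b: [(1 - c) / 2, 1 - c, (1 - c) / 2], a: [1 + alpha, -2 * c, 1 - alpha] };
}

// |H(e^jw)| of one biquad, evaluated at frequency f.
function biquadMagnitude({ b, a }, f, sampleRate) {
  const w = (2 * Math.PI * f) / sampleRate;
  const re = (k) => k[0] + k[1] * Math.cos(w) + k[2] * Math.cos(2 * w);
  const im = (k) => -(k[1] * Math.sin(w) + k[2] * Math.sin(2 * w));
  return Math.hypot(re(b), im(b)) / Math.hypot(re(a), im(a));
}

// Two identical Butterworth sections in series: 4th-order Linkwitz-Riley low-pass.
function lr4LowpassDb(cutoffHz, f, sampleRate) {
  const section = lowpassCoeffs(cutoffHz, Math.SQRT1_2, sampleRate);
  const magnitude = biquadMagnitude(section, f, sampleRate) ** 2;
  return 20 * Math.log10(magnitude);
}

console.log(lr4LowpassDb(1000, 1000, 48000).toFixed(2), 'dB at the cutoff');
```

Each section is 3 dB down at the cutoff, so the cascade is 6 dB down there, which is the defining property of a Linkwitz-Riley crossover.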
Five-band parametric EQ (highpass, lowpass, bandpass, notch, peaking, low-shelf, high-shelf) with real-time cookbook biquads and an on-screen response curve. Spectral "brush" with five modes (attenuate, amplify, erase, smooth, preserve-peaks), optional harmonic lock, optional auto-select flood-fill, and offline ISTFT-based resynthesis. Harmonic-percussive separation (HPS) via median-filter Wiener masks — isolate either component and play it back.
The Precise Spectral Selection panel complements the freehand brush with an exact rectangular edit workflow: enter start/end/low/high numerically, pull the current loop or viewport into the box, then apply that rectangle to the spectral mask using the current brush mode. Shift-drag on the spectrogram creates the same box directly from the canvas.
The Annotations panel stores time markers and time/frequency regions on
top of the current track. Markers draw as vertical guides, regions draw as
boxed overlays, and both can be recalled from the side list with Seek.
Annotations persist per track in localStorage and can be shared as JSON.
Four presets (Music, Speech, Hi-fi detail, Transient), PNG export of the current view, global "Reset" that restores every control to defaults and reverts edited audio, dark theme, responsive layout, fullscreen, keyboard shortcuts, decode-failure toast.
An empty-state overlay greets first-time visitors with two big CTAs — Play demo track and Open audio file — plus role chips (Musicians / Researchers / Visual / Instant) so the tool's value is legible in under three seconds. The Play demo button synthesises a 12-second reference track on the fly (Cmaj7 arpeggio at 120 BPM, 15 kHz sustained pilot, persistent 21 kHz ultrasonic tone, rhythmic broadband crackle) — ideal for testing every visualisation and anomaly detector without hunting for a suitable file.
A cursor-following readout (time / frequency / nearest musical note with cents deviation / dB at cursor) floats near the pointer whenever you hover the spectrogram. It reads amplitude straight from the pre-computed STFT magnitude grid, so a hover in the static view tells you the level at that point in the track, not "now".
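The note-and-cents part of the readout follows from the standard equal-temperament mapping. The sketch below assumes A4 = 440 Hz and sharps-only naming; `nearestNote` is a hypothetical name, not the app's function.

```javascript
const NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B'];

// Map a frequency to the nearest equal-tempered note and its deviation in cents.
function nearestNote(hz, a4 = 440) {
  const midiExact = 69 + 12 * Math.log2(hz / a4); // MIDI note 69 is A4
  const midi = Math.round(midiExact);
  const cents = Math.round((midiExact - midi) * 100);
  const octave = Math.floor(midi / 12) - 1;       // MIDI note 60 is C4
  const pitchClass = ((midi % 12) + 12) % 12;     // keep the index non-negative
  return { name: NOTE_NAMES[pitchClass] + octave, cents };
}

console.log(nearestNote(450)); // a little sharp of A4
```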
Clip recorder (top-right ● Rec clip button) captures the canvas
via captureStream(30 fps) and taps the same audio graph the speakers
hear via a MediaStreamAudioDestinationNode. The result is a
VP9/VP8-in-WebM file that's directly shareable to Instagram, TikTok, or
Discord — a one-click path from analysis screen to social post. A
pulsing red indicator appears on the spectrogram while recording is
active.
Hunt mode (anomaly panel) turns a scan into a cinematic playlist: once the detector has identified sustained tones, broadband bursts, the ultrasonic peak time, or spectrogram geometry hotspots, pressing 🎯 Play hunt auto-seeks to each in turn, narrows the display band to ±⅓ octave around the hotspot frequency, and dwells for 2–3 seconds before advancing. A short-attention-span auditor can triage a 5-minute track in under a minute.
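The ±⅓-octave window used while hunting is a simple ratio around the hotspot frequency. A minimal sketch, assuming the band is clamped to the Nyquist frequency (the clamp and the name `huntBand` are assumptions):

```javascript
// Display band spanning one third of an octave either side of a hotspot.
function huntBand(centerHz, nyquistHz) {
  const ratio = Math.pow(2, 1 / 3); // one third of an octave
  return {
    low: centerHz / ratio,
    high: Math.min(centerHz * ratio, nyquistHz),
  };
}

console.log(huntBand(15000, 24000));
```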
The right-hand panel adds three "Pro" sections that turn Spectrogram Lab into a small mastering/broadcast QC tool. Everything is 100 % client-side — no server, no ML models, no licences to manage.
Click "Analyze track" in the Pro Analysis card to run a full offline pass over the decoded buffer. Four cards populate:
Loudness
- Integrated LUFS: ITU-R BS.1770-4 K-weighted, with the −70 LUFS absolute gate and the −10 LU relative gate (EBU R128 Annex 2).
- Momentary max LUFS: peak of the 400 ms sliding-window loudness.
- Short-term max LUFS: peak of the 3 s sliding-window loudness.
- LRA (LU): loudness range, 10th to 95th percentile of the relative-gated 3 s windows.
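The two-stage gating behind the integrated figure can be sketched as follows. This assumes the K-weighted, channel-summed mean-square energy of each 400 ms block has already been computed; `integratedLufs` and its input format are illustrative assumptions, not the API of `js/metering.js`.

```javascript
// Loudness of a mean-square energy value, per BS.1770.
const toLufs = (meanSquare) => -0.691 + 10 * Math.log10(meanSquare);
const average = (xs) => xs.reduce((s, x) => s + x, 0) / xs.length;

// blockEnergies: one K-weighted mean-square value per gating block (assumed input).
function integratedLufs(blockEnergies) {
  // Stage 1: absolute gate at -70 LUFS removes silence.
  const aboveAbsolute = blockEnergies.filter((z) => toLufs(z) > -70);
  if (aboveAbsolute.length === 0) return -Infinity;
  // Stage 2: relative gate 10 LU below the loudness of what survived stage 1.
  const relativeThreshold = toLufs(average(aboveAbsolute)) - 10;
  const gated = aboveAbsolute.filter((z) => toLufs(z) > relativeThreshold);
  return toLufs(average(gated));
}
```

The relative gate is what keeps long quiet passages from dragging the integrated value down: blocks more than 10 LU below the ungated average are ignored.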
Peaks & Stereo
- True peak dBTP: 4× polyphase oversampling via a 48-tap FIR. Catches inter-sample peaks that are invisible to a plain sample peak.
- Sample peak dBFS: raw `|x|` maximum.
- Stereo correlation: Pearson correlation of L and R. `+1` = mono-safe, `0` = wide/independent, `−1` = phase-inverted (problematic).
- Mid/side ratio: ‖M‖/‖S‖, gives a rough sense of stereo width.
- Stereo delay: best short-lag cross-correlation offset between L and R, useful for spotting skewed transfers or widened masters.
- Stereo balance: RMS level delta between left and right channels.
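The stereo correlation figure in the list above is a plain Pearson coefficient. A minimal sketch (the function name is hypothetical, and returning 0 for a silent channel is an assumption):

```javascript
// Pearson correlation between two equal-rate channels.
function stereoCorrelation(left, right) {
  const n = Math.min(left.length, right.length);
  let sumL = 0, sumR = 0;
  for (let i = 0; i < n; i++) { sumL += left[i]; sumR += right[i]; }
  const meanL = sumL / n, meanR = sumR / n;
  let cross = 0, varL = 0, varR = 0;
  for (let i = 0; i < n; i++) {
    const dl = left[i] - meanL, dr = right[i] - meanR;
    cross += dl * dr;
    varL += dl * dl;
    varR += dr * dr;
  }
  const denom = Math.sqrt(varL * varR);
  return denom === 0 ? 0 : cross / denom; // silent channel: report 0 (assumed)
}
```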
Music
- Key: Krumhansl-Kessler profile correlation over a 24-way template bank (12 major + 12 minor). Reports the best match and a `0…1` confidence. Below 0.35 it prints "Unclear".
- Tempo: half-wave-rectified spectral-flux onset envelope, autocorrelated in the 40–240 BPM range with an octave-correction kernel centred on 100 BPM. Below 0.35 confidence it prints "Variable".
- Noise floor: 10th-percentile short-term LUFS (useful for dynamic-range complaints).
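Template-bank key detection can be sketched in a few lines: rotate the two Krumhansl-Kessler profiles to each of the 12 tonics and keep the rotation that correlates best with the track's averaged chroma vector. The profile values are the published Krumhansl-Kessler ratings; using the raw correlation as the confidence and the name `detectKey` are assumptions for illustration, and the app's own confidence mapping may differ.

```javascript
const KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88];
const KK_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17];
const PITCH_CLASSES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B'];

function pearson(x, y) {
  const n = x.length;
  const mx = x.reduce((s, v) => s + v, 0) / n;
  const my = y.reduce((s, v) => s + v, 0) / n;
  let num = 0, dx = 0, dy = 0;
  for (let i = 0; i < n; i++) {
    num += (x[i] - mx) * (y[i] - my);
    dx += (x[i] - mx) ** 2;
    dy += (y[i] - my) ** 2;
  }
  const denom = Math.sqrt(dx * dy);
  return denom === 0 ? 0 : num / denom;
}

// chroma: 12 energies indexed C..B, averaged over the track (assumed input).
function detectKey(chroma) {
  let best = { key: 'Unclear', correlation: -Infinity };
  for (const [mode, profile] of [['major', KK_MAJOR], ['minor', KK_MINOR]]) {
    for (let tonic = 0; tonic < 12; tonic++) {
      // Rotate the profile so its first entry lands on the candidate tonic.
      const template = profile.map((_, pc) => profile[(pc - tonic + 12) % 12]);
      const r = pearson(chroma, template);
      if (r > best.correlation) best = { key: `${PITCH_CLASSES[tonic]} ${mode}`, correlation: r };
    }
  }
  // Treat the best correlation as the confidence (assumption).
  return best.correlation < 0.35 ? { key: 'Unclear', correlation: best.correlation } : best;
}
```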
Live meters
Running approximations that update ~25×/s during playback:
Momentary LUFS, Short-term LUFS, Live true peak dBTP, per-channel live
peaks, and live stereo correlation. These are
approximate because they read AnalyserNode.getFloatTimeDomainData which
returns the most recent 2048 samples (with overlap between ticks). For
certification-grade numbers, use the offline "Analyze track" values.
The dedicated Stereo Diagnostics card renders a compact goniometer from a live stereo window around the playhead and summarises the current image as:
- Mode — near-mono / stereo / wide stereo / out of phase
- Imbalance — instantaneous L/R RMS delta
- Mono-safe — quick warning when correlation drops into risky territory
Because offline spectral edits and HPS renders now rebuild the audio buffer with all original channels intact, these diagnostics remain valid after an edit instead of silently degrading to mono.
Choose a delivery target from the dropdown and the panel evaluates every check with a pass/fail badge:
| Target | Integrated LUFS | True peak | Other |
|---|---|---|---|
| Spotify (loud) | −14 ±1 LUFS | ≤ −1 dBTP | — |
| Apple Music | −16 ±1 LUFS | ≤ −1 dBTP | — |
| YouTube | −14 ±1 LUFS | ≤ −1 dBTP | — |
| Amazon Music HD | −14 ±1 LUFS | ≤ −2 dBTP | — |
| Tidal | −14 ±1 LUFS | ≤ −1 dBTP | — |
| EBU R128 broadcast | −23 ±1 LUFS | ≤ −1 dBTP | LRA ≤ 20 LU |
| ATSC A/85 (US TV) | −24 ±2 LUFS | ≤ −2 dBTP | — |
| ACX audiobook | −19 … −23 LUFS | ≤ −3 dBTP | noise floor ≤ −60 dB |
| Podcast (Apple/Spotify) | −16 ±1 LUFS | ≤ −1 dBTP | — |
Export compliance report produces a dark-themed printable HTML file with the full measurement table and per-check badges. Save or print-to-PDF and send it with your delivery.
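The per-check evaluation reduces to range comparisons against the target table. The sketch below encodes three of the rows above; the object layout, field names, and `evaluateTarget` are hypothetical and do not describe the actual structure of `js/compliance.js`.

```javascript
// Three rows of the target table, expressed as inclusive limits.
const TARGETS = {
  'Spotify (loud)': { lufsMin: -15, lufsMax: -13, truePeakMax: -1 },
  'EBU R128 broadcast': { lufsMin: -24, lufsMax: -22, truePeakMax: -1, lraMax: 20 },
  'ACX audiobook': { lufsMin: -23, lufsMax: -19, truePeakMax: -3, noiseFloorMax: -60 },
};

// measurement: { lufs, truePeak, lra, noiseFloor } from the offline analysis pass.
function evaluateTarget(name, measurement) {
  const t = TARGETS[name];
  const checks = [
    { check: 'Integrated LUFS', pass: measurement.lufs >= t.lufsMin && measurement.lufs <= t.lufsMax },
    { check: 'True peak', pass: measurement.truePeak <= t.truePeakMax },
  ];
  if (t.lraMax !== undefined) {
    checks.push({ check: 'LRA', pass: measurement.lra <= t.lraMax });
  }
  if (t.noiseFloorMax !== undefined) {
    checks.push({ check: 'Noise floor', pass: measurement.noiseFloor <= t.noiseFloorMax });
  }
  return { target: name, checks, pass: checks.every((c) => c.pass) };
}

console.log(evaluateTarget('Spotify (loud)', { lufs: -14.2, truePeak: -1.3 }));
```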
Five presets that build a gain mask matching the current STFT grid and merge
it into the live SpectralMask. They don't edit audio on their own —
click Brush → Apply to resynthesise via the ISTFT worker, exactly like a
hand-painted edit.
- Hum remover (50 / 60 Hz): auto-detects mains frequency by comparing band energy at 47–53 Hz vs 57–63 Hz, then notches the fundamental plus the first six harmonics (bandwidth ±3 Hz, floor 0.1). Pick `Auto / 50 Hz / 60 Hz` from the dropdown to override detection.
- Low-rumble remover: raised-cosine high-pass mask at the configured cutoff (default 40 Hz, 20 Hz transition width). Kills HVAC drone without touching the bass fundamental.
- Hiss reducer: Wiener-style subtraction estimated from the 10 % quietest frames above 1 kHz. Floor 0.2 so it never fully mutes a bin.
- De-esser (4–8 kHz): sinusoidal kernel in the sibilance band, attenuates frames where 4–8 kHz energy spikes above the running median.
- Declick: z-score spectral-flux detector (>2.5 σ), attenuates the detected frame and its ±1 neighbours above 3 kHz.
All presets compose by multiplying gains. Running Hum then Hiss gives you a clean spectrum, then click Brush → Apply once to render.
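A per-bin gain mask for the hum preset, and the multiplicative composition described above, might look like the sketch below. It makes several assumptions that are not confirmed by the source: the notch is rectangular, "fundamental plus six harmonics" is read as multiples 1 through 7, and the mask is a single per-bin array rather than the `{data, smooth, description}` object the real builders return.

```javascript
// Per-bin gain mask with notches at the mains frequency and its harmonics.
function humMask(nBins, fftSize, sampleRate, mainsHz, floor = 0.1, halfWidthHz = 3) {
  const gains = new Float32Array(nBins).fill(1);
  const binHz = sampleRate / fftSize;
  for (let k = 1; k <= 7; k++) { // fundamental + six harmonics (assumed reading)
    const lo = Math.max(Math.ceil((k * mainsHz - halfWidthHz) / binHz), 0);
    const hi = Math.min(Math.floor((k * mainsHz + halfWidthHz) / binHz), nBins - 1);
    for (let b = lo; b <= hi; b++) gains[b] = floor;
  }
  return gains;
}

// Presets compose by multiplying gains bin by bin.
function composeMasks(a, b) {
  return a.map((gain, i) => gain * b[i]);
}
```

Because composition is a product, the order in which presets are run does not matter, and a bin already at its floor in one preset can only be pushed lower by another.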
Click "Run anomaly scan" in the Anomaly scan card and the app sweeps the decoded buffer plus the full-track STFT through five detectors, then reports a single Suspicion score (0…1) with a verdict chip (Clean / Minor / Suspect) and a list of notes.
Ultrasonic energy (≥ 18 kHz) Reports the fraction of total spectral energy that sits above 18 kHz, the peak ultrasonic frequency, and its level in dB. The flag triggers when ultrasonic energy reaches ≥ 1 % of the total, or when any bin above 18 kHz reaches −60 dB or higher. Normal mastered music sits at well under 0.1 %; persistent ultrasonic tones or broadband haze in this band are a classic carrier for side-channel data.
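The ultrasonic statistics can be sketched over a single magnitude spectrum. This assumes linear magnitudes normalised so that 1.0 corresponds to 0 dB; the function name and return shape are hypothetical, and the real detector works over the whole track rather than one frame.

```javascript
// magnitudes: one-sided linear magnitude spectrum, 1.0 = 0 dB (assumed scaling).
function ultrasonicStats(magnitudes, fftSize, sampleRate, cutoffHz = 18000) {
  const binHz = sampleRate / fftSize;
  let total = 0, ultra = 0, peakMag = 0, peakHz = 0;
  for (let b = 0; b < magnitudes.length; b++) {
    const power = magnitudes[b] ** 2;
    total += power;
    if (b * binHz >= cutoffHz) {
      ultra += power;
      if (magnitudes[b] > peakMag) { peakMag = magnitudes[b]; peakHz = b * binHz; }
    }
  }
  const fraction = total > 0 ? ultra / total : 0;
  const peakDb = peakMag > 0 ? 20 * Math.log10(peakMag) : -Infinity;
  // Flag on either rule: 1 % of total energy, or any ultrasonic bin at -60 dB or above.
  return { fraction, peakHz, peakDb, flagged: fraction >= 0.01 || peakDb >= -60 };
}
```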
Sustained tones (pilot-tone / carrier detector) Per-frame median is subtracted and any bin ≥ 18 dB above the median that also beats its frequency neighbours is marked. Runs of ≥ 0.8 s (configurable) in the same bin are merged into tones — so a vocal held note doesn't trip the detector, but a 60 Hz hum, a deliberate pilot tone, or a hidden carrier embedded in an otherwise clean frequency band will.
Broadband bursts Per-frame wideband energy is z-scored against the whole track. Clusters with z ≥ 4 σ (default) are reported with their time, magnitude, and peak frequency. These correspond to clicks, edits, impulsive encoder glitches, or short-burst stego payloads.
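The z-score step of the burst detector can be sketched as below, given one wideband energy value per frame. Merging adjacent flagged frames into clusters and locating the peak frequency are left out; `findBursts` is a hypothetical name.

```javascript
// frameEnergies: one wideband energy value per STFT frame.
function findBursts(frameEnergies, zThreshold = 4) {
  const n = frameEnergies.length;
  let sum = 0;
  for (const e of frameEnergies) sum += e;
  const mean = sum / n;
  let squares = 0;
  for (const e of frameEnergies) squares += (e - mean) ** 2;
  const std = Math.sqrt(squares / n);
  if (std === 0) return []; // perfectly flat track: nothing stands out
  const bursts = [];
  for (let frame = 0; frame < n; frame++) {
    const z = (frameEnergies[frame] - mean) / std;
    if (z >= zThreshold) bursts.push({ frame, z });
  }
  return bursts;
}
```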
LSB steganalysis Westfeld-Pfitzmann pairs-of-values chi-square plus lag-1 LSB autocorrelation, computed on the raw 16-bit-quantised samples. Flags suspiciously equal PoV-pair counts (z ≤ −3 σ) — the fingerprint of LSB replacement on a container whose LSBs originally had structure. The verdict string also reports the inconclusive case ("low-structure signal") honestly; dithered 24-bit masters and many compressed formats naturally have random LSBs and the test can't say either way.
Geometric structure Sobel-gradient edge density + orientation entropy over the packed spectrogram grid. A normal recording is a texture with near-uniform gradient directions (entropy ≈ 3.0). Injected shapes (DOOM-pentagram-style hidden imagery, text, barcodes) show up as a handful of dominant orientations — edge density goes up and orientation entropy drops. The top-10 "hotspot" tiles are returned with time and frequency coordinates so you can click Seek to jump straight to them and inspect the overlay.
Summary score is a weighted aggregate of the individual flags. Anything ≥ 0.6 is treated as Suspect and the chip turns red; 0.3…0.6 is Minor; below 0.3 is Clean.
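The score-to-verdict mapping is a pair of thresholds. In the sketch below the thresholds come from the paragraph above, while the weighted-mean aggregation and the weights passed to it are purely illustrative; the source does not state the real weights.

```javascript
// flags and weights are keyed by detector name; flag values lie in 0..1.
function aggregateScore(flags, weights) {
  let weighted = 0, total = 0;
  for (const name of Object.keys(weights)) {
    const value = Math.min(Math.max(flags[name] || 0, 0), 1);
    weighted += weights[name] * value;
    total += weights[name];
  }
  return total > 0 ? weighted / total : 0;
}

function verdict(score) {
  if (score >= 0.6) return 'Suspect';
  if (score >= 0.3) return 'Minor';
  return 'Clean';
}

// Hypothetical weights, for illustration only.
console.log(verdict(aggregateScore({ ultrasonic: 1, lsb: 0 }, { ultrasonic: 3, lsb: 1 })));
```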
Quick actions
- Show ultrasonic only: sets the frequency window to `[18 kHz, Nyquist]` so both the audio band-limit filter and the visualiser isolate the ultrasonic region. You can play the file and hear just what's hiding up there (if your output device can reproduce it).
- Reset range: restores `[0, 20 kHz]` (or Nyquist, whichever is lower).
Anomaly scanning is only as good as the bit-depth and sample-rate surviving on your end of the chain. To give the detector the best chance:
- Lossless audio: `.wav` (ideally 16/24-bit PCM) or `.flac`. MP3/AAC/OGG lossy compression destroys LSB structure and also cuts almost everything above 16 kHz, so the ultrasonic and LSB detectors will read "inconclusive" on those formats.
- Sample rate ≥ 44.1 kHz, ideally 48 or 96 kHz. A 96 kHz file lets the scanner see up to 48 kHz. Anything above a file's Nyquist is physically absent.
- Unedited source: avoid pre-processing (normalisation, EQ, re-encoding) before the scan — each of these smooths LSB statistics and may smear embedded tones or shapes. Run the scan first, then edit.
- Length: the PoV chi-square needs at least ~10 distinct sample values with ≥ 5 samples each. A 5-second clip at 44.1 kHz is plenty; very short clips may return "inconclusive".
You do not need to provide any shape, template, or sample of what you're looking for — the detector characterises the signal statistically, so it flags any anomaly that stands out from the natural background.
- `js/metering.js`: K-weighting filters, RunningLufs, true-peak, LRA, correlation, mid/side, noise floor.
- `js/key-tempo.js`: Krumhansl-Kessler key templates, onset envelope, autocorrelation BPM detector.
- `js/restoration.js`: mask builders that all return the same `{data, smooth, description}` shape as the Brush mask.
- `js/compliance.js`: target table, per-check evaluation, HTML renderer.
- `js/anomaly.js`: ultrasonic-band statistics, sustained-tone detector, broadband-burst z-score detector, Sobel-gradient geometry analyser, Westfeld-Pfitzmann PoV chi-square + lag-1 LSB autocorrelation, weighted-score aggregator.
Each module is a plain <script> that exposes a global namespace
(Metering, KeyTempo, Restoration, Compliance, Anomaly), so nothing
in the existing architecture changes — the Pro panel just reads from the
same state.buffer / state.track.rawMags / state.track.grid arrays the
core app already maintains.
The codebase has a single gate — state.tier — that is permanently set to
'studio' in this open-source build, so every Pro feature is available. If
you fork and want to add subscription gating, guard the three Pro UI
sections (#proRun, compliance panel, restoration buttons) and return early
from runProAnalysis / applyRestorationPreset when the tier is lower than
required. All heavy lifting is DSP code that runs in the browser, so no
backend changes are needed.
verify_pro.js (run with node outputs/verify_pro.js) ships 34 tests
across metering, key/BPM, restoration, compliance, and anomaly detection —
including ITU-R BS.1770 K-weighting shape, integrated LUFS of reference
sines, oversampled true-peak, stereo correlation, A-minor-chord key
detection, 120 BPM click-train tempo, hum notch shape, pass/fail evaluation
of a synthetic report against all nine compliance targets, ultrasonic
detection on a 22 kHz tone in a 48 kHz buffer, sustained-tone detection on
a pilot tone, broadband-burst detection on an injected click,
pathologically-balanced-PoV chi-square detection of LSB steganography, and
an end-to-end scan that combines them all.
| Key | Action |
|---|---|
| Space | Play / pause |
| S | Stop |
| ← / → | Seek −/+ 5 s |
| Shift + ← / → | Seek −/+ 1 s |
| ↑ / ↓ | Volume |
| + / − | Frequency zoom |
| [ / ] | Time zoom |
| M | Cycle color map |
| L | Toggle log / linear frequency |
| F | Fullscreen |
| A / B | Set loop start / end |
| Esc | Clear loop |
The app is plain ES2017 — no bundler, no framework. Modules communicate
through window globals and one long-lived worker.
index.html
styles.css
js/
colormaps.js 256-entry RGB LUTs for 8 colour maps
fft.js Radix-2 in-place Cooley–Tukey FFT + window builders
audio-engine.js Decode, BufferSource playback clock, EQ chain,
AnalyserNode, samplesToBuffer helper
analyzer.js STFT/ISTFT, complex-Morlet CWT, mel/gammatone/chroma
filterbanks, cepstrum, HPS masks, harmonic tracking
waveform.js Peak extraction + overview canvas with click-to-seek
renderer.js Main spectrogram renderer (scroll + static), log/
linear axes, dB re-scaling, overlay (grid, cursor,
loop region, harmonic trace), overlay-post hook
visualizations.js Mel / chroma / cochleagram / reassigned / cepstrum /
features renderers; Waterfall 3D class
eq.js ParametricEQ class + cookbook biquad coefficients +
response-curve drawing
spectral-edit.js SpectralMask: paint / harmonic-lock paint /
auto-select flood fill / findPeaks / overlay tint
app.js Controller — state, UI wiring, keyboard, drag-drop,
presets, export, reset, brush dispatch, HPS, EQ UI
workers/
analysis-worker.js Off-thread full-track STFT, scalogram, HPS, and
ISTFT resynthesis with mask + smoothing
- Audio decoding and playback: `audio-engine.js` owns the single `AudioContext`. The chain is `BufferSource → GainNode → eqInput → [BiquadFilters…] → eqOutput → AnalyserNode → destination`. Play position is derived from `AudioContext.currentTime - startedAt`, so the playhead is frame-accurate regardless of browser load.
- Signal processing: `analyzer.js` + `fft.js` are pure and DOM-free. The same code runs on the UI thread (per-frame path for scrolling spectrogram and live bars) and inside `analysis-worker.js` (full-track, scalogram, HPS, ISTFT). The worker transfers `Float32Array` buffers as `Transferable` objects to avoid copying.
- Rendering: Canvas 2D. The scrolling spectrogram uses an offscreen back buffer shifted left by one device pixel per hop (O(1) blit via `drawImage`). The static view renders a full `ImageData` blit. All overlays (grid, cursor, loop, harmonic tracker, EQ curve) live on a second canvas stacked on top, so cursor updates don't redraw the expensive spectrogram.
- UI state: `app.js` holds the single source of truth in `state`. Alt visualisations are lazily computed and cached on `state.altGrids`; caches invalidate when FFT size, overlap, window, or channel mode changes.
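Deriving the playhead from the audio clock rather than from animation-frame timestamps comes down to one expression. The sketch below is a pure function so it can run outside the browser; in the app the first argument would be `AudioContext.currentTime`. The parameter names and the clamping are assumptions.

```javascript
// Playhead position in seconds, derived purely from the audio clock.
function playbackPosition(contextTime, startedAt, startOffset, playbackRate, duration) {
  // Wall-clock time since play() was called, scaled by the playback rate.
  const elapsed = (contextTime - startedAt) * playbackRate;
  // Clamp so the cursor never runs past either end of the track.
  return Math.min(Math.max(startOffset + elapsed, 0), duration);
}

console.log(playbackPosition(12.5, 10, 3, 1, 60));
```

Because every consumer (cursor, seek bar, scrolling spectrogram) calls the same function with the same clock, a dropped animation frame delays a repaint but never shifts the reported position.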
- Scrolling spectrogram and waterfall: a per-frame FFT reads the decoded `AudioBuffer` at `floor(audio.getCurrentTime() * sampleRate) - fftSize/2` and pushes one column whenever `currentTime` has advanced by ≥ one hop. This is a direct function of the audio clock, so it cannot drift.
- Static spectrogram, mel, chroma, cochleagram, reassigned, cepstrum, features, scalogram, HPS views: precomputed grids; the playhead cursor is drawn on the overlay canvas each animation frame from the audio clock.
- Live bars: pulled directly from `AnalyserNode.getByteFrequencyData`, which is already time-aligned with the output device.
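The clock-driven column push for the scrolling view can be sketched with three small helpers. All names are hypothetical; zero-filling outside the buffer is an assumption about how the edges of the track are handled.

```javascript
// First sample of the analysis window centred on the current playhead.
function windowStart(currentTime, fftSize, sampleRate) {
  return Math.floor(currentTime * sampleRate) - fftSize / 2;
}

// How many whole hops have elapsed since the last column was pushed.
function columnsDue(lastColumnIndex, currentTime, hop, sampleRate) {
  const columnNow = Math.floor((currentTime * sampleRate) / hop);
  return Math.max(columnNow - lastColumnIndex, 0);
}

// Copy fftSize samples starting at `start`, zero-filling outside the buffer.
function readWindow(samples, start, fftSize) {
  const out = new Float32Array(fftSize); // typed arrays start zero-filled
  for (let i = 0; i < fftSize; i++) {
    const src = start + i;
    if (src >= 0 && src < samples.length) out[i] = samples[src];
  }
  return out;
}
```

Since the column index is computed from the audio time alone, a stalled animation loop simply reports several columns due on the next tick instead of falling behind.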
Full-track STFT, scalogram, and HPS run in the background worker. Results
come back as packed Uint8Array grids (one byte per bin, normalised to a
wide dB range), so even a 20-minute track fits comfortably in memory and
every view becomes a simple byte lookup. Raw magnitudes (Float32Array
arrays) are kept only while they're needed to derive alt grids, then dropped.
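One-byte-per-bin packing is a linear quantisation of the dB value. The sketch below assumes a −120…0 dB range purely for illustration; the source only says the range is "wide", and the function names are hypothetical.

```javascript
// Quantise a dB value to one byte over [minDb, maxDb] (range is an assumption).
function packDb(db, minDb = -120, maxDb = 0) {
  const t = (db - minDb) / (maxDb - minDb);
  return Math.round(Math.min(Math.max(t, 0), 1) * 255);
}

// Recover the approximate dB value from a packed byte.
function unpackDb(byte, minDb = -120, maxDb = 0) {
  return minDb + (byte / 255) * (maxDb - minDb);
}

console.log(packDb(-37.3), unpackDb(packDb(-37.3)).toFixed(2));
```

With a 120 dB range the quantisation step is about 0.47 dB, so the round-trip error stays under a quarter of a dB, which is well below what a colour map can show.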
channelMode (mono / left / right / stereo-split) only affects the analysis
mix; the player always uses the current multichannel AudioBuffer.
Switching modes triggers a fresh worker analysis but does not touch playback.
Offline spectral edits, HPS isolation, and revert all preserve channel count,
so stereo metering and diagnostics continue to reflect the actual rendered
buffer.
Every arrow below shows: user action → modules touched → expected state at the end of the step. Conditions that should hold after the step are in italics.
drag-drop / Open → app.handleFile()
↓
audio.loadFile() — decode via AudioContext.decodeAudioData
↓
applyChannelMode() — mix/split into state.monoSamples
↓
waveform.computePeaks() — min/max per ~2000 bins
waveform.render() — draw peaks + centre line
↓
renderer.duration,
renderer.sampleRate — set
applyDisplayParams() — push current UI values to renderer
↓
computeTrack() — spawn worker, POST { cmd:'stft' }
↓
[worker] computeSpectrogram + raw magnitudes
↓
onmessage 'stft-done':
state.track = { grid, nFrames, nBins, fftSize,
hop, sampleRate, minDb, maxDb, rawMags }
state.mask = new SpectralMask(nFrames, fftSize/2 + 1)
state.altGrids = {} — invalidate caches
renderer.setTrack()
renderCurrentMode()
Expected after step A:
- Filename + duration + channel count are shown.
- Waveform overview is populated.
- Status bar says "Ready. Press Space to play."
- The currently selected visualisation tab shows data; the scrolling spectrogram shows a blank black field until you press Play (by design — scroll mode is fed by the live FFT).
togglePlay() → audio.play() — creates BufferSource, starts at offset
tick() (rAF loop):
├── audio.getCurrentTime() — elapsed since startedAt (× rate)
├── seekBar.value ← t / duration
├── timeLabel ← formatTime(t) / duration
├── waveform.setTime(t) — redraws waveform cursor
├── renderer.setCursor(t) — clears overlay, redraws grid +
│ cursor + harmonic trace + EQ curve
├── drawBars() — live frequency bars
├── maybePushRtFrame() — one new FFT column per hop
└── loop check → audio.seek(loop.start) — if t ≥ loop.end
Expected after step B:
- Playback is audible.
- Waveform cursor and spectrogram cursor move smoothly together.
- Live frequency bars react to the current audio.
- If in scrolling-spectrogram mode, new columns appear at the right edge.
input source → audio.seek(t)
↓
stop current BufferSource silently
offset ← t
if wasPlaying: audio.play(t) — new BufferSource from t
emitState
↓
on next tick: cursor + seekBar + labels update
Expected after step C:
- Audio resumes from the new position without clicks.
- Seek is sample-accurate — click the waveform exactly where you want and the cursor jumps there.
change event → state.{fftSize|overlap|windowFn|channelMode} ← new value
↓ if channel mode changed:
↓ applyChannelMode() + waveform.computePeaks()
↓
computeTrack() — re-run worker with new params
↓
state.altGrids = {} — invalidate mel/chroma/... caches
state.scalogram = null
renderer.setTrack()
renderCurrentMode()
Expected after step D:
- Status bar shows "Analyzing…" with a progress percentage.
- All time-frequency views (not just spectrogram) regenerate from the new grid on next tab switch.
- Playback continues uninterrupted during analysis.
input event → state.* ← new value
↓
applyDisplayParams() — pushes to renderer, redraws legend
↓
renderCurrentMode() — fully repaints the active view
Expected after step E:
- Changes are visible within one animation frame (< 16 ms).
- No audio worker round-trip is needed — this is all re-rendering.
- Switching between Log and Linear preserves the current dB window.
click tab → state.mode ← 'mel' | 'chroma' | 'scalogram' | ...
↓
applyDisplayParams()
↓
renderCurrentMode() dispatch:
mel / chroma / cochlea / reassigned / cepstrum
→ build from state.track.rawMags on first use, cache on
state.altGrids, draw with drawHeatGrid
scalogram
→ spawn worker { cmd:'scalogram' }; drawScalogram() on reply
waterfall
→ ignite 3D draw using Visualizations.Waterfall
features
→ buildFeatures from rawMags + monoSamples, drawFeatures
harmonic / percussive
→ require prior "Analyze HPS"; otherwise status warns
Expected after step F:
- First time a tab is opened it takes up to a few hundred ms to build the alt grid; subsequent opens are instant (cached).
- The scalogram spawns its own worker (separate from the main STFT) and shows a progress percentage.
build → buildEqUi() creates 5 rows; each row binds to state.eq.bands[i]
change → band.{enabled|type|freq|q|gain} updated
↓
applyEq() → audio.setEqBands(bands, bypass)
↓
disconnect old BiquadFilter chain
build new chain in enabled-band order
connect eqInput → band0 → band1 → … → eqOutput
↓
drawEqCurveOverlay() → renderer.renderOverlay()
→ onOverlayDraw hook redraws curve on top
Expected after step G:
- Audio changes instantly (no recompute).
- The cyan response curve on the spectrogram matches the audible change.
- The curve persists across cursor updates during playback (this used to flicker — now fixed via the post-overlay hook).
- Bypass flips the entire chain to a straight-through path.
- "Show curve" off removes only the visual; audio is unchanged.
Brush ON → state.brushActive = true
→ auto-switch to Static mode so strokes are actually visible
mouse down/move on spectrogram while button held:
paintBrushAt(xPx, yPx)
compute (frame, bin) from pixel coords
using renderer.yToFreq → hz/(sr/2) * mask.nBins (maps to
the one-sided FFT bin, including the Nyquist bin)
if harmonic-lock: paintHarmonicLock(), else paint()
renderCurrentMode() → drawMaskOverlay()
tints the canvas red (attenuate) / green (amplify) / cyan (smooth)
Auto-select + dblclick:
autoSelectAt(xPx, yPx) → mask.autoSelect(grid, seed, 12 dB threshold)
→ flood-fills a connected region of similar
brightness and paints it in one shot
Apply → applyMaskAndReload()
POST to worker { cmd:'render-masked', samples, mask, smoothMask }
[worker] complex STFT → optional ±3-bin tent blur where smoothMask>0
→ multiply by gain mask → ISTFT (overlap-add)
onmessage 'render-done':
newBuf ← audio.samplesToBuffer(rendered, sr)
audio.replaceBuffer(newBuf, keepPosition=true)
waveform.computePeaks(new)
state.altGrids = {}; state.scalogram = null
computeTrack() — new STFT of the edited audio
Revert → audio.replaceBuffer(originalBuffer)
reset mask / HPS / altGrids / scalogram → computeTrack()
Expected after step H:
- Brush strokes appear coloured on the static spectrogram in real time.
- "Apply" produces audibly edited audio at (roughly) the same playhead position; no clicks or dropouts.
- "Revert audio" always returns you to the original file. Reset also reverts audio + clears mask + clears HPS as part of its job.
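The frequency-to-bin step of the brush, and the harmonic-lock targets, can be sketched as follows. This version scales by `nBins - 1` and clamps so that the Nyquist frequency lands on the last valid index; that choice is an assumption and the app's own expression may differ. The 1/√k strength falloff and the limit of six harmonics are taken from the feature table later in this document.

```javascript
// Map a frequency to a one-sided FFT bin index, clamped to the valid range.
function hzToBin(hz, sampleRate, nBins) {
  const bin = Math.round((hz / (sampleRate / 2)) * (nBins - 1));
  return Math.min(Math.max(bin, 0), nBins - 1);
}

// Bins and brush strengths for a harmonic-locked stroke.
function harmonicLockTargets(fundamentalHz, sampleRate, nBins, maxHarmonic = 6) {
  const targets = [];
  for (let k = 1; k <= maxHarmonic; k++) {
    const hz = k * fundamentalHz;
    if (hz > sampleRate / 2) break; // harmonics above Nyquist do not exist
    targets.push({ harmonic: k, bin: hzToBin(hz, sampleRate, nBins), strength: 1 / Math.sqrt(k) });
  }
  return targets;
}

console.log(harmonicLockTargets(440, 48000, 1025));
```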
"Analyze HPS" → runHps(null) POST { cmd:'hps', samples, kernelH, kernelP }
[worker] STFT → hpsMasks (median-filter based Wiener) → pack gridH and gridP
onmessage 'hps-done': state.hpsGrids = {gridH, gridP, nFrames, nBins}
"Isolate harmonic" → runHps('harmonic')
[worker] additionally ISTFTs harmonic-masked STFT
→ replaceBuffer with harmonic-only samples
"Isolate percussive" → mirror of the above
Expected after step I:
- Switching to the Harmonic or Percussive tab after analysis shows the two components' individual spectrograms.
- "Isolate …" swaps the playing audio to that component. "Revert audio" restores the original.
- Kernel size (`hpsKernel`) trades sharpness for robustness (17 is a good default).
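Median-filter HPS can be sketched compactly: a median along time keeps what is steady across frames (harmonic), a median along frequency keeps what is steady across bins (percussive), and the two are combined into soft Wiener-style masks. The sketch below uses one kernel size for both directions and clamps at the edges; those choices, the array-of-rows layout, and the function names are assumptions rather than the worker's actual code.

```javascript
function median(values) {
  const sorted = Array.from(values).sort((a, b) => a - b);
  const mid = sorted.length >> 1;
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// mags[frame][bin] -> soft masks in 0..1 with harmonic + percussive = 1.
function hpsMasks(mags, kernel = 17) {
  const half = kernel >> 1;
  const nFrames = mags.length, nBins = mags[0].length;
  const harmonic = [], percussive = [];
  for (let t = 0; t < nFrames; t++) {
    harmonic.push(new Float32Array(nBins));
    percussive.push(new Float32Array(nBins));
    for (let b = 0; b < nBins; b++) {
      const alongTime = [], alongFreq = [];
      for (let d = -half; d <= half; d++) {
        const tt = Math.min(Math.max(t + d, 0), nFrames - 1); // clamp at the edges
        const bb = Math.min(Math.max(b + d, 0), nBins - 1);
        alongTime.push(mags[tt][b]);
        alongFreq.push(mags[t][bb]);
      }
      const h = median(alongTime) ** 2; // steady over time: harmonic energy
      const p = median(alongFreq) ** 2; // steady over frequency: percussive energy
      const sum = h + p;
      harmonic[t][b] = sum > 0 ? h / sum : 0.5;
      percussive[t][b] = sum > 0 ? p / sum : 0.5;
    }
  }
  return { harmonic, percussive };
}
```

A larger kernel makes the medians more robust to short interruptions but blurs the boundary between a sustained note and its attack, which is the sharpness-versus-robustness trade-off noted above.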
Reset button →
- revert audio buffer if edited
- clear mask, hpsGrids, altGrids, scalogram, loop points
- restore every DOM control to DEFAULTS
- rebuild EQ with flat defaults, bypass OFF
- brush OFF
- applyDisplayParams(), computeTrack()
- playback keeps running at (approximately) the same position
Expected after step J:
- All sliders, tabs, selects, checkboxes back at their factory values.
- If audio was playing, it is still playing at roughly the same spot after the reset.
- The spectrogram repaints within ~1 second.
| Feature | Expected result |
|---|---|
| Drag-and-drop | Any drop zone (document) accepts; highlights while dragging |
| Decode failure | Red toast with the decoder message; status says "Error decoding file." |
| Play / Pause | Button icon toggles; Space key works unless a text input is focused |
| Stop | Resets offset to 0, sets icon back to ▶ |
| Seek bar | Updates live during playback; dragging seeks instantly |
| Waveform cursor | Matches the spectrogram cursor to the pixel |
| Scrolling spectrogram | New column per hop; cursor pinned to right edge |
| Static spectrogram | Cursor moves across; zoom/pan rescales the visible window |
| Live bars | One bar per ~96 frequency slots; colour-mapped; peak Hz shown above |
| Mel / chroma / cochlea / reassigned / cepstrum | Populated on tab switch; re-derived from cached STFT; dB-backed views respect the live Min/Max dB window |
| Scalogram | Progress bar updates; final render uses current colour map, dB window, and zoom |
| Waterfall | Updates during playback only; older frames fade into the distance |
| Features panel | Five overlaid traces for centroid / bandwidth / rolloff / flux / RMS |
| Parametric EQ | Knob changes are audible within one animation frame; curve matches |
| EQ curve visibility | Persists across cursor movement (fixed in the overlay post-hook) |
| Brush: attenuate | Mask goes to gain ≤ 1 at stroke centre, fades at the edge |
| Brush: amplify | Mask goes to gain ≥ 1 |
| Brush: erase | Both gain and smoothing masks relax back to identity |
| Brush: smooth | Paints into the smoothing mask only; applied as ±3-bin tent blur in ISTFT |
| Brush: preserve-peaks | Skips bins within ±3 of detected spectral peaks |
| Precise selection | Shift-drag or numeric entry defines a time/frequency box and redraws it on the overlay |
| Paint box | Applies the active selection rectangle through the current brush mode |
| Annotations | Markers and regions render on the overlay, persist per track, and can be exported/imported as JSON |
| Harmonic lock | Paints at 1×, 2×, …, 6× the fundamental, strength ∝ 1/√k |
| Auto-select (dbl-click) | Flood-fills a connected region within 12 dB of the seed brightness; capped at 60 000 cells |
| Apply brush | Resynthesises via ISTFT in the worker; replaces the audio buffer |
| Revert audio | Restores the original decoded buffer + resets all edits without losing stereo layout |
| HPS Analyse | Enables the Harmonic and Percussive view tabs |
| HPS Isolate | Replaces the playing audio with the isolated component while preserving channel count |
| Loop A–B | Press A to set the loop start, then B at a later position to set the end; the looped region is shaded on the waveform and spectrogram |
| Export PNG | Downloads <basename>_<mode>.png of the current canvas |
| Reset | Fully restores defaults and reverts edited audio (keeps playback going) |
| Fullscreen | Whole page enters fullscreen; layout stays responsive |
| Stereo diagnostics | Vectorscope, live correlation, imbalance, and mono-safe status update during playback |
| Responsive resize | Scroll history is preserved on window resize |
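
The Auto-select row above describes a bounded flood fill. A minimal sketch of that idea follows, assuming a row-major dB grid, 4-connected neighbours, and an absolute difference against the seed value. The function name `autoSelect` and its parameters are hypothetical; only the 12 dB tolerance and the 60 000-cell cap come from the table.

```javascript
// grid: Float32Array of dB values, row-major with `bins` values per frame.
// Returns the flat indices of the connected region around the seed cell.
function autoSelect(grid, frames, bins, seedT, seedF, toleranceDb = 12, maxCells = 60000) {
  const seedIdx = seedT * bins + seedF;
  const seedDb = grid[seedIdx];
  const visited = new Uint8Array(frames * bins);
  const selected = [];
  const stack = [seedIdx];
  visited[seedIdx] = 1;
  // Stop when the region is exhausted or the cell cap is reached.
  while (stack.length > 0 && selected.length < maxCells) {
    const idx = stack.pop();
    // Assumption: "within 12 dB" means an absolute difference from the seed.
    if (Math.abs(grid[idx] - seedDb) > toleranceDb) continue;
    selected.push(idx);
    const t = Math.floor(idx / bins);
    const f = idx % bins;
    // Visit the four direct neighbours (time -1/+1, frequency -1/+1).
    const neighbours = [[t - 1, f], [t + 1, f], [t, f - 1], [t, f + 1]];
    for (const [nt, nf] of neighbours) {
      if (nt < 0 || nt >= frames || nf < 0 || nf >= bins) continue;
      const n = nt * bins + nf;
      if (!visited[n]) {
        visited[n] = 1;
        stack.push(n);
      }
    }
  }
  return selected;
}
```

The cap bounds the work done by a double-click on a large, uniformly bright area, so the UI thread is never stuck filling an entire track.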
- The reassigned-spectrogram mode is a lightweight approximation (local peak sharpening) rather than the true Auger–Flandrin reassignment that requires phase-based time/frequency derivatives.
- The scalogram uses a complex Morlet CWT; very long files can take several seconds to compute even in the worker.
- Safari may reject certain `.wav` files with unusual headers; the engine surfaces the decoder's message in the status bar.
- Activating Brush mode automatically switches the display to the Static full-track view. Painting is not supported during live scrolling playback because each new audio column would overwrite the paint strokes.
- Spectral-edit resynthesis is a one-way operation: apply chains (apply → apply → apply) compound on the modified audio. Use Revert audio to return to the original buffer.
- The per-frame brush paint currently clamps brush size in pixels, not octaves — on log-scale spectrograms the brush is narrower (in Hz) at high frequencies than at low frequencies. This is usually what you want.
- The Pro panel's live LUFS and true-peak numbers are an approximation driven by `AnalyserNode.getFloatTimeDomainData` (most recent 2048 samples, overlapping between ticks). They are close enough for monitoring, but for certification use the offline "Analyze track" values. The offline path runs the full ITU-R BS.1770-4 pipeline over the complete buffer with the correct K-weighting and gating.
- The restoration presets build a spectral mask only. Audio is not edited until you press Brush → Apply, which resynthesises through the existing ISTFT worker path. This is by design: you can layer presets and preview the visual mask before committing.
- The anomaly scanner's LSB steganalysis is probabilistic by nature. Realistic encrypted-message LSB embedding produces PoV pair counts that are statistically indistinguishable from H0 (random LSBs), so the chi-square z-score will sit near 0 and the detector will report "Inconclusive" rather than flagging. It will fire on containers that started with structured LSBs (8-bit audio re-wrapped as 16-bit, undithered quantisation, etc.) and had their LSBs replaced with an embedded payload — the classic Westfeld-Pfitzmann case. Always cross-check with the ultrasonic, tone, and geometry signals before drawing conclusions.
- The anomaly scanner's ultrasonic detection is bounded by the file's sample rate: a 44.1 kHz source can only report on content up to ≈ 22 kHz. To detect bat-echolocation-range content (30–90 kHz), feed the app a 96 kHz+ source — the min/max frequency inputs automatically expose the new Nyquist range on load.
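
To make the LSB limitation above concrete, here is a minimal sketch of a pairs-of-values (PoV) chi-square test on 16-bit samples. It is an illustration, not the scanner's actual code: the function name `povChiSquare`, the sparse-pair threshold of 5, and the normal approximation used for the z-score are all assumptions. It uses the two-cell Pearson form, so each pair contributes one degree of freedom.

```javascript
// samples: Int16Array of PCM samples.
// Returns { chi2, dof, z }. A large positive z means the two values in each
// pair (2k, 2k+1) occur with very unequal counts, i.e. structured LSBs.
function povChiSquare(samples) {
  const counts = new Uint32Array(65536);
  for (const s of samples) counts[s + 32768]++; // shift to 0..65535 (offset is even)
  let chi2 = 0;
  let dof = 0;
  for (let v = 0; v < 65536; v += 2) {
    const total = counts[v] + counts[v + 1];
    if (total < 5) continue; // assumption: skip pairs too sparse for the test
    const expected = total / 2; // H0: LSB is a fair coin within each pair
    const d = counts[v] - expected;
    chi2 += (2 * d * d) / expected; // both cells deviate by the same |d|
    dof += 1;
  }
  // Normal approximation: a chi-square with `dof` degrees of freedom has
  // mean dof and variance 2 * dof.
  const z = dof > 0 ? (chi2 - dof) / Math.sqrt(2 * dof) : 0;
  return { chi2, dof, z };
}
```

Under this sketch, a container whose LSBs are already random yields a z-score near zero whether or not a payload has been embedded, which is why the detector can only report "Inconclusive" in that case.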
A headless Node test suite covers the DSP modules that don't need a browser (metering, key/BPM, compliance, restoration masks, anomaly detection).
npm test # runs node tests/verify_pro.js
# or: node tests/verify_pro.js

Expect 34/34 assertions passing. UI-side code (rendering, brush, playback) is verified manually against the screenshot tour; see the per-feature expectations in §9.
Pull requests and issues are welcome. A few notes:
- Zero runtime dependencies. Keep it that way unless there's an extraordinary reason — every browser ships everything we need (Web Audio, Canvas, Workers, WebAssembly if ever).
- No build step. New source files should be loadable as plain `<script>` tags from `index.html`. If you add something that needs bundling, justify it.
- Keep the worker boundary clean. Heavy DSP belongs in `js/workers/analysis-worker.js`; UI/state belongs in `js/app.js`; rendering belongs in the per-view files (`renderer.js`, `visualizations.js`, …).
- Tests for DSP. If you add or change a numeric pipeline (new feature extractor, new detector, new filter), add an assertion to `tests/verify_pro.js` so regressions show up in CI.
- Screenshots. If a PR changes the UI materially, re-capture the affected images in `docs/screenshots/` and reference them from the README.
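
As a rough idea of what a DSP assertion could look like, the sketch below checks a known closed-form value: a full-scale sine has an RMS level of about -3.01 dBFS. The `check` helper, the pass counter, and `rmsDb` are hypothetical stand-ins; the real conventions in `tests/verify_pro.js` may differ, so mirror whatever that file already does.

```javascript
// Hypothetical mini-harness: counts passes and reports failures by name.
let passed = 0;
let total = 0;
function check(name, condition) {
  total++;
  if (condition) passed++;
  else console.error("FAIL: " + name);
}

// RMS level in dB relative to full scale (1.0).
function rmsDb(samples) {
  let sum = 0;
  for (const s of samples) sum += s * s;
  return 10 * Math.log10(sum / samples.length);
}

// One second of a 1 kHz full-scale sine at 48 kHz (whole number of cycles).
const sampleRate = 48000;
const sine = new Float32Array(sampleRate);
for (let n = 0; n < sine.length; n++) {
  sine[n] = Math.sin((2 * Math.PI * 1000 * n) / sampleRate);
}

// Mean of sin^2 over whole cycles is 0.5, so the level is 10*log10(0.5).
check("full-scale sine RMS is about -3.01 dBFS", Math.abs(rmsDb(sine) + 3.0103) < 0.01);

console.log(passed + "/" + total + " assertions passing");
```

Testing against an analytically known value keeps the assertion independent of the implementation under test.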
Open a discussion before starting anything large so we can align on shape and scope.
MIT © 2026 Victor Dovgan. See LICENSE for the full text.
Built with zero runtime dependencies and too much coffee.














