
pirate/soundview


SoundView

Real-time audio visualizer that turns microphone input into a rich, multi-layered visual display, mimicking the human auditory system and the features it extracts for free.

In theory, you could learn to read the live display to "hear" and interpret speech, music, and other ambient sound with your eyes.

Live Site: https://pirate.github.io/soundview/


Background / Inspiration: This 💬 GPT-5.4 conversation where I was asking about how human brains process sound.

Related Projects & Resources


What It Shows

Cochleagram (main spectrogram area, ~60% of screen)

A scrolling time-frequency display rendered at native Retina resolution. Each pixel column represents one frame (~16ms) of audio.

  • Vertical axis: Frequency (50Hz at bottom, 16kHz at top), with a piecewise log scale that compresses the extremes and expands the 200-8000Hz speech/music range for maximum detail
  • Color: Thermal colormap from black (silent) through blue, cyan, green, yellow, orange, red to white (loud)
  • Resolution: FFT size 8192 gives ~5.4Hz per bin, with per-pixel rendering via ImageData
  • Sensitivity: Adjustable via slider, with perceptual gamma compression (0.35) and a noise gate to suppress mic self-noise
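A minimal sketch of how such a display mapping could work (not the project's actual code; the region fractions, dB floor, and dB range below are assumed values, only the 50 Hz-16 kHz span, the 200-8000 Hz emphasis, and the 0.35 gamma come from the description above):

```javascript
// Piecewise log scale: most vertical rows go to the 200-8000 Hz range.
const FMIN = 50, FMAX = 16000, F_LO = 200, F_HI = 8000;
const GAMMA = 0.35;
// Fractions of the axis per region (assumed values).
const LO_FRAC = 0.15, MID_FRAC = 0.70; // remaining 0.15 covers 8-16 kHz

function freqToY(freq, height) {
  const log = Math.log2;
  let frac;
  if (freq <= F_LO) {
    frac = LO_FRAC * (log(freq / FMIN) / log(F_LO / FMIN));
  } else if (freq <= F_HI) {
    frac = LO_FRAC + MID_FRAC * (log(freq / F_LO) / log(F_HI / F_LO));
  } else {
    frac = LO_FRAC + MID_FRAC +
      (1 - LO_FRAC - MID_FRAC) * (log(freq / F_HI) / log(FMAX / F_HI));
  }
  return Math.round((1 - frac) * (height - 1)); // low frequencies at the bottom
}

// dB magnitude -> 0..1 brightness with perceptual gamma compression.
function dbToBrightness(db, floorDb = -90, rangeDb = 90, sensDb = 0) {
  const t = Math.min(1, Math.max(0, (db - floorDb + sensDb) / rangeDb));
  return Math.pow(t, GAMMA);
}
```

The brightness value would then index into the thermal colormap (black through blue, cyan, green, yellow, orange, red, white).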

Overlay Lines on Cochleagram

  • White line: Pitch (fundamental frequency) tracked via YIN-style autocorrelation. Snaps instantly on octave jumps instead of drawing diagonals
  • Pink line: Spectral centroid (brightness/timbre), smoothed with jitter rejection — only shows when stable
  • Cyan line: Spectral rolloff (frequency below which 85% of energy lives)
  • White voice lines: Up to 4 simultaneous pitches detected via subharmonic summation
  • Green formant dots: F1/F2/F3 vocal tract resonances at their frequency positions
  • Green harmonic dots: Overtone series when pitch is detected

Noise Fuzz (top of cochleagram)

Three rows of scattered pixels at the top, gated on aperiodic content only (suppressed during speech/music):

  • Top row: High-frequency noise — cyan (hissy) or white (broadband)
  • Middle row: Mid-frequency noise — pink (pink noise) or grey (balanced)
  • Bottom row: Low-frequency noise — brown/red (rumble)

Density and opacity scale with noise loudness. Color indicates noise spectral tilt.

Beat Detection (blue vertical lines)

BTrack-style beat tracker (Adam Stark, 2014):

  1. Onset detection function from spectral flux feeds into a circular buffer
  2. Autocorrelation estimates tempo period (60-164 BPM range) with Rayleigh weighting
  3. Cumulative score array chains evidence backward by one beat period
  4. Beat counter triggers when accumulated score peaks
  5. Requires 6+ consecutive confirmed beats before showing (prevents false positives)
  6. Every 10th beat displays the current BPM as a number
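The tempo-estimation step (2) can be sketched as follows. This is an illustrative simplification, not the project's code: the hop size is a parameter, and the Rayleigh peak placement at ~120 BPM is an assumption.

```javascript
// Estimate a beat period from an onset-detection-function (ODF) history by
// autocorrelation, with a Rayleigh weighting that favors mid-tempo periods.
function estimateBeatPeriod(odf, hopSeconds, minBpm = 60, maxBpm = 164) {
  const minLag = Math.floor(60 / (maxBpm * hopSeconds));
  const maxLag = Math.ceil(60 / (minBpm * hopSeconds));
  const beta = Math.round(60 / (120 * hopSeconds)); // Rayleigh peak near 120 BPM
  let bestLag = minLag, bestScore = -Infinity;
  for (let lag = minLag; lag <= maxLag && lag < odf.length; lag++) {
    let acf = 0;
    for (let i = 0; i + lag < odf.length; i++) acf += odf[i] * odf[i + lag];
    // Rayleigh weighting: w(l) = (l / beta^2) * exp(-l^2 / (2 beta^2))
    const w = (lag / (beta * beta)) * Math.exp(-(lag * lag) / (2 * beta * beta));
    const score = acf * w;
    if (score > bestScore) { bestScore = score; bestLag = lag; }
  }
  return { lagFrames: bestLag, bpm: 60 / (bestLag * hopSeconds) };
}
```

The cumulative-score stage (3) then chains this period backward through the ODF history before the beat counter fires.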

Broadband Transient Lines (white dashed vertical)

Detects sudden broadband energy spikes (claps, thuds, impacts) by counting how many cochleagram rows are "bright" in the current frame vs the running average. Debounced at 250ms.
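A minimal sketch of that scheme (the brightness threshold, trigger ratio, and smoothing factor are assumed; only the 250 ms debounce comes from the description):

```javascript
// Flag a broadband transient when the count of "bright" spectrogram rows
// jumps well above its running average, with a debounce so one clap = one line.
function makeTransientDetector({ brightDb = -40, ratio = 2.5, debounceMs = 250 } = {}) {
  let avgCount = 1, lastFireMs = -Infinity;
  return function detect(rowDbs, nowMs) {
    let count = 0;
    for (const db of rowDbs) if (db > brightDb) count++;
    const fired = count > avgCount * ratio && nowMs - lastFireMs >= debounceMs;
    if (fired) lastFireMs = nowMs;
    avgCount = 0.95 * avgCount + 0.05 * Math.max(count, 1); // running average
    return fired;
  };
}
```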

Harmonic Profile Strip (~25% of screen)

32 rows showing the first 16 harmonics at 2x resolution with interpolation. Each row is color-coded by acoustic role:

Harmonic            Color     Significance
H1                  White     Fundamental strength
H2                  Cyan      Breathiness indicator (H2/H1 ratio)
H3                  Orange    Power/projection
H5, H7              Yellow    Odd-harmonic signature (nasal, reed, square wave)
H8-H10              Magenta   Brilliance region (trained singer, brass)
Even (H4, H6, ...)  Blue      Even harmonics
H11+                Gold      Upper partials

Additional dimensions encoded:

  • Brightness: Harmonic amplitude (dB-compressed for visibility)
  • Saturation: Harmonic purity (peak vs surrounding noise floor)
  • Flash/dim: Temporal derivative (brightens on attack, dims on decay)
  • White line: Tracks the dominant non-fundamental harmonic when stable for 200ms+

Harmonics are computed from the store's autocorrelation-based pitch, with fallback to the strongest voice from multi-pitch detection (works for music through speakers, not just direct voice).
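One way to read harmonic amplitudes once an f0 is known is to sample FFT bins at integer multiples of the pitch. This is a hypothetical helper, not the project's code; the ±2-bin search window is an assumption to tolerate bin quantization:

```javascript
// Read the first N harmonic amplitudes (in dB) from an FFT magnitude array
// by sampling near integer multiples of the detected fundamental.
function harmonicAmplitudes(spectrumDb, f0, sampleRate, fftSize, nHarmonics = 16) {
  const hzPerBin = sampleRate / fftSize;
  const amps = new Float32Array(nHarmonics);
  for (let h = 1; h <= nHarmonics; h++) {
    const bin = Math.round((h * f0) / hzPerBin);
    if (bin >= spectrumDb.length) break;
    let best = -Infinity;
    // Take the max over a small neighborhood around the expected bin.
    for (let b = Math.max(0, bin - 2); b <= Math.min(spectrumDb.length - 1, bin + 2); b++) {
      best = Math.max(best, spectrumDb[b]);
    }
    amps[h - 1] = best;
  }
  return amps;
}
```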

MIDI Note Strip (~7% of screen)

Scrolling piano-roll-style view of detected chord notes, with 12 rows (one per pitch class, C at bottom, B at top):

  • Chord tones light up in their pitch-class color (C=red, C#=orange, D=yellow, D#=yellow-green, E=green, F=teal, F#=cyan, G=blue, G#=indigo, A=purple, A#=magenta, B=pink) at full brightness proportional to chroma energy
  • Non-chord active notes shown as dim versions of their pitch-class color
  • Inactive notes are near-black, creating a clear on/off MIDI-note appearance
  • Chord name overlaid as text on the strip every ~1 second

Under the hood, chromagram energy (12 pitch-class bins folded from the FFT spectrum) is thresholded and compared against detected chord templates (major, minor, diminished, dominant 7th, minor 7th) to determine which notes belong to the current chord.
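The folding step can be sketched as below. This is a simplified illustration (each FFT bin is rounded to its nearest pitch class; a full HPCP may weight and spread energy across neighboring classes):

```javascript
// Fold an FFT magnitude spectrum into 12 pitch-class bins (0 = C).
function foldToChroma(spectrumMag, sampleRate, fftSize, fMin = 60, fMax = 5000) {
  const chroma = new Float32Array(12);
  const hzPerBin = sampleRate / fftSize;
  for (let b = 1; b < spectrumMag.length; b++) {
    const f = b * hzPerBin;
    if (f < fMin || f > fMax) continue;
    const midi = 69 + 12 * Math.log2(f / 440);        // frequency -> MIDI note
    const pc = ((Math.round(midi) % 12) + 12) % 12;   // MIDI note -> pitch class
    chroma[pc] += spectrumMag[b];
  }
  return chroma;
}
```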

Circle of Fifths (overlay, bottom-left)

Interactive key detection visualization rendered on the overlay canvas above the timbre space map:

  • Outer ring: 12 major keys arranged in circle-of-fifths order (C at top, clockwise: C→G→D→A→E→B→F#→C#→G#→D#→A#→F)
  • Inner ring: 12 relative minor keys (Am at top, following the same fifths order)
  • Highlight: The detected key segment lights up blue when confident
  • Center: Shows the currently detected chord name
  • Key detection: Krumhansl-Kessler key profiles matched against the chromagram via Pearson correlation, with a slow accumulator for stability (updates ~4× per second)
  • Chord detection: Cosine similarity matching against chord templates (major, minor, dim, dom7, min7), updated every frame for responsiveness
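The key-matching step might look like the sketch below. The profile values are the published Krumhansl-Kessler probe-tone ratings; the surrounding code is an assumed implementation, and the real one adds the slow accumulator described above:

```javascript
// Krumhansl-Kessler major/minor key profiles, index 0 = tonic.
const KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88];
const KK_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17];

function pearson(a, b) {
  const n = a.length;
  const ma = a.reduce((s, v) => s + v, 0) / n;
  const mb = b.reduce((s, v) => s + v, 0) / n;
  let num = 0, da = 0, db = 0;
  for (let i = 0; i < n; i++) {
    num += (a[i] - ma) * (b[i] - mb);
    da += (a[i] - ma) ** 2;
    db += (b[i] - mb) ** 2;
  }
  return num / Math.sqrt(da * db);
}

// Score all 24 keys by correlating the chromagram against rotated profiles.
function detectKey(chroma) {
  let best = { score: -Infinity, tonic: 0, mode: "major" };
  for (let tonic = 0; tonic < 12; tonic++) {
    for (const [mode, profile] of [["major", KK_MAJOR], ["minor", KK_MINOR]]) {
      const rotated = profile.map((_, i) => profile[((i - tonic) + 12) % 12]);
      const score = pearson(chroma, rotated);
      if (score > best.score) best = { score, tonic, mode };
    }
  }
  return best;
}
```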

Feature Strip (bottom ~15% of screen)

Energy/Spread/Flux band (3 rows):

  • Background color: spectral spread mapped to blue (narrow) → green → yellow → red (wide), modulated by RMS energy as brightness
  • White line: energy envelope (spectral flux, unsmoothed for fast transient response)
  • Black line: derivative of flux (spikes on onsets, dips on releases)
  • Blue squares: beat markers from the BTrack detector

Top Frequencies band (5 rows):

  • Background: instrument classification color (green=vocal, red-orange=drums, gold=brass, purple=strings, blue=piano, grey=noise)
  • Colored lines: top 3 detected frequencies via iterative peak picking with suppression and merge (same colors as voice arrows)
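The iterative peak picking could be sketched as follows (suppression and merge widths here are assumed values):

```javascript
// Pick the top-N spectral peaks by repeated argmax, zeroing out a neighborhood
// around each winner so the next pass finds a genuinely different frequency,
// and skipping near-duplicates of already-picked peaks.
function topFrequencies(spectrumDb, hzPerBin, n = 3, suppressHz = 100, mergeHz = 50) {
  const spec = Float32Array.from(spectrumDb);
  const suppressBins = Math.round(suppressHz / hzPerBin);
  const peaks = [];
  for (let k = 0; k < n; k++) {
    let bestBin = -1, bestDb = -Infinity;
    for (let b = 0; b < spec.length; b++) {
      if (spec[b] > bestDb) { bestDb = spec[b]; bestBin = b; }
    }
    if (bestBin < 0) break;
    const freq = bestBin * hzPerBin;
    if (!peaks.some(p => Math.abs(p.freq - freq) < mergeHz)) {
      peaks.push({ freq, db: bestDb });
    }
    // Suppress the winner's neighborhood before the next pass.
    for (let b = Math.max(0, bestBin - suppressBins);
         b <= Math.min(spec.length - 1, bestBin + suppressBins); b++) {
      spec[b] = -Infinity;
    }
  }
  return peaks;
}
```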

Voice Arrows (right edge, overlay canvas)

Up to 4 simultaneously detected pitches shown as colored arrows pointing left from the right edge, with frequency labels. Drawn on a separate canvas that clears each frame (never scrolls). Black background strip keeps them readable.

Colors match between arrows and their corresponding lines on the cochleagram and top-freq band:

  • Orange: strongest voice
  • Blue: 2nd voice
  • Green: 3rd voice
  • Magenta: 4th voice

Timbre Space Map (overlay, bottom-left)

2D scatter plot showing timbral characteristics as a moving dot with a fading trail:

  • X axis: Spectral centroid (log scale, 200Hz–8kHz) — left=dark, right=bright
  • Y axis: MFCC[1] (spectral tilt, adaptive normalization) — bottom=warm, top=cold
  • Dot color: Tristimulus (T1=red=fundamental dominance, T2=green=mid harmonics H2-H4, T3=blue=upper partials H5+). Each trail point stores its own tristimulus color from the time it was recorded
  • Inharmonicity bar: Orange bar at bottom edge, length proportional to harmonic deviation
  • Computed from raw (un-normalized) harmonic amplitudes for accurate energy ratios
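Tristimulus from raw harmonic amplitudes is a standard computation; a minimal sketch:

```javascript
// Tristimulus: T1 = fundamental's share of total harmonic energy,
// T2 = share of H2-H4, T3 = share of H5 and above.
function tristimulus(harmonics) { // linear amplitudes, harmonics[0] = H1
  const total = harmonics.reduce((s, a) => s + a, 0);
  if (total <= 0) return [0, 0, 0];
  const t1 = harmonics[0] / total;
  const t2 = harmonics.slice(1, 4).reduce((s, a) => s + a, 0) / total;
  const t3 = harmonics.slice(4).reduce((s, a) => s + a, 0) / total;
  return [t1, t2, t3];
}
```

Mapping T1/T2/T3 directly to red/green/blue gives the dot color described above.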

MFCC Strip (~8% of screen)

13 rows showing Mel-Frequency Cepstral Coefficients with a diverging blue↔orange colormap:

  • Adaptive normalization tracks min/max per coefficient over time
  • MFCC[0] (bottom) represents overall spectral energy level
  • Higher MFCCs capture increasingly fine spectral envelope detail
  • Useful for distinguishing vowel sounds, instrument timbres, and speech vs music

Audio Analysis Pipeline

src/audio/engine.js

Microphone capture with AGC/noise suppression/echo cancellation disabled. Creates an AnalyserNode with FFT size 8192.
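A sketch of the capture setup this implies (the constraint names are the standard `MediaTrackConstraints` fields; the browser-only calls are shown as comments since they only run in a browser):

```javascript
// Disable the browser's built-in DSP so the analyser sees the raw signal.
const captureConstraints = {
  audio: {
    echoCancellation: false,
    noiseSuppression: false,
    autoGainControl: false,
  },
};

// In the browser (assumed wiring, based on the description above):
//   const stream = await navigator.mediaDevices.getUserMedia(captureConstraints);
//   const ctx = new AudioContext();
//   const analyser = ctx.createAnalyser();
//   analyser.fftSize = 8192;
//   ctx.createMediaStreamSource(stream).connect(analyser);
```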

src/audio/filterbank.js

28 BiquadFilter bandpass filters from 30Hz to 20kHz (~1/3 octave spacing), each with its own AnalyserNode for per-band energy extraction.
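The band centers implied by that layout are log-spaced; a sketch of how they might be computed (assumed helper, not the project's code) before each one feeds a `BiquadFilterNode` of type `"bandpass"`:

```javascript
// 28 band-center frequencies, log-spaced from 30 Hz to 20 kHz (~1/3-octave steps).
function bandCenters(n = 28, fLow = 30, fHigh = 20000) {
  const step = Math.log2(fHigh / fLow) / (n - 1); // octaves per band
  return Array.from({ length: n }, (_, i) => fLow * Math.pow(2, i * step));
}
```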

src/audio/pitch.js

YIN-style autocorrelation pitch detection on the first 2048 samples of the time-domain buffer. Collects all NSDF peaks, finds the global best, then accepts the first peak within 50% of the best (proper YIN heuristic). Range: 60-800Hz.
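The steps above (NSDF, peak collection, "first peak within 50% of the best") can be sketched like this; it is a simplified illustration, not the project's code:

```javascript
// NSDF-based pitch detection: normalized autocorrelation, local-maximum
// peak picking, then accept the first peak within 50% of the global best.
function detectPitch(samples, sampleRate, fMin = 60, fMax = 800) {
  const n = samples.length;
  const maxLag = Math.min(Math.floor(sampleRate / fMin), n - 1);
  const minLag = Math.floor(sampleRate / fMax);
  const nsdf = new Float32Array(maxLag + 1);
  for (let lag = minLag; lag <= maxLag; lag++) {
    let acf = 0, norm = 0;
    for (let i = 0; i + lag < n; i++) {
      acf += samples[i] * samples[i + lag];
      norm += samples[i] ** 2 + samples[i + lag] ** 2;
    }
    nsdf[lag] = norm > 0 ? (2 * acf) / norm : 0;
  }
  // Collect local maxima, find the global best, take the first within 50% of it
  // (preferring the shortest qualifying lag avoids octave-down errors).
  const peaks = [];
  for (let lag = minLag + 1; lag < maxLag; lag++) {
    if (nsdf[lag] > nsdf[lag - 1] && nsdf[lag] >= nsdf[lag + 1]) {
      peaks.push({ lag, value: nsdf[lag] });
    }
  }
  if (peaks.length === 0) return 0;
  const best = Math.max(...peaks.map(p => p.value));
  const chosen = peaks.find(p => p.value >= 0.5 * best);
  return sampleRate / chosen.lag;
}
```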

src/audio/features.js

Per-frame feature extraction (~400 lines):

  • Band energy, smoothing, peak tracking, delta, periodicity (autocorrelation with 32 lag limit), roughness
  • Full spectrum copy (spectrumDb)
  • RMS, noise floor estimation, signal-above-noise, signal presence detection
  • Spectral shape: centroid (with smoothing + snap/jitter rejection), spread, flatness, slope, rolloff
  • Noisiness decomposition (tonal vs noise energy)
  • Pitch detection + smoothing (fast tracking with octave-jump snap)
  • Harmonicity + 32 harmonic amplitudes (normalized to fundamental)
  • Modulation depth/rate from envelope analysis
  • Onset detection (spectral flux with adaptive median threshold)
  • Chromagram, key/chord detection (delegated to chroma.js)
  • Timbre descriptors (delegated to timbre.js)
  • Clears key/chord state during silence

src/audio/chroma.js

Chromagram computation + key/chord detection:

  • Folds FFT spectrum into 12 pitch-class bins (HPCP) across the 60–5000Hz range
  • Log-scales chroma energy using the same dB floor/range approach as the cochleagram, with the sensitivity slider applied for consistent brightness control
  • Key detection via Pearson correlation against Krumhansl-Kessler major/minor profiles, with a slow accumulator (updates every 15 frames)
  • Chord detection via cosine similarity against 5 chord templates (major, minor, dim, dom7, min7), every frame

src/audio/timbre.js

Timbre descriptors:

  • 13 MFCCs via 26-band mel filterbank → log compression → DCT-II
  • Tristimulus (T1/T2/T3) from raw harmonic amplitudes (fundamental, mid harmonics H2-H4, upper partials H5+), decays toward zero when no pitch is detected
  • Inharmonicity: weighted deviation of actual partial frequencies from perfect harmonic series
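The final DCT-II stage of the MFCC pipeline is small enough to show in full (the mel filterbank and log compression are omitted here; this is an illustrative sketch):

```javascript
// DCT-II: turn 26 log-mel energies into 13 decorrelated cepstral coefficients.
function dct2(logMel, nCoeffs = 13) {
  const n = logMel.length;
  const out = new Float32Array(nCoeffs);
  for (let k = 0; k < nCoeffs; k++) {
    let sum = 0;
    for (let i = 0; i < n; i++) {
      sum += logMel[i] * Math.cos((Math.PI * k * (i + 0.5)) / n);
    }
    out[k] = sum;
  }
  return out;
}
```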

src/audio/modulation.js

Per-band modulation spectrum via a 64-point FFT of each band's envelope history. 7 modulation bands spanning <1Hz (slow dynamics) up to the roughness range (30-300Hz). Runs every 4th frame.

src/audio/formants.js

Spectral peak picking for F1/F2/F3 with:

  • Wide smoothing window (~150Hz) to reveal formant envelope over individual harmonics
  • Frequency-constrained assignment (F1: 200-1000Hz, F2: 600-2800Hz, F3: 1500-4500Hz)
  • Hysteresis smoothing (fast on large jumps, slow on jitter)
  • Rule-based sound classifier (silence/voiced harmonic/voiced noisy/fricative/plosive/nasal)

src/store/feature-store.js

Shared typed-array data bus. All audio features written by the analysis pipeline, read by the renderer each frame.

src/scene/engine.js

Minimal render loop using performance.now() for timing. No Three.js — pure 2D canvas.

src/scene/layers/spectrum-wall.js

The main renderer (~1400 lines). Creates two canvases:

  1. Spectrogram canvas: scrolling cochleagram + harmonics + MIDI note strip + feature strip + MFCC strip, rendered with ImageData for pixel-perfect output
  2. Overlay canvas: voice arrows, Circle of Fifths key display, timbre space map — cleared each frame

Also contains:

  • Multi-pitch detection via subharmonic summation
  • Top-frequency extraction (iterative argmax with suppression + merge)
  • Simple instrument classifier
  • BTrack beat tracker
  • Noise fuzz renderer
  • Chord-to-pitch-class parsing for MIDI note highlighting
  • Circle of Fifths rendering with key/chord detection display

Development

pnpm install
pnpm dev

# deploy:
pnpm build
npx gh-pages -d dist

Click "click to start" to grant microphone access. Controls:

  • sens: Sensitivity offset in dB (shifts the brightness curve)
  • speed: Scroll speed in pixels per frame (1-20)

Tech Stack

  • Vanilla JS (no framework)
  • Web Audio API (AnalyserNode, BiquadFilter, MediaStream)
  • Canvas 2D (ImageData for cochleagram, fillRect for features, arc for voice circles)
  • Vite for dev server and build

MIT License

