feat: Implement spectrogram visualization for AudioPlus #7400

Open: cloudmark wants to merge 5 commits into base: develop

@cloudmark cloudmark commented Apr 20, 2025

Add spectrogram visualization to the Audio component

Reason for change

This PR adds spectrogram visualization support to the audio editor, enabling users to visualize frequency content over time in audio recordings. This feature enhances audio annotation capabilities by providing visual frequency analysis tools, particularly useful for tasks like speech analysis, music transcription, and sound event detection.

The implementation includes:

  • FFT-based spectrogram rendering with configurable parameters
  • Multiple color scheme options with live preview
  • Mel-scale frequency mapping support
  • Real-time parameter adjustment
  • Performance optimizations for smooth rendering
  • Modular code organization for maintainability
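
The mel-scale frequency mapping listed above can be illustrated with a minimal sketch. This is not the PR's code: it assumes the common HTK-style formula, and the names `hzToMel`, `melToHz`, and `melBandCenters` are hypothetical.

```typescript
// Hz <-> mel conversion using the HTK-style formula (an assumption; the PR
// may use a different variant). All names here are illustrative only.
function hzToMel(hz: number): number {
  return 2595 * Math.log10(1 + hz / 700);
}

function melToHz(mel: number): number {
  return 700 * (Math.pow(10, mel / 2595) - 1);
}

// Returns `bands` frequencies (in Hz) spaced evenly on the mel scale
// between minHz and maxHz, inclusive.
function melBandCenters(bands: number, minHz: number, maxHz: number): number[] {
  const lo = hzToMel(minHz);
  const hi = hzToMel(maxHz);
  const centers: number[] = [];
  for (let i = 0; i < bands; i++) {
    const t = bands === 1 ? 0 : i / (bands - 1);
    centers.push(melToHz(lo + t * (hi - lo)));
  }
  return centers;
}
```

Because the spacing is even in mel rather than in Hz, neighbouring bands sit close together at low frequencies and spread out towards the top of the range.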

Screenshots

  1. Configuration Interface
Screenshot 2025-04-20 at 19 05 32

Shows the labeling interface configuration with the new spectrogram="true" property in the XML configuration, demonstrating how the feature can be enabled through the labeling interface.

  2. Color Scheme Selection
Screenshot 2025-04-20 at 19 05 01

Demonstrates the color scheme selection interface with:

  • Multiple predefined schemes (Autumn, Bathymetry, Blackbody, etc.)
  • Visual preview for each scheme
  • Real-time application of color changes
  • Smooth transition between schemes
  3. Time Display on Hover
Screenshot 2025-04-20 at 19 04 50

Shows interactive features:

  • Precise time indicator (00:00:00.447) on hover
  • Clear visualization of frequency content
  • Smooth rendering of the spectrogram
  • Integration with the waveform display
  4. Playback and Spectrogram Settings
Screenshot 2025-04-20 at 19 04 47

Comprehensive control panel featuring:

  • Playback speed adjustment
  • Audio zoom y-axis control
  • FFT Samples slider (64-2048)
  • Loop Regions toggle
  • Auto-play New Regions toggle
  • Integrated spectrogram controls
  5. Advanced Spectrogram Controls
Screenshot 2025-04-20 at 19 04 43

Detailed configuration options:

  • FFT Samples selection (512)
  • Mel Bands adjustment (64)
  • Spectrogram dB range (-80 to -10)
  • Windowing Function selection (Blackman)
  • Color Scheme selection (Viridis)
  • View toggles for timeline, audio wave, and spectrogram
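
To show how a dB range such as -80 to -10 typically feeds a color scheme, here is a minimal sketch. The function names and the clamping behaviour are assumptions for illustration, not taken from the PR's ColorMapper.ts.

```typescript
// Convert a linear FFT magnitude to decibels. The small floor avoids
// log10(0) for silent bins. (Illustrative helper, not the PR's code.)
function magnitudeToDb(magnitude: number): number {
  return 20 * Math.log10(Math.max(magnitude, 1e-12));
}

// Map a dB value into [0, 1] for a color lookup, clamping values that fall
// outside [minDb, maxDb]. The defaults mirror the range in the screenshot.
function normalizeDb(db: number, minDb = -80, maxDb = -10): number {
  if (maxDb <= minDb) {
    throw new RangeError("maxDb must be greater than minDb");
  }
  const t = (db - minDb) / (maxDb - minDb);
  return Math.min(1, Math.max(0, t));
}
```

Narrowing the range stretches the color scale over fewer decibels, which raises contrast for the levels inside the range and saturates everything outside it.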

Rollout strategy

The feature is implemented with a progressive enhancement approach:

  1. Feature Flag:
<AudioPlus name="audio"
          value="$audio"
          hotkey="space"
          sync="group_a"
          defaultscale="1"
          defaultzoom="2"
          zoom="true"
          spectrogram="true"
/>
  2. Backward Compatibility:
  • Existing audio components continue to work without changes
  • Spectrogram can be enabled/disabled per instance
  • All new parameters have sensible defaults
  3. Performance Considerations:
  • Lazy loading of FFT computation code
  • Progressive rendering for large files
  • Configurable quality settings

Testing

Comprehensive testing strategy:

  1. Unit Tests:
  • WindowFunctions.ts: Window function calculations
  • ColorMapper.ts: Color scheme management
  • FFT computation accuracy
  • Parameter validation
  2. Integration Tests:
  • Audio loading and visualization
  • Real-time parameter updates
  • Color scheme switching
  • Performance benchmarks
  3. Manual Testing Scenarios:
  • Various audio formats (WAV, MP3, OGG)
  • Different file lengths (short clips to long recordings)
  • Multiple sample rates and bit depths
  • Browser compatibility (Chrome, Firefox, Safari)
  • Performance with large files

Risks

  1. Performance:
  • FFT computation is CPU-intensive
  • Mitigated through:
    • Chunked rendering
    • Yield scheduling
    • Canvas optimization
    • Caching mechanisms
  2. Memory Usage:
  • Large audio files require more memory for FFT
  • Mitigated through:
    • Buffer management
    • Cleanup of unused resources
    • Progressive loading

Reviewer notes

Key areas to review:

  1. Visualizer.ts: Spectrogram rendering logic
  2. WindowFunctions.ts: Audio processing utilities
  3. ColorMapper.ts: Color scheme management
  4. Performance optimizations in rendering loops
  5. Error handling and edge cases

General notes

The spectrogram visualization feature provides:

  • Real-time frequency analysis
  • Multiple color schemes for different use cases
  • Configurable parameters for detailed analysis
  • Smooth integration with existing audio tools
  • Optimized performance for large files

robot-ci-heartex and others added 2 commits April 20, 2025 18:36
This commit adds spectrogram visualization capabilities to the audio editor
through a new optional 'spectrogram' property in the AudioPlus component.

Example usage:
<AudioPlus
  name="audio"
  value="$audio"
  height="240"
  hotkey="space"
  defaultscale="1"
  defaultzoom="2"
  zoom="true"
  spectrogram="true"
  sync="group_a"
/>

Key changes:
- Add new 'spectrogram' boolean property to AudioPlus component
- Extract window functions into a dedicated WindowFunctions module
- Create a new ColorMapper module for spectrogram coloring
- Refactor Visualizer class to use the new modules
- Add support for different window functions and color schemes
- Improve type safety and code organization

The spectrogram visualization allows users to:
- Toggle spectrogram view using the 'spectrogram' property
- View frequency content over time alongside waveform
- Switch between different color schemes
- Configure window functions for FFT analysis
- Adjust visualization parameters (FFT size, dB range)

Configuration:
- spectrogram: boolean (optional) - When set to true, enables
  spectrogram visualization alongside the waveform

Labels: audio, editor, feature, community:feature-request, community:reviewed

Closes HumanSignal#384

Add spectrogram visualization capabilities to the audio editor component with configurable settings and improved UI controls.

Key changes:
- Extract window functions into separate WindowFunctions module for better code organization
- Create new ColorMapper module for handling spectrogram color schemes
- Add spectrogram property to AudioPlus component (optional boolean to enable/disable)
- Implement FFT-based spectrogram rendering with configurable parameters
- Add UI controls for spectrogram settings (FFT size, color scheme, dB range)
- Fix CSS styling issues in the configuration modal
- Improve section header positioning and spacing

Features:
- Real-time spectrogram visualization
- Configurable FFT window size and type
- Multiple color scheme options
- Adjustable dB range for visualization
- Mel-scale frequency mapping support
- Responsive rendering with performance optimizations

Labels:
- audio
- community:feature-request
- community:reviewed
- editor
- feature

Closes HumanSignal#384

netlify bot commented Apr 20, 2025

👷 Deploy request for heartex-docs pending review.

Visit the deploys page to approve it

🔨 Latest commit: ae15d4f


netlify bot commented Apr 20, 2025

👷 Deploy request for label-studio-docs-new-theme pending review.

Visit the deploys page to approve it

🔨 Latest commit: ae15d4f


netlify bot commented Apr 20, 2025

Deploy Preview for label-studio-storybook ready!

🔨 Latest commit: ae15d4f
🔍 Latest deploy log: https://app.netlify.com/sites/label-studio-storybook/deploys/680bee5ee59421000859e50f
😎 Deploy Preview: https://deploy-preview-7400--label-studio-storybook.netlify.app

@cloudmark cloudmark changed the title from "Feature/spectrogram analyser" to "Spectrogram Analysis in Audio component" Apr 20, 2025
@cloudmark
Author

To help visualize the new spectrogram functionality implemented in this PR (#7400), I've recorded a short video demonstration:

Video Demonstration: Spectrogram Feature

What the video shows:

The video walks through the spectrogram feature within the Label Studio interface, highlighting:

  • The spectrogram display integrated below the audio waveform.
  • Synchronized playback tracking across both the waveform and spectrogram.
  • How zooming affects both views simultaneously.
  • Real-time updates as various configuration options are adjusted in the settings panel:
    • Changing FFT window sizes.
    • Applying different color schemes.
    • Selecting various windowing functions (Blackman, Hann, Hamming).
    • Toggling the Mel frequency scale.
    • Adjusting the amplitude (dB) range.
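
For reference, the three windowing functions named above can be written from their textbook definitions. This is a sketch under stated assumptions: it uses the symmetric form of each window, and `makeWindow` and `applyWindow` are hypothetical names rather than the API of the PR's WindowFunctions.ts.

```typescript
type WindowName = "hann" | "hamming" | "blackman";

// Build a symmetric window of `size` samples from the standard formulas.
// (Illustrative only; the PR's WindowFunctions module may differ.)
function makeWindow(name: WindowName, size: number): Float32Array {
  const w = new Float32Array(size);
  if (size === 1) {
    w[0] = 1; // a single-sample window is defined as 1
    return w;
  }
  const k = (2 * Math.PI) / (size - 1);
  for (let n = 0; n < size; n++) {
    const c = Math.cos(k * n);
    if (name === "hann") w[n] = 0.5 - 0.5 * c;
    else if (name === "hamming") w[n] = 0.54 - 0.46 * c;
    else w[n] = 0.42 - 0.5 * c + 0.08 * Math.cos(2 * k * n);
  }
  return w;
}

// Multiply an audio frame by a window of the same length before the FFT.
function applyWindow(frame: Float32Array, win: Float32Array): Float32Array {
  if (frame.length !== win.length) {
    throw new RangeError("frame and window must have the same length");
  }
  const out = new Float32Array(frame.length);
  for (let i = 0; i < frame.length; i++) {
    out[i] = frame[i] * win[i];
  }
  return out;
}
```

All three taper the frame towards its edges to reduce spectral leakage. Blackman suppresses side lobes the most at the cost of a wider main lobe, which is why switching windows visibly changes how sharp narrow tones look.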

Hope this provides a helpful overview of the user experience!

@cloudmark cloudmark changed the title from "Spectrogram Analysis in Audio component" to "feat: Implement spectrogram visualization for AudioPlus" Apr 20, 2025
@makseq
Member

makseq commented Apr 20, 2025

Great PR! How well will it work with long audio files around 1-2 hours?

@cloudmark
Author

cloudmark commented Apr 21, 2025

Hey @makseq,

TL;DR: Yes! It handles long files (1-2 hours) efficiently.

The core strategies implemented are:

  1. On-Demand Processing: It only analyzes the audio needed for the currently visible portion.
  2. Efficient Sampling (Zoomed Out): When zoomed out (e.g., 1 hour on a 1024px view), each pixel represents a time slice of ~3.5s. For each pixel, I take one representative FFT within that slice (e.g., 512 samples, which cover ~11.6ms at 44.1 kHz), avoiding the cost of processing every audio sample.
  3. Non-Blocking Rendering: A generator function (renderSpectrogramSlice) renders pixel by pixel, yielding frequently (~16ms) to keep the UI responsive during interactions like scrolling/zooming.
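
The arithmetic behind the zoomed-out sampling in point 2 can be sketched as follows. The function name, the choice of a frame centered in each pixel's slice, and the 44.1 kHz sample rate used in the note below are assumptions for illustration; the PR only states that one representative FFT is taken per pixel.

```typescript
interface SlicePlan {
  secondsPerPixel: number; // time covered by one pixel column
  fftDurationMs: number;   // time covered by one FFT frame
  frameStarts: number[];   // start sample of the one frame per column
}

// Plan one FFT frame per pixel column for the visible time range.
// (Hypothetical helper; picks the frame centered in each pixel's slice.)
function planVisibleSlices(
  visibleStartSec: number,
  visibleDurationSec: number,
  widthPx: number,
  sampleRate: number,
  fftSize: number,
  totalSamples: number,
): SlicePlan {
  if (widthPx <= 0) throw new RangeError("widthPx must be positive");
  const secondsPerPixel = visibleDurationSec / widthPx;
  const lastValidStart = Math.max(0, totalSamples - fftSize);
  const frameStarts: number[] = [];
  for (let x = 0; x < widthPx; x++) {
    const centerSec = visibleStartSec + (x + 0.5) * secondsPerPixel;
    const start = Math.round(centerSec * sampleRate - fftSize / 2);
    // Clamp so the frame never reads outside the audio buffer.
    frameStarts.push(Math.min(Math.max(0, start), lastValidStart));
  }
  return {
    secondsPerPixel,
    fftDurationMs: (fftSize / sampleRate) * 1000,
    frameStarts,
  };
}
```

For a 1-hour file at 44.1 kHz on a 1024px view with 512-sample frames, this plan touches 1024 × 512 = 524,288 samples out of 158,760,000, roughly 0.33% of the audio, which is why the cost depends on the view width rather than the file length.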

This approach balances performance, memory, and visual overview. As you zoom in, the detail naturally increases as fewer samples are represented per pixel.
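
The non-blocking rendering described in point 3 follows a common time-sliced generator pattern, sketched below. This is not the PR's `renderSpectrogramSlice`: `renderColumns`, `runSliced`, and the injectable clock are hypothetical names chosen so the example runs outside a browser.

```typescript
// Draw one column at a time and yield whenever the time budget for the
// current slice is used up. Yields the number of columns finished so far.
function* renderColumns(
  columnCount: number,
  drawColumn: (x: number) => void,
  budgetMs = 16,
  now: () => number = Date.now,
): Generator<number, void, void> {
  let sliceStart = now();
  for (let x = 0; x < columnCount; x++) {
    drawColumn(x);
    if (now() - sliceStart >= budgetMs) {
      yield x + 1;
      sliceStart = now(); // restart the budget once resumed
    }
  }
}

// Drive the generator, handing control back to the host between slices.
// In a browser, `schedule` could be (resume) => requestAnimationFrame(resume).
function runSliced(
  task: Generator<number, void, void>,
  schedule: (resume: () => void) => void,
  onDone: () => void,
): void {
  const step = (): void => {
    const result = task.next();
    if (result.done) onDone();
    else schedule(step);
  };
  step();
}
```

Because the generator keeps its own loop state, an in-flight render can also be abandoned simply by dropping the generator when the user scrolls or zooms, without any extra bookkeeping.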

Separately, the chosen FFT window size affects the computation time per slice (larger FFTs = more detail but slower slice render). This characteristic is independent of total file length. For the most fluid feel, 512 is often a good balance.
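
The trade-off between FFT size, detail, and speed can be put into numbers with a short sketch. The helper name is hypothetical, and the relative cost assumes a standard O(N log N) FFT, which the PR does not confirm.

```typescript
interface FftTradeoff {
  fftSize: number;
  binWidthHz: number;   // frequency resolution: sampleRate / fftSize
  frameMs: number;      // time span covered by one frame
  bins: number;         // bins up to the Nyquist frequency: fftSize / 2
  relativeCost: number; // N * log2(N), relative to a 512-sample FFT
}

// Tabulate resolution and approximate cost for a list of FFT sizes.
function fftTradeoffs(sampleRate: number, sizes: number[]): FftTradeoff[] {
  const baseCost = 512 * Math.log2(512);
  return sizes.map((n) => ({
    fftSize: n,
    binWidthHz: sampleRate / n,
    frameMs: (n / sampleRate) * 1000,
    bins: n / 2,
    relativeCost: (n * Math.log2(n)) / baseCost,
  }));
}
```

At 44.1 kHz, going from 512 to 2048 samples narrows each frequency bin from about 86 Hz to about 21.5 Hz, while each frame spans about 46 ms instead of about 11.6 ms and costs roughly 4.9 times as much to compute under this model.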

To demonstrate this with varied audio content, the video uses a 1-hour file created by concatenating samples from the ESC-50 dataset (https://github.com/karolpiczak/ESC-50). This dataset contains 2000 short environmental sound recordings across 50 categories (like dogs barking, rain, helicopters, etc.), ensuring the test file has diverse spectral characteristics.

Video Demo: Spectrogram Performance & FFT Size Impact (1hr ESC-50 file)

(Video shows loading/panning the long, varied file & the visible speed difference when switching FFT sizes).

@farioas
Member

farioas commented Apr 21, 2025

@cloudmark please rebase your branch on the latest changes from the repo to include commit 9b0487f. It will fix the failing checks.

@cloudmark
Author

Done @farioas, I also updated #7376 to take from this upstream

@makseq
Member

makseq commented Apr 25, 2025

<AudioPlus> was deprecated, could you please use <Audio> instead?

@cloudmark
Author

Thank you @makseq for the heads up. I think internally they should resolve to the same component so there are no further updates needed (I believe).

5 participants