This repository has been archived by the owner on Aug 28, 2024. It is now read-only.

Experimenting with WASM SIMD instructions in Rust


krscott/wasm-simd-audio


wasm-simd-audio

An exploration of FFT implementation using Rust+WASM+SIMD.

🎧 Live demo

⚠ WASM SIMD is not yet supported in every modern browser. Be sure to use a browser that supports WASM SIMD.

Usage

  1. Get a punchy audio file ready to upload
  2. Open the webpage
  3. Click the bubble on the left to select an audio file to play
  4. Click Play ▶ on the audio widget

The calculation time of each algorithm is shown in the upper left, in the same color as its plot. The white plot is the FFT computed by the browser's Web Audio AnalyserNode, shown as a reference.
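When comparing a custom FFT against the white reference plot, the magnitudes need to be on a similar scale. The sketch below assumes the AnalyserNode behavior described in the Web Audio specification: a Blackman window (alpha = 0.16) applied to the input before the FFT, and magnitudes converted to decibels with `20 * log10`. It omits the node's time smoothing (`smoothingTimeConstant`). The function names are hypothetical and not taken from this repository.

```rust
use std::f32::consts::PI;

/// Blackman window of length `n`, using the coefficients the Web Audio
/// spec gives for AnalyserNode (alpha = 0.16). Assumed, not from this repo.
fn blackman_window(n: usize) -> Vec<f32> {
    let alpha = 0.16_f32;
    let a0 = (1.0 - alpha) / 2.0;
    let a1 = 0.5_f32;
    let a2 = alpha / 2.0;
    (0..n)
        .map(|i| {
            let x = i as f32 / n as f32;
            a0 - a1 * (2.0 * PI * x).cos() + a2 * (4.0 * PI * x).cos()
        })
        .collect()
}

/// Convert a linear magnitude to decibels (20 * log10), as AnalyserNode does
/// before reporting float frequency data.
fn magnitude_to_db(mag: f32) -> f32 {
    20.0 * mag.log10()
}

fn main() {
    // Multiply each input sample by these weights before running the FFT.
    let window = blackman_window(8);
    println!("window = {:?}", window);
    println!("1.0 in dB = {}", magnitude_to_db(1.0));
}
```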

[Screenshot]

About

In this project, I implement a basic FFT algorithm using WASM SIMD instructions (std::arch::wasm32).

For example, here is the function for multiplying 2 pairs of complex numbers in parallel using f32x4 vectors:

```rust
use std::arch::wasm32::*;
// Assumed: `Complex` is num_complex::Complex (rustfft re-exports num_complex).
use num_complex::Complex;

/// Calculate `left0 * right0` and `left1 * right1` in parallel using SIMD.
#[target_feature(enable = "simd128")]
fn simd_complex_mul(
    left0: Complex<f32>,
    right0: Complex<f32>,
    left1: Complex<f32>,
    right1: Complex<f32>,
) -> (Complex<f32>, Complex<f32>) {
    // In all calculations, both pairs will do the same operations.
    // So, we only need to reason about the first 2 lanes as if they were f32x2:
    // (a + ib)*(c + id) = (ac - bd) + i(bc + ad)
    // a | b
    let a_b = f32x4(left0.re, left0.im, left1.re, left1.im);
    // c | c
    let c_c = f32x4(right0.re, right0.re, right1.re, right1.re);
    // d | d
    let d_d = f32x4(right0.im, right0.im, right1.im, right1.im);
    // b | a - Get by swapping lanes of a_b (second parameter is not used)
    let b_a = u32x4_shuffle::<1, 0, 3, 2>(a_b, a_b);
    // ac | bc
    let ac_bc = f32x4_mul(a_b, c_c);
    // bd | ad
    let bd_ad = f32x4_mul(b_a, d_d);
    // ac-bd | - Real output
    let acmbd_bcmad = f32x4_sub(ac_bc, bd_ad);
    // | bc+ad - Imaginary output
    let acpbd_bcpad = f32x4_add(ac_bc, bd_ad);
    (
        Complex {
            re: f32x4_extract_lane::<0>(acmbd_bcmad),
            im: f32x4_extract_lane::<1>(acpbd_bcpad),
        },
        Complex {
            re: f32x4_extract_lane::<2>(acmbd_bcmad),
            im: f32x4_extract_lane::<3>(acpbd_bcpad),
        },
    )
}
```
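The lane layout is easier to follow (and to test without a WASM runtime) with a portable emulation. The sketch below mirrors each step of `simd_complex_mul` using plain `[f32; 4]` arrays in place of `v128`, and checks the result against the scalar formula (a + ib)(c + id) = (ac - bd) + i(bc + ad). The local `Complex` struct and the helper names are hypothetical stand-ins, not code from this repository.

```rust
/// Minimal stand-in for num_complex::Complex<f32> so the sketch has no dependencies.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Complex {
    re: f32,
    im: f32,
}

/// Emulates a v128 holding four f32 lanes.
type F32x4 = [f32; 4];

fn mul(a: F32x4, b: F32x4) -> F32x4 {
    [a[0] * b[0], a[1] * b[1], a[2] * b[2], a[3] * b[3]]
}

fn add(a: F32x4, b: F32x4) -> F32x4 {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}

fn sub(a: F32x4, b: F32x4) -> F32x4 {
    [a[0] - b[0], a[1] - b[1], a[2] - b[2], a[3] - b[3]]
}

/// Same lane permutation as `u32x4_shuffle::<1, 0, 3, 2>(v, v)`.
fn swap_pairs(v: F32x4) -> F32x4 {
    [v[1], v[0], v[3], v[2]]
}

/// Step-by-step emulation of simd_complex_mul on plain arrays.
fn emulated_complex_mul(l0: Complex, r0: Complex, l1: Complex, r1: Complex) -> (Complex, Complex) {
    let a_b = [l0.re, l0.im, l1.re, l1.im]; // a | b
    let c_c = [r0.re, r0.re, r1.re, r1.re]; // c | c
    let d_d = [r0.im, r0.im, r1.im, r1.im]; // d | d
    let b_a = swap_pairs(a_b); // b | a
    let ac_bc = mul(a_b, c_c); // ac | bc
    let bd_ad = mul(b_a, d_d); // bd | ad
    let diff = sub(ac_bc, bd_ad); // lane 0 (and 2) hold the real parts
    let sum = add(ac_bc, bd_ad); // lane 1 (and 3) hold the imaginary parts
    (
        Complex { re: diff[0], im: sum[1] },
        Complex { re: diff[2], im: sum[3] },
    )
}

/// Scalar reference: (a + ib)(c + id) = (ac - bd) + i(bc + ad).
fn scalar_complex_mul(l: Complex, r: Complex) -> Complex {
    Complex {
        re: l.re * r.re - l.im * r.im,
        im: l.im * r.re + l.re * r.im,
    }
}

fn main() {
    let (p, q) = emulated_complex_mul(
        Complex { re: 1.0, im: 2.0 },
        Complex { re: 3.0, im: 4.0 },
        Complex { re: 0.0, im: 1.0 },
        Complex { re: 0.0, im: 1.0 },
    );
    println!("{:?} {:?}", p, q);
}
```

Because the emulation performs the same multiplications and additions in the same order as the scalar formula, the two results should match exactly rather than only approximately.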

Results

Although SIMD instructions do allow much higher computation throughput, they also require extra instructions to load values into and out of the vectors. In my case, one of my SIMD implementations barely matches my naive implementation, and is still 2x slower than the rustfft crate:

[Performance comparison chart]

It seems that, at least with the basic Cooley-Tukey algorithm, using SIMD only inside the loop body introduces too much overhead to see much improvement, especially compared to ordinary compiler optimization. I tried keeping the intermediate values as v128 vectors to avoid some of the conversions (see simd_cooley_tukey3.rs), but it didn't seem to have much effect.
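For reference, the loop structure in question looks roughly like the scalar sketch below: an in-place, iterative radix-2 Cooley-Tukey FFT with a bit-reversal permutation followed by butterfly stages. This is a generic textbook version written for illustration (the `Complex` struct and `fft_in_place` are hypothetical names), not the code in this repository. The innermost butterfly is where the per-iteration SIMD work would go, which is why the load/extract overhead is paid on every iteration.

```rust
use std::f32::consts::PI;

#[derive(Clone, Copy, Debug)]
struct Complex {
    re: f32,
    im: f32,
}

impl Complex {
    fn add(self, o: Complex) -> Complex {
        Complex { re: self.re + o.re, im: self.im + o.im }
    }
    fn sub(self, o: Complex) -> Complex {
        Complex { re: self.re - o.re, im: self.im - o.im }
    }
    fn mul(self, o: Complex) -> Complex {
        Complex {
            re: self.re * o.re - self.im * o.im,
            im: self.im * o.re + self.re * o.im,
        }
    }
}

/// In-place iterative radix-2 Cooley-Tukey FFT.
/// Panics if `buf.len()` is not a power of two.
fn fft_in_place(buf: &mut [Complex]) {
    let n = buf.len();
    assert!(n.is_power_of_two(), "length must be a power of two");
    if n <= 1 {
        return;
    }

    // Reorder the input into bit-reversed index order.
    let bits = n.trailing_zeros();
    for i in 0..n {
        let j = i.reverse_bits() >> (usize::BITS - bits);
        if i < j {
            buf.swap(i, j);
        }
    }

    // Butterfly stages: sub-transform size doubles each stage.
    let mut len = 2;
    while len <= n {
        let angle = -2.0 * PI / len as f32; // twiddle step e^{-2*pi*i/len}
        let half = len / 2;
        for start in (0..n).step_by(len) {
            for k in 0..half {
                let theta = angle * k as f32;
                let w = Complex { re: theta.cos(), im: theta.sin() };
                let even = buf[start + k];
                let odd = buf[start + k + half].mul(w);
                buf[start + k] = even.add(odd);
                buf[start + k + half] = even.sub(odd);
            }
        }
        len *= 2;
    }
}

fn main() {
    // A unit impulse at index 1 transforms to e^{-2*pi*i*k/4} = [1, -i, -1, i].
    let mut x = vec![Complex { re: 0.0, im: 0.0 }; 4];
    x[1].re = 1.0;
    fft_in_place(&mut x);
    println!("{:?}", x);
}
```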

TODO

  • Compare the generated WASM instructions to see how wasm-pack is optimizing. The wasm32 intrinsics might be preventing some compiler optimizations.
  • Look into an SFFT implementation.
  • Move the WASM code to a Web Worker. Currently the FFTs are evaluated on the main thread. This requires manually sending the WASM binary blob to the worker thread at runtime.

Development

To start:

npm run dev

Setup Tips

VS Code: in the workspace settings.json, set the rust-analyzer target to wasm32:

  "rust-analyzer.cargo.target": "wasm32-unknown-unknown"

wasm-opt issue: If you get an error about the WASM not being optimized, install the latest version of binaryen and put it in your PATH. The build will work without it, but the benchmark results will be wrong. (more info)

Deployment

⚠ Be sure you don't have any wasm-opt errors (see above)!

GitHub Pages:

npm run deploy
