Update SIMD blog post #545

Merged
merged 7 commits into from
Apr 19, 2021
142 changes: 119 additions & 23 deletions src/features/simd.md
@@ -1,8 +1,8 @@
---
title: 'Fast, parallel applications with WebAssembly SIMD'
author: 'Deepti Gandluri ([@dptig](https://twitter.com/dptig)), Thomas Lively ([@tlively52](https://twitter.com/tlively52))'
author: 'Deepti Gandluri ([@dptig](https://twitter.com/dptig)), Thomas Lively ([@tlively52](https://twitter.com/tlively52)), Ingvar Stepanyan ([@RReverser](https://twitter.com/RReverser))'
date: 2020-01-30
updated: 2020-06-09
updated: 2021-04-19
tags:
- WebAssembly
description: 'Bringing vector operations to WebAssembly'
@@ -18,28 +18,67 @@ The high-level goal of the WebAssembly SIMD proposal is to introduce vector oper

The set of SIMD instructions is large, and varied across architectures. The set of operations included in the WebAssembly SIMD proposal consist of operations that are well supported on a wide variety of platforms, and are proven to be performant. To this end, the current proposal is limited to standardizing Fixed-Width 128-bit SIMD operations.

The current proposal introduces a new v128 value type, and a number of new operations that operate on this type. The criteria used to determine these operations are:
The current proposal introduces a new `v128` value type, and a number of new operations that operate on this type. The criteria used to determine these operations are:

- The operations should be well supported across multiple modern architectures.
- Performance wins should be positive across multiple relevant architectures within an instruction group.
- The chosen set of operations should minimize performance cliffs if any.

The proposal is in active development, both V8 and the toolchain have working prototype implementations for experimentation. As these are prototype implementations, they are subject to change as new operations are added to the proposal.
The proposal is now in a [finalized state (phase 4)](https://github.com/WebAssembly/simd/issues/480), and both V8 and the toolchain have working implementations.

## Using WebAssembly SIMD
## Enabling SIMD support

### Feature detection

First of all, note that SIMD is a new feature and isn't yet available in all browsers with WebAssembly support. You can find which browsers support new WebAssembly features on the [webassembly.org](https://webassembly.org/roadmap/) website.

To ensure that all users can load your application, you'll need to build two different versions - one with SIMD enabled and one without it - and load the corresponding version depending on feature detection results. To detect SIMD at runtime, you can use the [`wasm-feature-detect`](https://github.com/GoogleChromeLabs/wasm-feature-detect) library and load the corresponding module like this:

```js
import { simd } from 'wasm-feature-detect';

(async () => {
  const hasSIMD = await simd();
  const module = await (
    hasSIMD
      ? import('./module-with-simd.js')
      : import('./module-without-simd.js')
  );
  // …now use `module` as you normally would
})();
```

To learn about building code with SIMD support, check the section [below](#building-with-simd-support).

### Enabling experimental SIMD support in Chrome

WebAssembly SIMD support is prototyped behind a flag in Chrome, to try out the SIMD support on the browser, pass `--enable-features=WebAssemblySimd`, or toggle the "WebAssembly SIMD support" flag in `chrome://flags`. This work is bleeding edge, and continuously being worked on. To minimize the chances of breakage, please use the latest version of the toolchain as detailed below, and a recent Chrome Canary. If something doesn’t look right, please [file a bug](https://crbug.com/v8).
WebAssembly SIMD support will be available by default from Chrome 91, while on older versions it's gated behind a flag. To try out the SIMD support in stable Chrome, pass `--enable-features=WebAssemblySimd`, or toggle the "WebAssembly SIMD support" flag in `chrome://flags`. Make sure to use the latest version of the toolchain as detailed below, and a recent Chrome Canary. If something doesn’t look right, please [file a bug](https://crbug.com/v8).

### Building C / C++ to target SIMD
WebAssembly SIMD is also available as an origin trial in Chrome versions 84-90. Origin trials allow developers to experiment with a feature on the chosen origin, and provide valuable feedback. Once an origin trial token has been registered, the trial users are opted into the feature for the duration of the trial period without having to update Chrome flags.

To try this out, read the [origin trial developer guide](https://github.com/GoogleChrome/OriginTrials/blob/gh-pages/developer-guide.md), and [register for an origin trial token](https://developers.chrome.com/origintrials/#/view_trial/-4708513410415853567). More information about origin trials can be found in the [FAQ](https://github.com/GoogleChrome/OriginTrials/blob/gh-pages/developer-guide.md#faq). Please file a [bug](https://bugs.chromium.org/p/v8/issues/entry) if something isn't working as you expect. The origin trial is compatible with Emscripten versions 2.0.17 onwards.

### Enabling experimental SIMD support in Firefox

WebAssembly’s SIMD support depends on using a recent build of clang with the WebAssembly LLVM backend enabled. Emscripten has support for the WebAssembly SIMD proposal as well. Install and activate the latest-upstream distribution of emscripten using [emsdk](https://emscripten.org/docs/getting_started/downloads.html) to use the bleeding edge SIMD features.
WebAssembly SIMD is available behind a flag in Firefox. Currently it's supported only on x86 and x86-64 architectures. To try out the SIMD support in Firefox, go to `about:config` and enable `javascript.options.wasm_simd`. Note that this feature is still experimental and being worked on.

### Enabling experimental SIMD support in Node.js

In Node.js, WebAssembly SIMD can be enabled via the `--experimental-wasm-simd` flag:

```bash
./emsdk install latest-upstream
node --experimental-wasm-simd main.js
```

./emsdk activate latest-upstream
## Building with SIMD support

### Building C / C++ to target SIMD

WebAssembly’s SIMD support depends on using a recent build of clang with the WebAssembly LLVM backend enabled. Emscripten has support for the WebAssembly SIMD proposal as well. Install and activate the `latest` distribution of emscripten using [emsdk](https://emscripten.org/docs/getting_started/downloads.html) to use the bleeding edge SIMD features.

```bash
./emsdk install latest
./emsdk activate latest
```

There are a couple of different ways to enable generating SIMD code when porting your application to use SIMD. Once the latest upstream emscripten version has been installed, compile using emscripten, and pass the `-msimd128` flag to enable SIMD.
@@ -110,22 +149,85 @@ void multiply_arrays(int* out, int* in_a, int* in_b, int size) {

This manually rewritten code assumes that the input and output arrays are aligned and do not alias and that size is a multiple of four. The autovectorizer cannot make these assumptions and has to generate extra code to handle the cases where they are not true, so hand-written SIMD code often ends up being smaller than autovectorized SIMD code.

### Cross-compiling existing C / C++ projects

Many existing projects already support SIMD when targeting other platforms, in particular [SSE](https://en.wikipedia.org/wiki/Streaming_SIMD_Extensions) and [AVX](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions) instructions on x86 / x86-64 platforms and [NEON](https://en.wikipedia.org/wiki/ARM_architecture#Advanced_SIMD_(Neon)) instructions on ARM platforms. There are two ways those are usually implemented.

The first is via assembly files that take care of SIMD operations and are linked together with C / C++ during the build process. The assembly syntax and instructions are highly platform-dependent and not portable, so, to make use of SIMD, such projects need to add WebAssembly as an additional supported target and reimplement the corresponding functions using either [WebAssembly text format](https://webassembly.github.io/spec/core/text/index.html) or the intrinsics described [above](#building-c-/-c++-to-target-simd).

Another common approach is to use SSE / SSE2 / AVX / NEON intrinsics directly from C / C++ code, and here Emscripten can help. Emscripten [provides compatible headers and an emulation layer](https://emscripten.org/docs/porting/simd.html) for all those instruction sets; the emulation layer compiles them directly to Wasm intrinsics where possible, or to scalarized code otherwise.

To cross-compile such projects, first enable SIMD via project-specific configuration flags, e.g. `./configure --enable-simd`, so that it passes `-msse`, `-msse2`, `-mavx` or `-mfpu=neon` to the compiler and calls the corresponding intrinsics. Then, additionally pass `-msimd128` to enable WebAssembly SIMD too, either by using `CFLAGS=-msimd128 make …` / `CXXFLAGS=-msimd128 make …` or by modifying the build config directly when targeting Wasm.

### Building Rust to target SIMD

When compiling Rust code to target WebAssembly SIMD, you'll need to enable the same `simd128` LLVM feature as in Emscripten above.

If you can control `rustc` flags directly or via environment variable `RUSTFLAGS`, pass `-C target-feature=+simd128`:

```bash
rustc … -C target-feature=+simd128 -o out.wasm
```

or

```bash
RUSTFLAGS="-C target-feature=+simd128" cargo build
```

Like in Clang / Emscripten, LLVM’s autovectorizers are enabled by default for optimized code when the `simd128` feature is enabled.

For example, the Rust equivalent of the `multiply_arrays` example above

```rust
pub fn multiply_arrays(out: &mut [i32], in_a: &[i32], in_b: &[i32]) {
    in_a.iter()
        .zip(in_b)
        .zip(out)
        .for_each(|((a, b), dst)| {
            *dst = a * b;
        });
}
```

would produce similar autovectorized code for the aligned part of the inputs.
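
As a small usage sketch of the safe iterator version (the `demo` helper and its input values are made up for illustration and are not part of the original post): `zip` stops at the shortest of the three slices, so mismatched lengths leave the extra output elements untouched rather than reading out of bounds.

```rust
/// Same safe, iterator-based function as in the example above.
pub fn multiply_arrays(out: &mut [i32], in_a: &[i32], in_b: &[i32]) {
    in_a.iter()
        .zip(in_b)
        .zip(out)
        .for_each(|((a, b), dst)| {
            *dst = a * b;
        });
}

/// Hypothetical helper that calls the function with mismatched lengths.
pub fn demo() -> [i32; 5] {
    let a = [1, 2, 3, 4, 5];
    let b = [10, 20, 30]; // deliberately shorter than `a` and `out`
    let mut out = [0; 5];
    multiply_arrays(&mut out, &a, &b);
    out // only the first three elements were written
}
```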

In order to have manual control over the SIMD operations, you can use the nightly toolchain, enable Rust feature `wasm_simd` and invoke the intrinsics from the [`std::arch::wasm32`](https://doc.rust-lang.org/stable/core/arch/wasm32/index.html#simd) namespace directly:

```rust
#![feature(wasm_simd)]

use std::arch::wasm32::*;

pub unsafe fn multiply_arrays(out: &mut [i32], in_a: &[i32], in_b: &[i32]) {
    // `chunks_exact` yields only full groups of four, so each 128-bit
    // load and store stays in bounds; leftover elements are skipped.
    in_a.chunks_exact(4)
        .zip(in_b.chunks_exact(4))
        .zip(out.chunks_exact_mut(4))
        .for_each(|((a, b), dst)| {
            let a = v128_load(a.as_ptr() as *const v128);
            let b = v128_load(b.as_ptr() as *const v128);
            let prod = i32x4_mul(a, b);
            v128_store(dst.as_mut_ptr() as *mut v128, prod);
        });
}
```
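
The intrinsic version above operates on groups of four lanes, so inputs whose length is not a multiple of four need separate handling for the leftover elements. Below is a minimal sketch of one way to do that: full groups of four go through a `mul4` helper and a scalar loop finishes the tail. The names `mul4` and `multiply_arrays_any_len`, and the `cfg`-based scalar fallback for targets without `simd128`, are assumptions made for illustration, not part of the original post. On toolchains where these intrinsics are still unstable, the `#![feature(wasm_simd)]` attribute shown above would also be required.

```rust
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
use std::arch::wasm32::*;

/// Multiplies exactly four lanes using Wasm SIMD intrinsics.
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
fn mul4(dst: &mut [i32], a: &[i32], b: &[i32]) {
    assert!(dst.len() == 4 && a.len() == 4 && b.len() == 4);
    // SAFETY: each slice holds exactly four `i32` values (16 bytes), and
    // `v128_load` / `v128_store` do not require 16-byte alignment.
    unsafe {
        let va = v128_load(a.as_ptr() as *const v128);
        let vb = v128_load(b.as_ptr() as *const v128);
        v128_store(dst.as_mut_ptr() as *mut v128, i32x4_mul(va, vb));
    }
}

/// Scalar stand-in used when `simd128` is not enabled for the target.
#[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
fn mul4(dst: &mut [i32], a: &[i32], b: &[i32]) {
    for ((d, x), y) in dst.iter_mut().zip(a).zip(b) {
        // Wrapping multiplication matches the behavior of `i32x4_mul`.
        *d = x.wrapping_mul(*y);
    }
}

pub fn multiply_arrays_any_len(out: &mut [i32], in_a: &[i32], in_b: &[i32]) {
    // Work only on the prefix that all three slices have in common.
    let len = out.len().min(in_a.len()).min(in_b.len());
    let (out, in_a, in_b) = (&mut out[..len], &in_a[..len], &in_b[..len]);

    let mut out_chunks = out.chunks_exact_mut(4);
    let a_chunks = in_a.chunks_exact(4);
    let b_chunks = in_b.chunks_exact(4);
    let (a_tail, b_tail) = (a_chunks.remainder(), b_chunks.remainder());

    // Full groups of four lanes.
    for ((dst, a), b) in out_chunks.by_ref().zip(a_chunks).zip(b_chunks) {
        mul4(dst, a, b);
    }

    // Up to three leftover elements, handled one at a time.
    let out_tail = out_chunks.into_remainder();
    for ((dst, a), b) in out_tail.iter_mut().zip(a_tail).zip(b_tail) {
        *dst = a.wrapping_mul(*b);
    }
}
```

Keeping the `unsafe` block inside `mul4` lets the public function stay safe to call, at the cost of a length check per group; whether that trade-off is acceptable depends on the workload.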

Alternatively, use a helper crate like [`packed_simd`](https://crates.io/crates/packed_simd_2) that abstracts over SIMD implementations on various platforms.

## Compelling use cases

The WebAssembly SIMD proposal seeks to accelerate high compute applications like audio/video codecs, image processing applications, cryptographic applications, etc. Currently WebAssembly SIMD is experimentally supported in widely used open source projects like [Halide](https://github.com/halide/Halide/blob/master/README_webassembly.md), [OpenCV.js](https://docs.opencv.org/3.4/d5/d10/tutorial_js_root.html), and [XNNPACK](https://github.com/google/XNNPACK).

Some interesting demos come from the [MediaPipe project](https://github.com/google/mediapipe) by the Google Research team.

As per their description, MediaPipe is a framework for building multimodal (eg. video, audio, any time series data) applied ML pipelines. And they have a [Web version](https://mediapipe.page.link/web), too!
As per their description, MediaPipe is a framework for building multimodal (e.g. video, audio, any time series data) applied ML pipelines. And they have a [Web version](https://developers.googleblog.com/2020/01/mediapipe-on-web.html), too!

One of the most visually appealing demos where it’s easy to observe the difference in performance SIMD makes, is a following hand-tracking system. Without SIMD, you can get only around 3 frames per second on a modern laptop, while with SIMD enabled you get a much smoother experience at 15-16 frames per second.
One of the most visually appealing demos where it’s easy to observe the difference in performance SIMD makes, is a CPU-only (non-GPU) build of a hand-tracking system. [Without SIMD](https://storage.googleapis.com/aim-bucket/users/tmullen/demos_10_2019_cdc/rebuild_04_2021/mediapipe_handtracking/gl_graph_demo.html), you can get only around 14-15 FPS (frames per second) on a modern laptop, while [with SIMD enabled in Chrome Canary](https://storage.googleapis.com/aim-bucket/users/tmullen/demos_10_2019_cdc/rebuild_04_2021/mediapipe_handtracking_simd/gl_graph_demo.html) you get a much smoother experience at 38-40 FPS.

<figure>
<video autoplay muted playsinline loop width="600" height="216" src="/_img/simd/hand.mp4"></video>
</figure>

Visit the [demo](https://pursuit.page.link/MediaPipeHandTrackingSimd) in Chrome Canary with SIMD enabled to try it!

Another interesting set of demos that make use of SIMD for a smooth experience comes from OpenCV - a popular computer vision library that can also be compiled to WebAssembly. They’re available at this [link](https://bit.ly/opencv-camera-demos), or you can check out the pre-recorded versions below:

<figure>
@@ -143,14 +245,8 @@ Another interesting set of demos that makes use of SIMD for smooth experience, c
<figcaption>Emoji replacement</figcaption>
</figure>

## SIMD Origin Trial

The WebAssembly SIMD origin trial is available for experimentation in Chrome versions 84-86. Origin trials allow developers to experiment with a feature, and provide valuable feedback. Once an origin trial token has been registered, the trial users are opted into the feature for the duration of the trial period without having to update Chrome flags.

To try this out, read the [origin trial developer guide](https://github.com/GoogleChrome/OriginTrials/blob/gh-pages/developer-guide.md), and [register for an origin trial token](https://developers.chrome.com/origintrials/#/view_trial/-4708513410415853567). More information about origin trials can be found in the [FAQ](https://github.com/GoogleChrome/OriginTrials/blob/gh-pages/developer-guide.md#faq), please file a [bug](https://bugs.chromium.org/p/v8/issues/entry) if something isn't working as you expect. The origin trial is compatible with emscripten versions 1.39.15 onwards.

Ongoing experimental support is available on a recent Chrome Canary as detailed [above](#using-webassembly-simd), with the use of latest-upstream Emscripten toolchain.

## Future work

The current SIMD proposal is in [Phase 3](https://github.com/WebAssembly/meetings/blob/master/process/phases.md#3-implementation-phase-community--working-group), so the future work here is to push the proposal forward in the standardization process. Fixed width SIMD gives significant performance gains over scalar, but it doesn’t effectively leverage wider width vector operations that are available in modern hardware. As the current proposal moves forward, some future facing work here is to determine the feasibility of extending the proposal with longer width operations.
The current fixed-width SIMD proposal is in [Phase 4](https://github.com/WebAssembly/meetings/blob/master/process/phases.md#3-implementation-phase-community--working-group), so it's considered complete.

Some explorations of future SIMD extensions have started in [Relaxed SIMD](https://github.com/WebAssembly/relaxed-simd) and [Flexible Vectors](https://github.com/WebAssembly/flexible-vectors) proposals, which, at the moment of writing, are in Phase 1.