Skip to content

Commit ffda6a8

Browse files
Testable Models for SIMD Intrinsics (#423)
Solution to challenge 15, resolves #173. This PR provides testable models for `core::arch` intrinsics, including abstractions to streamline the process of implementing these intrinsics and their tests. Currently there are 384 x86 intrinsics modelled, and 181 aarch64 intrinsics modelled. The methodology for writing the models is decribed in `testable-simd-models/README.md`. First, we model the SIMD types as bitvectors that can be converted to and from arrays of machine integers. Then, we model the raw operations on these types as functions over bitvectors, while keeping as much code as possible unchanged from the Rust code in `rust-lang/stdarch/crates/core_arch`. Finally, we write tests (using a generic macro) to compare the behaviour of our models with the corresponding intrinsic implementations in Rust core. Interestingly, in the process of modeling these intrinsivcs, we found bugs in the implementation of two intrinsics, `_mm256_bsrli_epi128` and `_mm512_bsrli_epi128`. These bugs were [fixed by our PR](rust-lang/stdarch#1823) in the 2025-06-30 version of the library. In a small way, this shows off the impact of writing testable models of the SIMD intrinsics. The model of intrinsics defined here is also used as the basis of formal proofs of Rust programs that use intrinsics. In particular, the `libcrux` cryptographic library uses these models in its [proofs of the post-quantum cryptographic algorithms](https://github.com/cryspen/libcrux/tree/main/fstar-helpers/core-models). As next steps, we intend to extend these testable models to a larger subset of core (beyond SIMD) and then translate these Rust models to models in F*, Rocq, and Lean, to enable proofs using our models in these backends. This work is being done as part of the [hax project](https://github.com/cryspen/hax). The work in this PR was primarily done by Aniket Mishra, under the supervision of Karthikeyan Bhargavan and Maxime Buyse at Cryspen. By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses. --------- Co-authored-by: maximebuyse <45398004+maximebuyse@users.noreply.github.com> Co-authored-by: Maxime Buyse <maxime@cryspen.com>
1 parent af09d6d commit ffda6a8

34 files changed

+11281
-0
lines changed
Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,31 @@
1+
# This workflow runs the tests for testable simd models.
2+
3+
name: Testable simd models
4+
5+
on:
6+
workflow_dispatch:
7+
merge_group:
8+
pull_request:
9+
branches: [ main ]
10+
push:
11+
paths:
12+
- '.github/workflows/testable-simd-models.yml'
13+
- 'testable-simd-models/**'
14+
15+
defaults:
16+
run:
17+
shell: bash
18+
19+
jobs:
20+
testable-simd-models:
21+
name: Test testable simd models
22+
runs-on: ubuntu-latest
23+
24+
steps:
25+
- name: Checkout Repository
26+
uses: actions/checkout@v4
27+
28+
- name: Run tests
29+
working-directory: testable-simd-models
30+
run: cargo test -- --test-threads=1 --nocapture
31+

.gitignore

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -56,3 +56,4 @@ goto-transcoder
5656
# already existing elements were commented out
5757

5858
#/target
59+
testable-simd-models/target

testable-simd-models/Cargo.toml

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,16 @@
1+
[package]
2+
name = "testable-simd-models"
3+
version = "0.0.2"
4+
authors = ["Cryspen"]
5+
license = "Apache-2.0"
6+
homepage = "https://github.com/cryspen/verify-rust-std/testable-simd-models"
7+
edition = "2021"
8+
repository = "https://github.com/cryspen/verify-rust-std/testable-simd-models"
9+
readme = "README.md"
10+
11+
[dependencies]
12+
rand = "0.9"
13+
pastey = "0.1.0"
14+
15+
[lints.rust]
16+
unexpected_cfgs = { level = "warn" }

testable-simd-models/README.md

Lines changed: 319 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,319 @@
1+
# testable-simd-models
2+
3+
This crate contains executable, independently testable specifications
4+
for the SIMD intrinsics provided by the `core::arch` library in Rust.
5+
The structure of this crate is based on [rust-lang/stdarch/crates/core_arch](https://github.com/rust-lang/stdarch/tree/master/crates/core_arch).
6+
7+
## Code Structure
8+
Within the `core_arch` folder in this crate, there is a different
9+
folder for each architecture for which we have written models.
10+
In particular, it contains folders for `x86` and `arm_shared`.
11+
Each such folder has 2 sub-folders: `models` and `tests`.
12+
13+
The `models` folder contains the models of the intrinsics, with
14+
different files for different target features (e.g. `sse2`, `avx2`
15+
etc.). The code in this folder is written using the various
16+
abstractions implemented in `abstractions`, especially those in
17+
`abstractions::simd`. These models are meant to closely
18+
resemble their implementations within the Rust core itself.
19+
20+
The `tests` folder contains the tests of these models, and is
21+
structured the same way as `models`. Each file additionally includes
22+
the definition of a macro that makes writing these tests easier. The
23+
tests work by testing the models against the intrinsics in the Rust
24+
core, trying out random inputs (generally 1000), and comparing their
25+
outputs.
26+
27+
All tests can be run by executing `cargo test`, and we expect this to be
28+
run as part of CI.
29+
30+
## Modeling a SIMD Intrinsic
31+
32+
There are three kinds of SIMD intrinsics in `core::arch`.
33+
34+
The first kind are builtin Rust compiler intrinsics, some of which are
35+
in the [`intrinsics/simd.rs` file](https://github.com/model-checking/verify-rust-std/blob/main/library/core/src/intrinsics/simd.rs)
36+
in the `core` crate, and others are in the [`simd.rs` file of `core_arch`](https://github.com/model-checking/verify-rust-std/blob/main/library/stdarch/crates/core_arch/src/simd.rs).
37+
These builtin intrinsics define generic SIMD operations that the Rust compiler knows how to implement on each platform.
38+
39+
The second kind are `extern` intrinsics that are links to definitions in LLVM.
40+
See, for example, [this list](https://github.com/rust-lang/stdarch/blob/master/crates/core_arch/src/x86/avx2.rs#L3596C8-L3596C14)
41+
of `extern` intrinsics used in the Intel x86 AVX2 library.
42+
These extern intrinsics are typically platform-specific functions that map to low-level instructions.
43+
44+
The third kind are `defined` intrinsics that are given proper definitions in Rust, and their code may
45+
depend on the builtin intrinsics or the extern intrinsics. These defined intrinsics represent higher-level
46+
operations that are wrappers around one or more assembly instructions.
47+
48+
### Modeling builtin intrinsics manually
49+
50+
We model all three kinds of intrinsics, but in slightly different
51+
ways. For the builtin intrinsics, we can write implementations once
52+
and for all, and to this end, we use a library within the
53+
`abstractions/simd.rs` file, where we copy the signatures of the
54+
intrinsics from Rust but give them our own implementation. In
55+
particular, we model each SIMD vector as an array of scalars, and
56+
define each generic operation as functions over such arrays. This can
57+
be seen as a reference implementation of the builtin intrinsics.
58+
59+
Hence, for example, the SIMD add intrinsic `simd_add` is modeled as follows,
60+
it takes two arrays of machine integers and adds them pointwise using a
61+
`wrapping_add` operation:
62+
63+
```rust
64+
pub fn simd_add<const N: u64, T: MachineInteger + Copy>(
65+
x: FunArray<N, T>,
66+
y: FunArray<N, T>,
67+
) -> FunArray<N, T> {
68+
FunArray::from_fn(|i| (x[i].wrapping_add(y[i])))
69+
}
70+
```
71+
72+
Notably, we model a strongly typed version of `simd_add`, in contrast to the compiler
73+
intrinsic, which is too generic and unimplementable in safe Rust:
74+
75+
```rust
76+
/// Adds two simd vectors elementwise.
77+
///
78+
/// `T` must be a vector of integers or floats.
79+
#[rustc_intrinsic]
80+
#[rustc_nounwind]
81+
pub unsafe fn simd_add<T>(x: T, y: T) -> T;
82+
```
83+
84+
The main rules for writing these models are that they should be simple and self-contained,
85+
relying only on the libraries in `abstractions`, on builtin Rust language features, or
86+
other testable models. In particular, they should not themselves directly call Rust libraries
87+
or external crates, without going through the abstractions API.
88+
89+
90+
### Modeling extern intrinsics manually
91+
92+
For each file in `core::arch`, we split the code into extern
93+
intrinsics that must be modeled by hand and defined intrinsics whose
94+
models can be derived semi-automatically. The extern intrinsics are
95+
placed in a module suffixed with `_handwritten`. Hence, for example,
96+
the extern intrinsics used in `avx2.rs` can be found in `avx2_handwritten.rs`.
97+
98+
Modeling extern intrinsics is similar to modeling the builtin ones,
99+
in that the models are written by hand and treat the SIMD vectors
100+
as arrays of machine integers. The main difference is that these intrinsics
101+
are platform-specific and so their modeling requires looking at the Intel or ARM
102+
documentation for the underlying operation.
103+
104+
For example, the extern intrinsic `phaddw` used in `avx2` corresponds to an
105+
Intel instruction called "Packed Horizontal Add" and is used in AVX2 intrinsics
106+
like `_mm256_hadd_epi16` documented [here](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_hadd_epi16&ig_expand=3667_)
107+
By inspecting the Intel documentation, we can write a Rust model for it
108+
as follows
109+
110+
```rust
111+
pub fn phaddw(a: i16x16, b: i16x16) -> i16x16 {
112+
i16x16::from_fn(|i| {
113+
if i < 4 {
114+
a[2 * i].wrapping_add(a[2 * i + 1])
115+
} else if i < 8 {
116+
b[2 * (i - 4)].wrapping_add(b[2 * (i - 4) + 1])
117+
} else if i < 12 {
118+
a[2 * (i - 4)].wrapping_add(a[2 * (i - 4) + 1])
119+
} else {
120+
b[2 * (i - 8)].wrapping_add(b[2 * (i - 8) + 1])
121+
}
122+
})
123+
}
124+
```
125+
126+
### Modeling defined intrinsics semi-automatically
127+
128+
To model a defined intrinsic, we essentially copy the Rust code of the
129+
intrinsic from `core::arch` and adapt it to use our underlying
130+
abstractions. The changes needed to the code are sometimes
131+
scriptable, and indeed most of our models were generated from a script
132+
(see the ANNEX at the bottom of this file), but some changes are still
133+
needed by hand.
134+
135+
For example, let us say the intrinsic we are modeling is
136+
`_mm256_bsrli_epi128` from the avx2 feature set.
137+
138+
1. We go to [rust-lang/stdarch/crates/core_arch/src/x86/](https://github.com/rust-lang/stdarch/tree/master/crates/core_arch/src/x86/), and find the implementation of the intrinsic in `avx2.rs`.
139+
140+
2. We see that the implementation looks like this:
141+
``` rust
142+
/// Shifts 128-bit lanes in `a` right by `imm8` bytes while shifting in zeros.
143+
///
144+
/// [Intel's documentation](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm256_bsrli_epi128)
145+
#[inline]
146+
#[target_feature(enable = "avx2")]
147+
#[cfg_attr(test, assert_instr(vpsrldq, IMM8 = 1))]
148+
#[rustc_legacy_const_generics(1)]
149+
#[stable(feature = "simd_x86", since = "1.27.0")]
150+
pub fn _mm256_bsrli_epi128<const IMM8: i32>(a: __m256i) -> __m256i {
151+
static_assert_uimm_bits!(IMM8, 8);
152+
const fn mask(shift: i32, i: u32) -> u32 {
153+
let shift = shift as u32 & 0xff;
154+
if shift > 15 || (15 - (i % 16)) < shift {
155+
0
156+
} else {
157+
32 + (i + shift)
158+
}
159+
}
160+
unsafe {
161+
let a = a.as_i8x32();
162+
let r: i8x32 = simd_shuffle!(
163+
i8x32::ZERO,
164+
a,
165+
[
166+
mask(IMM8, 0),
167+
mask(IMM8, 1),
168+
mask(IMM8, 2),
169+
mask(IMM8, 3),
170+
...
171+
mask(IMM8, 31),
172+
],
173+
);
174+
transmute(r)
175+
}
176+
}
177+
```
178+
179+
Thus, we then go to `core_arch/x86/models/avx2.rs`, and add this implementation.
180+
The only changes it requires here are that the `simd_shuffle` macro is a function in our model,
181+
the `ZERO` constant is now a function, and we discard all the function attributes.
182+
183+
The exact diff between the original and edited code for this function is:
184+
185+
```diff
186+
13,14c13,14
187+
< let r: i8x32 = simd_shuffle(
188+
< i8x32::ZERO(),
189+
---
190+
> let r: i8x32 = simd_shuffle!(
191+
> i8x32::ZERO,
192+
```
193+
194+
For other intrinsics, we sometimes need to make more changes. Since our model of the builtin intrinsics
195+
is more precise concerning the type of their arguments compared to their Rust counterparts, we
196+
sometimes need to add more type annotations in our defined models. We also remove all `unsafe` guards,
197+
since our models are always in safe Rust. Otherwise, our code for the defined intrinsics looks very
198+
similar to the upstream code in `core::arch`.
199+
200+
3. Next, we add a test for this intrinsic in `core_arch/avx2/tests/avx2.rs`. For convenience purposes, we have defined a `mk!` macro, which can be used to automatically generate
201+
tests. The test generated by the macro generates a number of random inputs (by default, 1000), and compares the output generated by the model
202+
and that generated by the intrinsic in upstream `core::arch`. A valid test of the intrinsic above looks like this.
203+
```rust
204+
mk!([100]_mm256_bsrli_epi128{<0>,<1>,<2>,<3>,...,<255>}(a: BitVec));
205+
```
206+
The macro invocation has four parts.
207+
1. `mk!([100]...`: By default, the macro tests for a thousand randomly generated inputs. If needed, this can be modified, such as here, where the `[100]` is used so that
208+
only 100 inputs are generated.
209+
2. `_mm256_bsrli_epi128`: This is the name of the intrinsic being tested, and is necessary in all cases.
210+
3. `{<0>,<1>,<2>,<3>,...,<255>}`: This part only appears when the intrinsic has a const generic argument, like the `IMM8` in this intrinsic.
211+
As the name indicates, this constant argument is supposed to be at most 8 bits wide.
212+
We can confirm this by looking at the implementation and spotting the `static_assert_uimm_bits!(IMM8, 8);`
213+
line, which asserts that constant argument is positive and fits in 8 bits. Thus, we add `{<0>,<1>,<2>,<3>,...,<255>}` to test for each possible constant
214+
value of the constant argument.
215+
4. `(a: BitVec)`: This part contains all the arguments of the intrinsic and their types.
216+
217+
This summarizes the steps needed to use the `mk!` macro to generate a test. There is a caveat: in the case that the output of an intrinsic is _not_
218+
a bit-vector (and is instead, say, an integer like `i32`), then the macro will not work, and a manual test has to be written. You can see examples in the test files.
219+
220+
221+
222+
## Contributing Models
223+
224+
To contribute new models of intrinsics, we expect the author to follow
225+
the above steps and provide comprehensive tests. It is important that
226+
the model author looks carefully at both the Intel/ARM specifications
227+
and the Rust `stdarch` implementation, because they may look quite different
228+
from each other.
229+
230+
In some cases, the Rust implementation may not be correct.
231+
Indeed, the previous implementation of `_mm256_bsrli_epi128` (and a
232+
similar intrinsic called `_mm512_bsrli_epi128`) in `stdarch` had a
233+
bug, which we found during the process of modeling and testing this
234+
intrinsic. This bug was [reported by
235+
us](https://github.com/rust-lang/stdarch/issues/1822) using a failing
236+
test case generated from the testable model and then fixed by [our
237+
PR](https://github.com/rust-lang/stdarch/pull/1823) in the 2025-06-30
238+
version of `stdarch`.
239+
240+
241+
## ANNEX: Extraction Script
242+
243+
The following Rust program is a simple script that uses the `syn` crate to process an input Rust file
244+
containing SIMD intrinsics into one suitable for the models described in this document. This code
245+
is provided as illustration; for each set of core libraries we wish to model and test, there will
246+
likely be need for a similar (or extended) script to automate the modeling process.
247+
248+
```rust
249+
use syn::*;
250+
use std::fs;
251+
use std::env;
252+
253+
fn extract_model(input_file_path: &str, output_file_path: &str) -> Result<()> {
254+
let source_code = fs::read_to_string(input_file_path).expect("unable to read file");
255+
let mut syntax_tree: File = parse_file(&source_code)?;
256+
257+
syntax_tree.items.retain(|item|
258+
match item {
259+
Item::Use(_) => false,
260+
_ => true
261+
}
262+
);
263+
264+
// Clear attributes from the file's top-level items
265+
for item in &mut syntax_tree.items {
266+
match item {
267+
Item::Const(const_item) => {
268+
const_item.attrs.retain(|attr| attr.path().is_ident("doc"));
269+
},
270+
Item::Fn(item_fn) => {
271+
item_fn.attrs.retain(|attr| attr.path().is_ident("doc"));
272+
item_fn.block.stmts.retain(|stmt|
273+
match stmt {
274+
Stmt::Item(Item::ForeignMod(_)) => false,
275+
_ => true
276+
}
277+
);
278+
for stmt in &mut item_fn.block.stmts {
279+
match stmt {
280+
Stmt::Expr(Expr::Unsafe(u), tok) => *stmt = Stmt::Expr(Expr::Block(
281+
ExprBlock {attrs : Vec::new(), label : None, block : u.block.clone()}), *tok),
282+
_ => ()
283+
}
284+
}
285+
},
286+
Item::Struct(item_struct) => {
287+
item_struct.attrs.clear();
288+
for field in &mut item_struct.fields {
289+
field.attrs.retain(|attr| attr.path().is_ident("doc"));
290+
}
291+
},
292+
Item::Enum(item_enum) => {
293+
item_enum.attrs.clear();
294+
for variant in &mut item_enum.variants {
295+
variant.attrs.retain(|attr| attr.path().is_ident("doc"));
296+
}
297+
},
298+
// Add more cases for other Item types (e.g., Item::Mod, Item::Impl, etc.)
299+
_ => {
300+
// For other item types, if they have an 'attrs' field, clear it.
301+
// This requires more specific matching or a helper trait.
302+
}
303+
}
304+
}
305+
306+
let formatted_string = prettyplease::unparse(&syntax_tree);
307+
fs::write(output_file_path, formatted_string).expect("unable to write file");
308+
309+
Ok(())
310+
}
311+
312+
fn main() -> Result<()> {
313+
let args: Vec<String> = env::args().collect();
314+
if args.len() < 3 {
315+
println!("usage: modelize <path to input Rust file> <path to output Rust file>")
316+
}
317+
extract_model(&args[1], &args[2])
318+
}
319+
```

0 commit comments

Comments
 (0)