Description
This is intended to be a tracking issue for implementing all vendor intrinsics in this repository.
This issue is also intended to be a guide for documenting the process of adding new vendor intrinsics to this crate.
If you decide to implement a set of vendor intrinsics, please check the list below to make sure somebody else isn't already working on them. If a set isn't checked off and doesn't have a name next to it, feel free to comment that you'd like to implement it!
At a high level, each vendor intrinsic should correspond to a single exported Rust function with an appropriate `target_feature` attribute. Here's an example for `_mm_adds_epi16`:
```rust
/// Add packed 16-bit integers in `a` and `b` using saturation.
#[inline]
#[target_feature(enable = "sse2")]
#[cfg_attr(test, assert_instr(paddsw))]
pub unsafe fn _mm_adds_epi16(a: __m128i, b: __m128i) -> __m128i {
    unsafe { paddsw(a, b) }
}
```
Let's break this down:
- The `#[inline]` is added because vendor intrinsic functions should generally always be inlined: the intent of a vendor intrinsic is to correspond to a single particular CPU instruction, and a vendor intrinsic that is compiled into an actual function call could be quite disastrous for performance.
- The `#[target_feature(enable = "sse2")]` attribute instructs the compiler to generate code with the `sse2` target feature enabled, regardless of the target platform. That is, even if you're compiling for a platform that doesn't support `sse2`, the compiler will still generate code for `_mm_adds_epi16` as if `sse2` support existed. Without this attribute, the compiler might not generate the intended CPU instruction.
- The `#[cfg_attr(test, assert_instr(paddsw))]` attribute indicates that when we're testing the crate we'll assert that the `paddsw` instruction is generated inside this function, ensuring that the SIMD intrinsic truly is an intrinsic for the instruction!
- The types of the vectors given to the intrinsic should match exactly the types as provided in the vendor interface (with things like `int64_t` translated to `i64` in Rust).
- The implementation of the vendor intrinsic is generally very simple. Remember, the goal is to compile a call to `_mm_adds_epi16` down to a single particular CPU instruction. As such, the implementation typically defers to a compiler intrinsic (in this case, `paddsw`) when one is available; see the sketch after this list. More on this below as well.
- The intrinsic itself is `unsafe` due to the usage of `#[target_feature]`.
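For reference, the `paddsw` call in the example resolves to a compiler intrinsic declared in an `extern` block. Here's a minimal sketch of what such a declaration might look like; the exact `link_name` and the use of `__m128i` directly in the signature are assumptions for illustration, based on LLVM's naming scheme for x86 SSE2 intrinsics, not copied from this crate:

```rust
// Sketch only: brings the LLVM compiler intrinsic into scope so the
// vendor intrinsic can defer to it. The link_name is an assumption
// based on LLVM's x86 intrinsic naming.
#[allow(improper_ctypes)]
extern "C" {
    #[link_name = "llvm.x86.sse2.padds.w"]
    fn paddsw(a: __m128i, b: __m128i) -> __m128i;
}
```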
Once a function has been added, you should also add at least one test for basic functionality. Here's an example for `_mm_adds_epi16`:
```rust
#[simd_test = "sse2"]
unsafe fn test_mm_adds_epi16() {
    let a = _mm_set_epi16(0, 1, 2, 3, 4, 5, 6, 7);
    let b = _mm_set_epi16(8, 9, 10, 11, 12, 13, 14, 15);
    let r = _mm_adds_epi16(a, b);
    let e = _mm_set_epi16(8, 10, 12, 14, 16, 18, 20, 22);
    assert_eq_m128i(r, e);
}
```
Note that `#[simd_test]` is the same as `#[test]`; it's just a custom macro that enables the target feature in the test and generates a wrapper to ensure the feature is available on the local CPU as well.
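Conceptually (a rough sketch, not the macro's actual expansion), that wrapper behaves something like the following; `is_x86_feature_detected!` is std's runtime feature-detection macro and stands in here for whatever mechanism the macro uses internally:

```rust
// Hypothetical sketch of what the generated wrapper does: run the SIMD
// test only when the local CPU supports the required feature.
#[test]
fn test_mm_adds_epi16_wrapper() {
    if is_x86_feature_detected!("sse2") {
        unsafe { test_mm_adds_epi16() };
    } else {
        println!("skipping test_mm_adds_epi16: sse2 not detected");
    }
}
```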
Finally, once that's done, send a PR!
Writing the implementation
An implementation of an intrinsic (so far) generally has one of three shapes:
- The vendor intrinsic does not have any corresponding compiler intrinsic, so you must write the implementation in such a way that the compiler will recognize it and produce the desired codegen. For example, the `_mm_add_epi16` intrinsic (note the missing `s` in `add`) is implemented via `simd_add(a, b)`, which compiles down to LLVM's cross-platform SIMD vector API.
- The vendor intrinsic does have a corresponding compiler intrinsic, so you must write an `extern` block to bring that intrinsic into scope and then call it. The example above (`_mm_adds_epi16`) uses this approach.
- The vendor intrinsic has a parameter that must be a constant value when given to the CPU instruction, where that constant often impacts the operation of the intrinsic. This means the implementation of the vendor intrinsic must guarantee that a particular parameter is a constant. This is tricky because Rust doesn't (yet) have a stable way of doing this, so we have to do it ourselves. How you do it can vary, but one particularly gnarly example is `_mm_cmpestri` (make sure to look at the `constify_imm8!` macro); a miniature sketch of the pattern follows this list.
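To illustrate that third shape, here's a minimal sketch of the idea (a deliberately tiny, assumption-laden miniature, not the real `constify_imm8!` macro): match on the runtime value so that each arm passes a literal, and the inner call always sees a compile-time constant.

```rust
// Sketch only: dispatch a runtime value to match arms where it appears
// as a literal, so the expanded intrinsic call always receives a
// constant. The real constify_imm8! macro covers all 256 imm8 values.
macro_rules! constify_imm2 {
    ($imm2:expr, $expand:ident) => {
        match $imm2 & 0b11 {
            0 => $expand!(0),
            1 => $expand!(1),
            2 => $expand!(2),
            _ => $expand!(3),
        }
    };
}
```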
References
All Intel intrinsics can be found here: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=5236
The compiler intrinsics available to us through LLVM can be found here: https://gist.github.com/anonymous/a25d3e3b4c14ee68d63bd1dcb0e1223c
The Intel vendor intrinsic API can be found here: https://gist.github.com/anonymous/25d752fda8521d29699a826b980218fc
The Clang header files for vendor intrinsics can also be incredibly useful. When in doubt, Do What Clang Does:
https://github.com/llvm-mirror/clang/tree/master/lib/Headers
TODO
["MMX"]
- [ ] `_mm_srli_pi16`
- [ ] `_mm_srl_pi16`
- [ ] `_mm_mullo_pi16`
- [ ] `_mm_slli_si64`
- [ ] `_mm_mulhi_pi16`
- [ ] `_mm_srai_pi16`
- [ ] `_mm_srli_si64`
- [ ] `_mm_and_si64`
- [ ] `_mm_cvtsi32_si64`
- [ ] `_mm_cvtm64_si64`
- [ ] `_mm_andnot_si64`
- [ ] `_mm_packs_pu16`
- [ ] `_mm_madd_pi16`
- [ ] `_mm_cvtsi64_m64`
- [ ] `_mm_cmpeq_pi16`
- [ ] `_mm_sra_pi32`
- [ ] `_mm_cvtsi64_si32`
- [ ] `_mm_cmpeq_pi8`
- [ ] `_mm_srai_pi32`
- [ ] `_mm_sll_pi16`
- [ ] `_mm_srli_pi32`
- [ ] `_mm_slli_pi16`
- [ ] `_mm_srl_si64`
- [ ] `_mm_empty`
- [ ] `_mm_srl_pi32`
- [ ] `_mm_slli_pi32`
- [ ] `_mm_or_si64`
- [ ] `_mm_sll_si64`
- [ ] `_mm_sra_pi16`
- [ ] `_mm_sll_pi32`
- [ ] `_mm_xor_si64`
- [ ] `_mm_cmpeq_pi32`