Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for masked loads & stores #374

Closed
wants to merge 8 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions crates/core_simd/src/intrinsics.rs
Original file line number Diff line number Diff line change
Expand Up @@ -107,6 +107,15 @@ extern "platform-intrinsic" {
/// like gather, but more spicy, as it writes instead of reads
pub(crate) fn simd_scatter<T, U, V>(val: T, ptr: U, mask: V);

/// like a loop of reads offset from the same pointer
/// val: vector of values to select if a lane is masked
/// ptr: vector of pointers to read from
/// mask: a "wide" mask of integers, selects as if simd_select(mask, read(ptr), val)
/// note, the LLVM intrinsic accepts a mask vector of `<N x i1>`
pub(crate) fn simd_masked_load<T, U, V>(val: T, ptr: U, mask: V) -> T;
/// like masked_load, but more spicy, as it writes instead of reads
pub(crate) fn simd_masked_store<T, U, V>(val: T, ptr: U, mask: V);

// {s,u}add.sat
pub(crate) fn simd_saturating_add<T>(x: T, y: T) -> T;

Expand Down
59 changes: 59 additions & 0 deletions crates/core_simd/src/vector.rs
Original file line number Diff line number Diff line change
Expand Up @@ -311,6 +311,42 @@ where
unsafe { self.store(slice.as_mut_ptr().cast()) }
}

#[must_use]
#[inline]
pub fn masked_load_or(slice: &[T], or: Self) -> Self {
Self::masked_load_select(slice, Mask::splat(true), or)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this function does anything?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it's masking the slice length, but I'm not sure I would even call this a masked load. Also, I'm not sure this is the best way to load from slices on most architectures.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's supposed to enable us to write extremely nice loops like this:

let mut accum = Simd::<u8, N>::default();
for i in (0..data.len()).step_by(N) {
    accum ^= Simd::masked_load_or(&data[i..], Simd::default());
}

No epilogues or scalar fallbacks needed. This could be even shorter since SimdElement implies Default?

But it's only a reality on AVX-512 for now on account of the codegen

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would at least rename it--it might be implemented via masked loads, but API-wise I would consider it something else entirely. This is perhaps Simd::from_slice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After the rename -- I don't like that this function seems more discoverable than from_slice. Users should prefer from_slice if it fits their use-case. This isn't a problem with gather/scatter because that's a specific name with a more complex signature, it's hard to use it by accident.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from_slice already has a branch on the length anyway, what I'm suggesting that perhaps from_slice should instead do a masked load for len < N:

assert!(
slice.len() >= Self::LEN,
"slice length must be at least the number of elements"
);


#[must_use]
#[inline]
pub fn masked_load_select(slice: &[T], enable: Mask<isize, N>, or: Self) -> Self {
farnoy marked this conversation as resolved.
Show resolved Hide resolved
let ptr = slice.as_ptr();
let idxs = Simd::<usize, N>::from_slice(&[
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this. The rotate_lanes_left code does a clever thing with a constexpr and maybe that should be preferred?

But ideally, I'd like to have something like Simd<T, N>::lane_indices() -> Simd<T, N> (generating 0 through N elements) built-in as it's a useful function to have in user code.

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45,
46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64,
]);
let enable: Mask<isize, N> = enable & idxs.simd_lt(Simd::splat(slice.len()));
unsafe { Self::masked_load_select_ptr(ptr, enable, or) }
}

#[must_use]
#[inline]
pub unsafe fn masked_load_select_unchecked(
slice: &[T],
enable: Mask<isize, N>,
or: Self,
) -> Self {
let ptr = slice.as_ptr();
unsafe { Self::masked_load_select_ptr(ptr, enable, or) }
}

#[must_use]
#[inline]
pub unsafe fn masked_load_select_ptr(ptr: *const T, enable: Mask<isize, N>, or: Self) -> Self {
unsafe { intrinsics::simd_masked_load(or, ptr, enable.to_int()) }
}

/// Reads from potentially discontiguous indices in `slice` to construct a SIMD vector.
/// If an index is out-of-bounds, the element is instead selected from the `or` vector.
///
Expand Down Expand Up @@ -489,6 +525,29 @@ where
unsafe { intrinsics::simd_gather(or, source, enable.to_int()) }
}

#[inline]
pub fn masked_store(self, slice: &mut [T], enable: Mask<isize, N>) {
let ptr = slice.as_mut_ptr();
let idxs = Simd::<usize, N>::from_slice(&[
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45,
46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64,
]);
let enable: Mask<isize, N> = enable & idxs.simd_lt(Simd::splat(slice.len()));
unsafe { self.masked_store_ptr(ptr, enable) }
}

#[inline]
pub unsafe fn masked_store_unchecked(self, slice: &mut [T], enable: Mask<isize, N>) {
let ptr = slice.as_mut_ptr();
unsafe { self.masked_store_ptr(ptr, enable) }
}

#[inline]
pub unsafe fn masked_store_ptr(self, ptr: *mut T, enable: Mask<isize, N>) {
unsafe { intrinsics::simd_masked_store(self, ptr, enable.to_int()) }
}

/// Writes the values in a SIMD vector to potentially discontiguous indices in `slice`.
/// If an index is out-of-bounds, the write is suppressed without panicking.
/// If two elements in the scattered vector would write to the same index
Expand Down
Loading