Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify bitmasks #375

Merged
merged 3 commits into from
Nov 19, 2023
Merged

Simplify bitmasks #375

merged 3 commits into from
Nov 19, 2023

Conversation

calebzulawski
Copy link
Member

@calebzulawski calebzulawski commented Nov 17, 2023

I was able to remove the ToBitMask and ToBitMaskArray traits and remove the complex trait bounds by making these two changes:

  • to_bitmask always returns u64. For larger vectors this truncates to 64 lanes.
  • to_bitmask_array is now to_bitmask_vector and returns Simd<T, N> for Mask<T, N>. This of course wastes bytes (in memory), but simplifies the function signature. This also makes it easier to do parallel operations on the entire byte array, which is a common use-case for bitmasks anyway.

Copy link
Member

@programmerjake programmerjake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to remove the ToBitMask and ToBitMaskArray traits and remove the complex trait bounds by making these two changes:

  • to_bitmask always returns u64. For larger vectors this truncates to 64 lanes.

This is likely less efficient, since LLVM has to figure out that most of the 64-element vector is zero and thereby it can shrink the types...seems iffy.

  • to_bitmask_array is now to_bitmask_vector and returns Simd<T, N> for Mask<T, N>. This of course wastes bytes (in memory), but simplifies the function signature.

I don't think removing to_bitmask_array is a good idea, since it's much easier to use when you need the bitmask to directly inspect/modify. Also, it potentially wastes much less memory.

I'm ok with also having to_bitmask_vector...

This also makes it easier to do parallel operations on the entire byte array, which is a common use-case for bitmasks anyway.

seems less likely, since most the parallel operations you'd want are already covered by Mask...

crates/core_simd/src/swizzle.rs Outdated Show resolved Hide resolved
@thomcc
Copy link
Member

thomcc commented Nov 17, 2023

I haven't read through the code, but from the description, I'm very much in favor of this simplification!

This is likely less efficient, since LLVM has to figure out that most of the 64-element vector is zero and thereby it can shrink the types...seems iffy.

I think LLVM is generally very good at this sort of range optimization, FWIW. I may be misunderstanding though, and it might be worth checking?

@calebzulawski
Copy link
Member Author

LLVM is generally good with zero-elements in vectors, but if we like this API I intend on modifying the simd_bitmask intrinsic to instead allow zero-extending the integer result. Padding the vector is just a workaround until that's done.

Regarding having access to the bytes, you can always call to_ne_bytes on the resulting vector, and copy it into an array to save memory. Maybe cumbersome for that specific usage but saves a lot on the API. ToBitMaskArray, with its associated type and bounds, is a bit much for a single function IMO, especially since most people will typically want the u64. This also isn't precluding an array implementation in the future, especially if const generics get better.

@programmerjake
Copy link
Member

LLVM is generally good with zero-elements in vectors,

not in this case:
https://llvm.godbolt.org/z/b7fY99beo

@calebzulawski
Copy link
Member Author

If we like the API I can update the intrinsic,

LLVM is generally good with zero-elements in vectors,

not in this case: https://llvm.godbolt.org/z/b7fY99beo

Ugh, I've definitely seen it do better than that. I added a new workaround that doesn't require padding the vectors.

Copy link
Member

@programmerjake programmerjake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other than that, looks good enough for me.

@calebzulawski calebzulawski merged commit 7e5c03a into master Nov 19, 2023
66 checks passed
@calebzulawski calebzulawski deleted the bitmask branch November 19, 2023 03:05
@programmerjake
Copy link
Member

GitHub apparently never posted my review comment that from/to_bitmask_vector needs more docs clarifying that all the bits are packed tightly in the first N bits, otherwise people might think that it uses one bit from each byte, like [bool] does.

@calebzulawski
Copy link
Member Author

Oh, yeah, I'll add that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants