ACP: Add carryless product for unsigned integers

# Proposal

## Problem statement

[Carryless product](https://en.wikipedia.org/wiki/Carry-less_product) (also known as _XOR multiplication_) is backed by hardware instructions on many platforms, but there's no way for users to write code which uses these instructions without accessing them through hardware intrinsics. This results in users having to write more platform specific code or rely on less optimized, scalar expressions of the operation for portable applications.

## Motivating examples or use cases

Carryless product can be used to accelerate checksums (like CRC/cyclic redundancy checks) and aid in implementing some cryptographic algorithms. Since many left shifts and XORs are executed in parallel, there are also applications of it speeding up bit manipulation operations. In the links section are some example applications: one showing the acceleration of JSON parsing by finding pairs of quotation marks using carryless multiplication to perform a prefix XOR, another showing accelerated Morton code/Z-order curve calculations.

This operation can also be referred to as "polynomial multiplication" since it's equivalent to the multiplication of two polynomials in [GF(2)](https://en.wikipedia.org/wiki/GF(2)), the finite field with 2 elements (polynomials can be represented as bitstrings).

## Solution sketch

[Intrinsic support][llvm_clmul] was recently added in LLVM 22 and has the following semantics according to the LangRef:
> The ‘llvm.clmul’ intrinsic computes carry-less multiply of its arguments, which is the result of applying the standard multiplication algorithm, where all of the additions are replaced with XORs, and returns the low-bits. The vector variants operate lane-wise.

A `widening` variant is also proposed which returns a tuple of `(low_bits, high_bits)`.

```rust
impl uN {
    /// Calculates the carryless product of `self` and `rhs`, returning the low-order bits.
    ///
    /// Instead of addition, XOR (exclusive-or) is applied as the reduction operation.
    ///
    /// If the high-order bits are needed, see [`Self::widening_carryless_mul`].
    /// ```
    ///     let x = 0b1101;
    ///     let y = 0b0110;
    ///     let z = 0b1111;
    ///     assert_eq!(0b00101110, x.carryless_mul(y));
    ///     assert_eq!(0b01010101, z.carryless_mul(z));
    /// ```
    pub const fn carryless_mul(self, rhs: Self) -> Self;

    /// Calculates the carryless product of `self` and `rhs`, returning a tuple of low-order
    /// bits and high-order bits in that order.
    ///
    /// Instead of addition, XOR (exclusive-or) is applied as the reduction operation.
    ///
    /// If only the low-order bits are needed, consider using [`Self::carryless_mul`].
    /// ```
    ///     let x = 0b10100010;
    ///     let y = 0b10010110;
    ///     let z = 0b11111111;
    ///     let (lo, hi) = x.widening_carryless_mul(y);
    ///     assert_eq!(0b11101100, lo);
    ///     assert_eq!(0b01011000, hi);
    ///     assert_eq!((0b01010101, 0b01010101), z.widening_carryless_mul(z));
    /// ```
    pub const fn widening_carryless_mul(self, rhs: Self) -> (Self, Self); 
}
```

## Alternatives

1. Do nothing, rely on the compiler recognizing scalar implementations of these patterns or require users to use intrinsics.
2. Wait until LLVM 23 for implementing this. More widespread use of this intrinsic in LLVM optimizations and architecture backends won't reach Rust until LLVM 23 since the intrinsic landed not long before the version branch.
3. Should the widening variant instead return the next integer size larger, and not be implemented for `u128`? So `u8`->`u16`, `u16`->`u32`, `u32`->`u64`, but nothing for `u128`.
## Links and related work

https://llvm.org/docs/LangRef.html#llvm-clmul-intrinsic
[Jan Schultke, active C++29 proposal - Carry-less product: `std::clmul`](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p3642r3.html)

[llvm_clmul]: https://llvm.org/docs/LangRef.html#llvm-clmul-intrinsic

#### Hardware instructions

x86: [`pclmulqdq`](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=clmul), [wikipedia:CLMUL instruction set](https://en.wikipedia.org/wiki/CLMUL_instruction_set)
Arm NEON: [`pmull`/`pmull2`](https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#polynomial-multiply) for low and high bits, respectively
RISC-V: [`clmul`/`clmulh`/`clmulr`(reversed)](https://five-embeddev.com/riscv-bitmanip/1.0.0/bitmanip.html#zbc), the Zbc extension
SPARC: [`XMULX`/`XMULXHI`](https://www.oracle.com/technetwork/server-storage/sun-sparc-enterprise/documentation/140521-ua2011-d096-p-ext-2306580.pdf), "XOR multiply"
POWER: `vpmsum`, "vector polynomial multiply sum"

#### Applications/articles

[Geoff Langdale - Finding quote pairs with carry-less multiply](https://branchfree.org/2019/03/06/code-fragment-finding-quote-pairs-with-carry-less-multiply-pclmulqdq/)
[Wunkolo - pclmulqdq Tricks](https://wunkolo.github.io/post/2020/05/pclmulqdq-tricks/)
[Intel white paper - Fast CRC Computation for Generic Polynomials Using PCLMULQDQ Instruction](https://web.archive.org/web/20131224125630/https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/fast-crc-computation-generic-polynomials-pclmulqdq-paper.pdf)

## What happens now?

This issue contains an API change proposal (or ACP) and is part of the libs-api team [feature lifecycle]. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.

[feature lifecycle]: https://std-dev-guide.rust-lang.org/development/feature-lifecycle.html

## Possible responses

The libs team may respond in various different ways. First, the team will consider the *problem* (this doesn't require any concrete solution or alternatives to have been proposed):

- We think this problem seems worth solving, and the standard library might be the right place to solve it.
- We think that this probably doesn't belong in the standard library.

Second, if there's a concrete solution:

- We think this specific solution looks roughly right, approved, you or someone else should implement this. (Further review will still happen on the subsequent implementation PR.)
- We're not sure this is the right solution, and the alternatives or other materials don't give us enough information to be sure about that. Here are some questions we have that aren't answered, or rough ideas about alternatives we'd want to see discussed.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ACP: Add carryless product for unsigned integers #738

Proposal

Problem statement

Motivating examples or use cases

Solution sketch

Alternatives

Links and related work

Hardware instructions

Applications/articles

What happens now?

Possible responses

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

ACP: Add carryless product for unsigned integers #738

Description

Proposal

Problem statement

Motivating examples or use cases

Solution sketch

Alternatives

Links and related work

Hardware instructions

Applications/articles

What happens now?

Possible responses

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions