Closed
Description
Does joint_matrix support a similar operation?
bmma_sync
Waits until all warp lanes have executed bmma_sync, and then performs the warp-synchronous bit matrix multiply-accumulate operation D = (A op B) + C, where op consists of a logical operation bmmaBitOp followed by the accumulation defined by bmmaAccumulateOp. The available operations are:
bmmaBitOpXOR, a 128-bit XOR of a row in matrix_a with the 128-bit column of matrix_b
bmmaBitOpAND, a 128-bit AND of a row in matrix_a with the 128-bit column of matrix_b, available on devices with compute capability 8.0 and higher.
The accumulate op is always bmmaAccumulateOpPOPC which counts the number of set bits.
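To make the requested semantics concrete, here is a minimal scalar reference sketch in plain host-side C++ of what D = (A op B) + C computes per element. It is not the CUDA implementation and not a joint_matrix API. The 8x8 tile with 128-bit rows and columns, the storage of each 128-bit vector as four 32-bit words, and the names `bmma_reference`, `BitOp`, `BitRow` and `popc` are all assumptions chosen for illustration.

```cpp
#include <array>
#include <bitset>
#include <cstdint>

// Assumed tile shape for illustration: 8x8 output, 128-bit reduction dimension.
constexpr int M = 8;        // rows of A and D
constexpr int N = 8;        // columns of B and D
constexpr int K_WORDS = 4;  // 4 x 32 bits = 128 bits per row/column

// Hypothetical stand-in for the two bit operations described above.
enum class BitOp { XOR, AND };

// One 128-bit row of A or one 128-bit column of B, packed into 32-bit words.
using BitRow = std::array<std::uint32_t, K_WORDS>;
using Tile = std::array<std::array<int, N>, M>;

// Population count: number of set bits in a 32-bit word.
inline int popc(std::uint32_t x) {
    return static_cast<int>(std::bitset<32>(x).count());
}

// Scalar reference: d[i][j] = popcount(a_rows[i] op b_cols[j]) + c[i][j].
void bmma_reference(const std::array<BitRow, M>& a_rows,
                    const std::array<BitRow, N>& b_cols,
                    const Tile& c, Tile& d, BitOp op) {
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            int bits = 0;
            for (int w = 0; w < K_WORDS; ++w) {
                const std::uint32_t x = (op == BitOp::XOR)
                    ? (a_rows[i][w] ^ b_cols[j][w])
                    : (a_rows[i][w] & b_cols[j][w]);
                bits += popc(x);  // accumulate step: count the set bits
            }
            d[i][j] = bits + c[i][j];
        }
    }
}
```

An equivalent joint_matrix feature would need to expose the same two pieces: a single-bit element type for the A and B tiles, and a multiply-accumulate whose inner step is XOR or AND followed by a population count instead of a multiply and add.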