Question about migrating CUDA bmma_sync

Does the joint matrix extension support a similar operation?


bmma_sync
Waits until all warp lanes have executed bmma_sync, and then performs the warp-synchronous bit matrix multiply-accumulate operation D = (A op B) + C, where op consists of a logical operation bmmaBitOp followed by the accumulation defined by bmmaAccumulateOp. The available operations are:

bmmaBitOpXOR, a 128-bit XOR of a row in matrix_a with the 128-bit column of matrix_b

bmmaBitOpAND, a 128-bit AND of a row in matrix_a with the 128-bit column of matrix_b, available on devices with compute capability 8.0 and higher.

The accumulate op is always bmmaAccumulateOpPOPC which counts the number of set bits.
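To make the semantics quoted above concrete, here is a minimal scalar sketch in plain C++ of what D = (A op B) + C computes for one tile. It is a host-side reference for checking a migration, not CUDA or joint matrix code. The 8x8x128 tile shape and `int` accumulators follow the CUDA single-bit fragment shape; the names `bmma_reference`, `BitOp`, `BitRow`, and `Acc` are hypothetical and introduced only for illustration. It also assumes each row of A and each column of B is stored as four packed 32-bit words.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>

// Assumed tile shape: M = N = 8, K = 128 bits (four 32-bit words per row/column).
constexpr std::size_t M = 8, N = 8, K_WORDS = 4;

// Hypothetical stand-in for bmmaBitOpXOR / bmmaBitOpAND.
enum class BitOp { XOR, AND };

using BitRow = std::array<std::uint32_t, K_WORDS>;  // 128 packed bits
using Acc = std::array<std::array<int, N>, M>;      // 8x8 integer accumulators

// Scalar reference: d[i][j] = c[i][j] + popcount(row i of A  op  column j of B).
// a holds the M rows of A; b_cols holds the N columns of B.
Acc bmma_reference(const std::array<BitRow, M>& a,
                   const std::array<BitRow, N>& b_cols,
                   const Acc& c, BitOp op) {
    Acc d{};
    for (std::size_t i = 0; i < M; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            int bits = 0;
            for (std::size_t w = 0; w < K_WORDS; ++w) {
                const std::uint32_t v = (op == BitOp::XOR)
                                            ? (a[i][w] ^ b_cols[j][w])
                                            : (a[i][w] & b_cols[j][w]);
                // Population count of one 32-bit word (the POPC accumulate step).
                bits += static_cast<int>(std::bitset<32>(v).count());
            }
            d[i][j] = c[i][j] + bits;
        }
    }
    return d;
}
```

A reference like this can be used to compare results element by element against whatever device-side replacement is chosen, for either bit operation.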

Question about migrating CUDA bmma_sync #12325

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Question about migrating CUDA bmma_sync #12325

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions