FP64 equivalent GEMM by the Ozaki scheme with Int8 Tensor Cores
Updated Dec 2, 2025 - Cuda
An extension library of WMMA API (Tensor Core API)
Implementation of FlashAttention-2 for Nvidia Tesla V100
Fast SGEMM emulation on Tensor Cores
Fast Kernel SVM on Tensor Core-enabled GPUs
Artifact for SC21: APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores.