🤖FFPA: extends FlashAttention-2 with Split-D, achieving ~O(1) SRAM complexity for large headdim and a 1.8x–3x speedup 🎉 vs SDPA EA (see the Split-D sketch after this list).
⚡️Write HGEMM from scratch using Tensor Cores with the WMMA, MMA, and CuTe APIs, achieving peak ⚡️ performance (see the WMMA sketch after this list).
General Matrix Multiplication using NVIDIA Tensor Cores
Vulkan & GLSL implementation of FlashAttention-2
A benchmarking framework for correlators of FX telescope arrays
Neural Network C is an advanced neural network implementation in pure C, optimized for high performance on CPUs and NVIDIA GPUs.
The MNIST classification problem is a fundamental machine learning task: recognizing handwritten digits (0-9) from a dataset of 70,000 grayscale images (28x28 pixels each). It serves as a benchmark for evaluating machine learning models, particularly neural networks.
TsuruTune is a comprehensive deep learning model optimization tool designed specifically for NVIDIA Jetson platforms and other edge devices. It leverages Tensor Core acceleration and memory-bandwidth alignment to achieve optimal deep learning inference performance.
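The FFPA entry above centers on Split-D tiling. As a rough illustration of the idea only (this is not FFPA's actual kernel, and the kernel name, tile sizes, and layouts are assumptions), the CUDA sketch below computes one BRxBC tile of the attention scores S = Q·Kᵀ while staging only a fixed DK-wide slice of the head dimension in shared memory at a time, so SRAM usage stays constant as headdim d grows:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define BR 16   // query-tile rows
#define BC 16   // key-tile rows
#define DK 16   // fixed head-dim slice kept in SRAM

// Computes one BRxBC tile of S = Q * K^T, streaming the head dimension
// through shared memory DK columns at a time: SRAM use is O(1) in d.
// Assumes d % DK == 0 and blockDim = (BC, BR) with BR == BC == DK.
__global__ void splitd_qk(const float* Q, const float* K, float* S, int d) {
    __shared__ float q_s[BR][DK];
    __shared__ float k_s[BC][DK];
    int c = threadIdx.x, r = threadIdx.y;
    float acc = 0.0f;
    for (int k0 = 0; k0 < d; k0 += DK) {
        q_s[r][c] = Q[r * d + k0 + c];   // stage current Q slice
        k_s[c][r] = K[c * d + k0 + r];   // stage current K slice
        __syncthreads();
        for (int j = 0; j < DK; ++j)     // partial dot over this slice
            acc += q_s[r][j] * k_s[c][j];
        __syncthreads();
    }
    S[r * BC + c] = acc;                 // one score per thread
}

int main() {
    const int d = 256;                   // "large" headdim
    float *Q, *K, *S;
    cudaMallocManaged(&Q, BR * d * sizeof(float));
    cudaMallocManaged(&K, BC * d * sizeof(float));
    cudaMallocManaged(&S, BR * BC * sizeof(float));
    for (int i = 0; i < BR * d; ++i) Q[i] = 1.0f;
    for (int i = 0; i < BC * d; ++i) K[i] = 1.0f;
    splitd_qk<<<1, dim3(BC, BR)>>>(Q, K, S, d);
    cudaDeviceSynchronize();
    printf("S[0][0] = %.1f (expect %d)\n", S[0], d);  // all-ones dot product
    return 0;
}
```

A full FFPA-style kernel fuses this with the online-softmax and PV stages of FlashAttention-2; the point here is only that the shared-memory working set is fixed at (BR + BC) * DK floats regardless of d.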
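The HGEMM entry above walks through the WMMA API. As a minimal sketch of that approach (my illustration, not the repo's code), the kernel below has each warp compute one 16x16 tile of C = A * B on Tensor Cores, with half inputs and float accumulation; it assumes row-major matrices, M, N, K all multiples of 16, and sm_70 or newer:

```cuda
#include <cstdio>
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// One warp per 16x16 output tile of C = A * B (half in, float accumulate).
// Assumes row-major A and B with M, N, K multiples of 16.
__global__ void wmma_hgemm(const half* A, const half* B, float* C,
                           int M, int N, int K) {
    int warpM = (blockIdx.x * blockDim.x + threadIdx.x) / warpSize; // tile row
    int warpN =  blockIdx.y * blockDim.y + threadIdx.y;             // tile col
    if (warpM * 16 >= M || warpN * 16 >= N) return;

    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;
    wmma::fill_fragment(acc, 0.0f);

    for (int k = 0; k < K; k += 16) {                  // march along K
        wmma::load_matrix_sync(a_frag, A + warpM * 16 * K + k, K);
        wmma::load_matrix_sync(b_frag, B + k * N + warpN * 16, N);
        wmma::mma_sync(acc, a_frag, b_frag, acc);      // Tensor Core MAC
    }
    wmma::store_matrix_sync(C + warpM * 16 * N + warpN * 16, acc, N,
                            wmma::mem_row_major);
}

int main() {
    const int M = 64, N = 64, K = 64;
    half *A, *B; float *C;
    cudaMallocManaged(&A, M * K * sizeof(half));
    cudaMallocManaged(&B, K * N * sizeof(half));
    cudaMallocManaged(&C, M * N * sizeof(float));
    for (int i = 0; i < M * K; ++i) A[i] = __float2half(1.0f);
    for (int i = 0; i < K * N; ++i) B[i] = __float2half(1.0f);
    dim3 block(128, 4);   // 4 warps in x, 4 in y -> 4x4 tiles per block
    dim3 grid((M / 16 + 3) / 4, (N / 16 + 3) / 4);
    wmma_hgemm<<<grid, block>>>(A, B, C, M, N, K);
    cudaDeviceSynchronize();
    printf("C[0] = %.1f (expect %d)\n", C[0], K);      // all-ones dot product
    return 0;
}
```

Peak-performance HGEMM, as the repo pursues, layers shared-memory staging, double buffering, and swizzled layouts on top of this skeleton; the sketch shows only the core fragment load/MMA/store pattern.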