matmul

Here are 5 public repositories matching this topic...

eduand-alvarez / CUDA_Custom_MatMul_Experiment

This project integrates a custom CUDA-based matrix multiplication kernel into a PyTorch deep learning model, leveraging GPU acceleration for matrix operations. The goal is to compare the performance of this custom kernel with PyTorch's built-in matrix multiplication and demonstrate how custom CUDA kernels can optimize compute-intensive operations.

cuda-kernels matmul

Updated Aug 26, 2024
Python

LaserBorg / circuitpython_benchmark

Raspberry Pi Pico (RP2040) and Adafruit Metro M7 (NXP IMXRT10XX) benchmark

benchmark adafruit python3 mcu circuitpython float32 matmul raspberry-pi-pico adafruit-metro-m7

Updated Jan 12, 2024
Python

xone4 / optimized-Mat-Mul-cuda-code

A Python script that uses the CuPy library for GPU matrix multiplication. It includes a custom CUDA kernel optimized for performance and energy consumption; the kernel uses half-precision floating-point numbers (float16) for improved performance and warp utilization.

optimization cuda-kernels matmul

Updated Oct 7, 2024
Python
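
The float16 choice mentioned in the description above trades precision for speed. The following is a minimal CPU-only sketch of that trade-off, not the repository's kernel: it emulates half-precision rounding with the standard library's `struct` format `"e"`. The helper names `to_half` and `dot_half` are hypothetical, and rounding every partial sum to float16 is an assumption made for illustration (GPU kernels may accumulate in a wider type).

```python
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def dot_half(xs, ys):
    """Dot product with inputs, products, and partial sums rounded to float16.

    Assumption: the accumulator is also float16, which exaggerates the
    rounding error compared with a float32 accumulator.
    """
    acc = 0.0
    for x, y in zip(xs, ys):
        product = to_half(to_half(x) * to_half(y))
        acc = to_half(acc + product)
    return acc


def dot_double(xs, ys):
    """Reference dot product in Python's native double precision."""
    return sum(x * y for x, y in zip(xs, ys))


if __name__ == "__main__":
    xs = [0.1] * 1000
    ys = [1.0] * 1000
    print("float16 emulation:", dot_half(xs, ys))
    print("double reference: ", dot_double(xs, ys))
```

Comparing the two printed values for a long vector gives a feel for how much error a pure float16 accumulation can introduce.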

akifejaz / HwVerification

This repo contains the Python scripts for testing all of the MatMul modules.

testing hardware matmul

Updated Apr 28, 2023
Python

akifejaz / matmul-testbench

A simple script that generates 4-by-4 matrices for testing MatMul.

python testbench matmul

Updated Nov 18, 2022
Python
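
As a rough illustration of what a 4-by-4 MatMul test generator could look like, here is a minimal sketch. It is not the repository's actual script: the function names (`random_matrix`, `matmul_reference`, `make_test_case`), the integer value range, and the seeding scheme are all assumptions chosen for the example.

```python
import random

N = 4  # matrix size taken from the repository description


def random_matrix(rng, n=N, lo=-8, hi=7):
    """Return an n-by-n matrix of random integers in [lo, hi].

    The value range is an assumption; a real testbench would match the
    operand width of the design under test.
    """
    return [[rng.randint(lo, hi) for _ in range(n)] for _ in range(n)]


def matmul_reference(a, b):
    """Plain triple-loop matrix product used as the expected result."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][p] * b[p][j] for p in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def make_test_case(seed):
    """Build one reproducible test case: two inputs and the expected product."""
    rng = random.Random(seed)
    a = random_matrix(rng)
    b = random_matrix(rng)
    return a, b, matmul_reference(a, b)


if __name__ == "__main__":
    a, b, expected = make_test_case(seed=0)
    for name, matrix in (("A", a), ("B", b), ("A x B", expected)):
        print(name)
        for row in matrix:
            print(" ".join(f"{value:5d}" for value in row))
```

Seeding the generator makes each test case reproducible, so a failing case can be regenerated from its seed alone.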

