MatMul-in-Metal

The first six kernels are a Metal port of the kernels from https://siboehm.com/articles/22/CUDA-MMM. Currently, kernel v6 achieves 1550 GFLOPS on the M1 Pro GPU and is memory bound.
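For readers who want to relate a timing to a GFLOPS figure like the one above, here is a minimal sketch using the conventional count of 2·M·N·K floating-point operations for an (M x K) by (K x N) product. The function names (`matmul_flops`, `gflops`, `ideal_intensity`) are hypothetical and are not taken from this repository, and this is not necessarily how the number above was measured.

```cpp
#include <cassert>
#include <cstdint>

// FLOPs for C = A * B with A (M x K) and B (K x N):
// one multiply and one add per inner-product term.
double matmul_flops(std::uint64_t m, std::uint64_t n, std::uint64_t k) {
    return 2.0 * static_cast<double>(m) * static_cast<double>(n) * static_cast<double>(k);
}

// Achieved throughput in GFLOPS for one multiplication that took `seconds`.
double gflops(std::uint64_t m, std::uint64_t n, std::uint64_t k, double seconds) {
    assert(seconds > 0.0);  // a non-positive timing is a measurement error
    return matmul_flops(m, n, k) / seconds / 1e9;
}

// Best-case arithmetic intensity (FLOPs per byte), assuming A and B are each
// read once and C is written once. Real kernels move more data than this,
// so their actual intensity is lower.
double ideal_intensity(std::uint64_t m, std::uint64_t n, std::uint64_t k,
                       std::uint64_t bytes_per_element) {
    double bytes = static_cast<double>(bytes_per_element) *
                   (static_cast<double>(m) * k + static_cast<double>(k) * n +
                    static_cast<double>(m) * n);
    return matmul_flops(m, n, k) / bytes;
}
```

As an illustration with an assumed problem size (the size used for the measurement is not stated here), a 4096 x 4096 x 4096 product is about 137.4 GFLOP, so sustaining 1550 GFLOPS would correspond to roughly 89 ms per multiplication.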

Next Steps

I'm working on the final kernel as time permits. The current idea is to exploit the SIMD-group atomics (section 6.9.2 of https://developer.apple.com/metal/Metal-Shading-Language-Specification.pdf) to achieve SIMD-group-level parallelism.

tomludbrook10/MatMul-in-Metal