Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.
Updated Mar 19, 2026 - Python
Production-Grade Autoresearch. Ideal for GPU kernels, ML model development, feature engineering, prompt engineering, and other optimizable code.
AMD CDNA/RDNA (MI300 gfx942 / MI350 gfx950 / RDNA4 gfx1201) GPU kernel optimization knowledge base, packaged as a Claude Code skill. 7,400+ merged-PR references + 53 ISA-grounded synthesis pages. Inspired by MIT Han Lab's KernelWiki.
Learn Triton by building FlashAttention from scratch — V2 kernels, persistent threads, mask DSL, profiling toolkit, bilingual docs
Automatic Triton kernel generation and optimization for Intel GPU, powered by Claude Code.
Noeris — autonomous kernel fusion discovery + Triton autotuning for LLM kernels and Gemma layer deeper fusion (A100/H100 wins).
Triton FlashAttention kernel with PyTorch autograd, correctness tests, and GPU benchmarks.
Skill pack for custom PyTorch MPS kernels on Apple Silicon (examples, tests, and optimization patterns).
Optimize PyTorch GPU kernels by autonomously profiling, extracting, and improving Triton or CUDA C++ code for better performance and efficiency.