A collection of kernels written in the Triton language (there don't seem to be many so far). Contributions are welcome!
Awesome resources from cuda-mode, their guide to Triton
Triton Kernel collection by cuda-mode
attorch: a subset of PyTorch's nn module
scattermoe: Sparse Mixture-of-Experts
Liger Kernel: Efficient Triton Kernels for LLM Training
FlagAttention: memory-efficient attention kernels
Activation functions by dogukantai
Sparse Toolkit: Block-sparse matrix multiplication (paper)
GemLite: Fused low-bit matrix multiplication