High-Performance Triton Ops: RMSNorm+RoPE Fusion, Gated MLP Fusion & FP8 Quantized GEMM for Transformers
Updated Mar 9, 2026 - Python
A specialized compiler that optimizes deep learning models for AI accelerators with operator fusion, memory optimization, and hardware-specific passes.
TensorMorph is an AI-assisted MLIR compiler for TOSA graph optimization and operator fusion.