Stars
A Python package that extends official PyTorch to easily obtain performance on Intel platforms
[ISCA'25] LIA: A Single-GPU LLM Inference Acceleration with Cooperative AMX-Enabled CPU-GPU Computation and CXL Offloading
[DAC'25] Official implementation of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference"
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
A high-throughput and memory-efficient inference and serving engine for LLMs
OpenVINO™ is an open source toolkit for optimizing and deploying AI inference
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
The repository maintains the source code for the article titled "Optimizing Attention by Exploiting Data Reuse on ARM Multi-core CPUs."
⚡ Build your chatbot within minutes on your favorite device; offers SOTA compression techniques for LLMs; runs LLMs efficiently on Intel platforms ⚡
Analyze the inference of Large Language Models (LLMs) — computation, storage, transmission, and the hardware roofline model — in a user-friendly interface.
Factorization Machines for Recommendation and Ranking Problems with Implicit Feedback Data
A direct convolution library targeting ARM multi-core CPUs.
A list of awesome compiler projects and papers for tensor computation and deep learning.
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
A programmer's guide to cooking at home (Simplified Chinese only).
Python package built to ease deep learning on graphs, on top of existing DL frameworks.