AnonymousYWL: starred repositories

A modern formatting library

C++ · 22,324 stars · 2,704 forks · Updated Sep 13, 2025

A Python package that extends the official PyTorch to make it easy to get better performance on Intel platforms

Python · 1,958 stars · 291 forks · Updated Sep 15, 2025

[ISCA'25] LIA: A Single-GPU LLM Inference Acceleration with Cooperative AMX-Enabled CPU-GPU Computation and CXL Offloading

Python · 9 stars · 1 fork · Updated Jun 28, 2025

[DAC'25] Official implementation of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference"

Python · 70 stars · 3 forks · Updated Jun 11, 2025

Python · 199 stars · 27 forks · Updated May 5, 2025

Low-bit LLM inference on CPU/NPU with lookup table

C++ · 854 stars · 70 forks · Updated Jun 5, 2025

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Python · 2,273 stars · 269 forks · Updated Sep 16, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 58,155 stars · 10,147 forks · Updated Sep 16, 2025

OpenVINO™ is an open source toolkit for optimizing and deploying AI inference

C++ · 8,824 stars · 2,720 forks · Updated Sep 16, 2025

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python · 15,053 stars · 1,082 forks · Updated Sep 12, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda · 5,712 stars · 694 forks · Updated Sep 12, 2025

FlashMLA: Efficient MLA kernels

C++ · 11,720 stars · 899 forks · Updated Aug 27, 2025

LLM inference in C/C++

C++ · 86,534 stars · 13,049 forks · Updated Sep 16, 2025

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python · 673 stars · 52 forks · Updated Aug 6, 2025

Python · 150 stars · 12 forks · Updated Jul 22, 2024

The repository maintains the source code for the article titled "Optimizing Attention by Exploiting Data Reuse on ARM Multi-core CPUs."

Makefile · 16 stars · 3 forks · Updated Dec 1, 2024

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel platforms ⚡

Python · 2,170 stars · 215 forks · Updated Oct 8, 2024

Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.

Python · 552 stars · 66 forks · Updated Sep 11, 2024

Cuda · 32 stars · 12 forks · Updated Aug 24, 2022

Factorization Machines for Recommendation and Ranking Problems with Implicit Feedback Data

Python · 175 stars · 38 forks · Updated Aug 14, 2024

3 stars · Updated May 9, 2023

A direct convolution library targeting ARM multi-core CPUs.

C · 12 stars · 3 forks · Updated Nov 27, 2024

A list of awesome compiler projects and papers for tensor computation and deep learning.

2,647 stars · 318 forks · Updated Oct 19, 2024

AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Python · 4,675 stars · 381 forks · Updated Aug 22, 2025

Cuda · 9 stars · 2 forks · Updated Apr 21, 2022

A programmer's guide to cooking at home (Simplified Chinese only).

Dockerfile · 93,292 stars · 10,502 forks · Updated Sep 14, 2025

Python package built to ease deep learning on graphs, on top of existing DL frameworks.

Python · 14,065 stars · 3,050 forks · Updated Jul 31, 2025