jiujiuwei

Jiang Tao jiujiuwei

A student at Northwestern Polytechnical University

Northwestern Polytechnical University

Stars

Liu-xiandong / How_to_optimize_in_GPU

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda 1,140 167 Updated Jul 29, 2023

nunchaku-tech / deepcompressor

Model Compression Toolbox for Large Language Models and Diffusion Models

Python 616 49 Updated Aug 14, 2025

luhengshiwo / LLMForEverybody

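LLM knowledge sharing that everyone can understand; a must-read before LLM interviews in the spring/autumn campus recruiting seasons, so you can talk confidently with your interviewer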

Jupyter Notebook 4,192 416 Updated Jun 7, 2025

kvcache-ai / ktransformers

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 15,007 1,076 Updated Sep 5, 2025

xlite-dev / LeetCUDA

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 6,769 697 Updated Sep 3, 2025

zhaochenyang20 / Awesome-ML-SYS-Tutorial

My learning notes/codes for ML SYS.

Python 3,534 217 Updated Sep 3, 2025

HeKun-NVIDIA / CUDA-Programming-Guide-in-Chinese

This is a Chinese translation of the CUDA programming guide

1,669 247 Updated Nov 13, 2024

iclementine / optimize_softmax

Optimize softmax in triton in many cases

Python 21 Updated Sep 6, 2024

mit-han-lab / duo-attention

[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Python 490 33 Updated Feb 10, 2025

mit-han-lab / TinyChatEngine

TinyChatEngine: On-Device LLM Inference Library

C++ 892 91 Updated Jul 4, 2024

mit-han-lab / omniserve

[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

C++ 747 51 Updated Mar 6, 2025

karpathy / build-nanogpt

Video+code lecture on building nanoGPT from scratch

Python 4,339 675 Updated Aug 13, 2024

HuaizhengZhang / AI-Infra-from-Zero-to-Hero

🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSy…

3,249 340 Updated Jul 25, 2025

ggml-org / llama.cpp

LLM inference in C/C++

C++ 86,156 12,950 Updated Sep 7, 2025

zouxiaohang / TinySTL

TinySTL is a subset of the STL (some containers and algorithms are cut) and also a superset of the STL (some other containers and algorithms are added)

C++ 2,470 657 Updated Oct 27, 2018

GrindGold / CppGuide

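"C/C++ Learning + Interview Guide": a guide covering most of the knowledge a C++ programmer needs. Beginner, intermediate, advanced, campus recruiting, experienced-hire recruiting: for C++ learning and interview preparation, CppGuide is the first choice!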

2,153 248 Updated Jul 29, 2023

horseee / Awesome-Efficient-LLM

A curated list for Efficient Large Language Models

Python 1,857 146 Updated Jun 17, 2025

louis-she / torchscript-demos

A brief introduction to TorchScript using MNIST

C++ 112 21 Updated Jun 30, 2022

zeux / calm

CUDA/Metal accelerated language model inference

C 611 29 Updated May 29, 2025

Efficient-ML / Awesome-Model-Quantization

A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research; we are continuously improving the project. Welcome to PR the works (p…

2,207 226 Updated Mar 4, 2025
