jiujiuwei
  • Northwestern Polytechnical University
  • 01:39 (UTC -12:00)


This is a series of GPU optimization topics. Here we introduce in detail how to optimize CUDA kernels, covering several basic kernel optimizations, including elementwise, reduce, s…

Cuda | 1,140 stars | 167 forks | Updated Jul 29, 2023

Model Compression Toolbox for Large Language Models and Diffusion Models

Python | 616 stars | 49 forks | Updated Aug 14, 2025

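LLM knowledge sharing that anyone can understand: a must-read before spring/autumn recruitment LLM interviews, so you can talk with the interviewer confidently and at ease.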

Jupyter Notebook | 4,192 stars | 416 forks | Updated Jun 7, 2025

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python | 15,007 stars | 1,076 forks | Updated Sep 5, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda | 6,769 stars | 697 forks | Updated Sep 3, 2025

My learning notes/codes for ML SYS.

Python | 3,534 stars | 217 forks | Updated Sep 3, 2025

This is a Chinese translation of the CUDA programming guide

1,669 stars | 247 forks | Updated Nov 13, 2024

Optimize softmax in Triton for many cases

Python | 21 stars | Updated Sep 6, 2024

[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads

Python | 490 stars | 33 forks | Updated Feb 10, 2025

TinyChatEngine: On-Device LLM Inference Library

C++ | 892 stars | 91 forks | Updated Jul 4, 2024

[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention

C++ | 747 stars | 51 forks | Updated Mar 6, 2025

Video+code lecture on building nanoGPT from scratch

Python | 4,339 stars | 675 forks | Updated Aug 13, 2024

🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSy…

3,249 stars | 340 forks | Updated Jul 25, 2025

LLM inference in C/C++

C++ | 86,156 stars | 12,950 forks | Updated Sep 7, 2025

TinySTL is a subset of the STL (omitting some containers and algorithms) and also a superset of it (adding some other containers and algorithms).

C++ | 2,470 stars | 657 forks | Updated Oct 27, 2018

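"C/C++ Learning + Interview Guide": a guide covering most of the knowledge a C++ programmer needs to master, from beginner to intermediate to in-depth topics, for both campus and experienced-hire recruiting. For preparing for C++ study and interviews, CppGuide is the first choice!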

2,153 stars | 248 forks | Updated Jul 29, 2023

A curated list for Efficient Large Language Models

Python | 1,857 stars | 146 forks | Updated Jun 17, 2025

A brief introduction to TorchScript using MNIST

C++ | 112 stars | 21 forks | Updated Jun 30, 2022

CUDA/Metal accelerated language model inference

C | 611 stars | 29 forks | Updated May 29, 2025

A list of papers, docs, and code about model quantization. This repo aims to provide information for model quantization research, and we are continuously improving the project. Welcome to PR the works (p…

2,207 stars | 226 forks | Updated Mar 4, 2025