KV cache compression for high-throughput LLM inference
Linux running inside a PDF file via a RISC-V emulator
(WIP) A small but powerful PyTorch implementation, built from scratch.
A throughput-oriented high-performance serving framework for LLMs
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
The repository for the 2024 IEEE Cloud Submission of OS4C
Efficient Triton Kernels for LLM Training
The book "Performance Analysis and Tuning on Modern CPU"
[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
An open-source Chinese font derived from Fontworks' Klee One.
Ring attention implementation with flash attention
REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU scheduling.
A place to record memories, knowledge, and ideas.
A simple and soft Typora theme, based on the Lapis theme.
LDB: A Large Language Model Debugger via Verifying Runtime Execution Step by Step
TileFusion is a highly efficient kernel template library designed to elevate the level of abstraction in CUDA C for processing tiles.
A highly optimized LLM inference acceleration engine for Llama and its variants.
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
Building a basic async runtime from scratch in embedded Rust
eBPF-based memory leak detection that traces memory allocation and deallocation requests and collects the call stack for each allocation.
Provides multiple Shadowrocket rule sets with powerful ad filtering. Rules are rebuilt daily at 8:00.
Triton documentation in Simplified Chinese.