KV cache compression for high-throughput LLM inference

Python 114 5 Updated Feb 5, 2025
LLVM 6 Updated Feb 4, 2025

Linux running inside a PDF file via a RISC-V emulator

C 2,794 93 Updated Feb 2, 2025

(WIP) A small but powerful, homemade PyTorch from scratch.

C++ 520 24 Updated Feb 14, 2025

A throughput-oriented high-performance serving framework for LLMs

Cuda 732 29 Updated Sep 21, 2024

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 2,527 150 Updated Feb 13, 2025

Creating a minimal ELF file

Rust 117 4 Updated Nov 14, 2024

The repository for the 2024 IEEE Cloud Submission of OS4C

Verilog 4 1 Updated Oct 22, 2024

A minimal development of SSA theory

Lean 108 12 Updated Feb 12, 2025

Efficient Triton Kernels for LLM Training

Python 4,407 266 Updated Feb 12, 2025

The book "Performance Analysis and Tuning on Modern CPU"

TeX 2,779 192 Updated Dec 24, 2024

[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs

Python 95 3 Updated Dec 23, 2024

An open-source Chinese font derived from Fontworks' Klee One.

Batchfile 19,038 532 Updated Jan 18, 2025

Ring attention implementation with flash attention

Python 673 59 Updated Dec 19, 2024

REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU scheduling.

Cuda 90 9 Updated Dec 24, 2022

A place to record memories, knowledge, and ideas

Markdown 481 74 Updated Feb 13, 2025

A simple and soft Typora theme based on the Lapis theme.

CSS 9 Updated Jan 10, 2025

C++ interfaces for RDMA access

C++ 66 4 Updated Jan 20, 2025

LDB: A Large Language Model Debugger via Verifying Runtime Execution Step by Step

Python 487 50 Updated Sep 10, 2024

TileFusion is a highly efficient kernel template library designed to elevate the level of abstraction in CUDA C for processing tiles.

Cuda 53 5 Updated Feb 13, 2025

A highly optimized LLM inference acceleration engine for Llama and its variants.

C++ 851 102 Updated Feb 11, 2025

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda 603 119 Updated Oct 30, 2024

Building a basic async runtime from scratch in embedded Rust

Rust 78 3 Updated Sep 21, 2024

eBPF-based memory leak detection that traces memory allocation and deallocation requests and collects the call stack for each allocation

C 6 2 Updated Jul 10, 2024

Provides multiple Shadowrocket rule sets with powerful ad filtering. The rules are rebuilt daily at 8:00.

14,036 882 Updated Feb 13, 2025

Triton documentation in Simplified Chinese

TypeScript 53 6 Updated Jan 10, 2025