KV cache compression for high-throughput LLM inference
Linux running inside a PDF file via a RISC-V emulator
(WIP) A small but powerful PyTorch implementation, built from scratch.
A throughput-oriented high-performance serving framework for LLMs
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
The repository for the 2024 IEEE Cloud Submission of OS4C
Efficient Triton Kernels for LLM Training
The book "Performance Analysis and Tuning on Modern CPU"
[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
An open-source Chinese font derived from Fontworks' Klee One.
Ring attention implementation with flash attention
REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU scheduling.
A place to record memories, knowledge, and ideas.
A simple and soft Typora theme, based on the Lapis theme.
LDB: A Large Language Model Debugger via Verifying Runtime Execution Step by Step
TileFusion is a highly efficient kernel template library designed to elevate the level of abstraction in CUDA C for processing tiles.
A highly optimized LLM inference acceleration engine for Llama and its variants.
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
Building a basic async runtime from scratch in embedded Rust
eBPF-based memory leak detection that traces memory allocation and deallocation requests and collects the call stack for each allocation.
Provides multiple Shadowrocket rule sets with powerful ad filtering. Rules are rebuilt daily at 8:00.
Triton documentation in Simplified Chinese.