Stars
Reverse Engineering: Decompiling Binary Code with Large Language Models
CPU inference for the DeepSeek family of large language models in pure C++
FlashInfer: Kernel Library for LLM Serving
Curated collection of papers on MoE model inference
JAX bindings for the flash-attention3 kernels
Fast and memory-efficient exact attention
Custom Linux scheduler for concurrency fuzzing written in Java with hello-ebpf
FlagGems is an operator library for large language models implemented in the Triton language.
My learning notes and code for MLSys.
Perceptual video quality assessment based on multi-method fusion.
Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA
zzhbrr / mlsys-arxiv-daily
Forked from Vincentqyw/cv-arxiv-daily: 🎓 Automatically update MLSys papers daily using GitHub Actions (updated every 12 hours)
Implementation of Alphafold 3 from Google Deepmind in Pytorch
Course materials for MIT6.5940: TinyML and Efficient Deep Learning Computing
GitHub page for "Large Language Model-Brained GUI Agents: A Survey"
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
Port of OpenAI's Whisper model in C/C++
Building blocks for foundation models.
A visualized debugging framework to aid in understanding the Linux kernel.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Beamer template for Shanghai Jiao Tong University