Korea University
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
Vim plugin for syntax-aware code formatting
SGLang is a fast serving framework for large language models and vision language models.
Let ChatGPT teach your own chatbot in hours with a single GPU!
Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently.
Development repository for the Triton language and compiler
Fast and memory-efficient exact attention
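The entry above refers to FlashAttention, whose core trick is computing softmax incrementally over tiles with a running maximum and rescaled running sum, so the full score matrix never needs to be materialized. A minimal sketch of that online-softmax idea in plain Python (the `chunk` parameter and function name are illustrative, not from the library):

```python
import math

def online_softmax(xs, chunk=2):
    """Streaming softmax over fixed-size chunks: keep a running max m
    and a running sum s of exp(x - m), rescaling s whenever m grows.
    This is the numerical core of tiled/flash-style attention."""
    m = float("-inf")   # running max seen so far
    s = 0.0             # running sum of exp(x - m)
    for i in range(0, len(xs), chunk):
        block = xs[i:i + chunk]
        m_new = max(m, max(block))
        # rescale the old sum to the new max, then fold in this block
        s = s * math.exp(m - m_new) + sum(math.exp(x - m_new) for x in block)
        m = m_new
    return [math.exp(x - m) / s for x in xs]
```

The result matches an ordinary two-pass softmax, but each chunk is touched only once, which is what lets the real CUDA kernels keep everything in on-chip SRAM.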
How and why to build your PyTorch CUDA/C++ extension with a Makefile
Hackable and optimized Transformers building blocks, supporting a composable construction.
A high-throughput and memory-efficient inference and serving engine for LLMs
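The entry above describes vLLM, whose memory efficiency comes from PagedAttention: each sequence's KV cache is stored in fixed-size physical blocks drawn from a shared pool via a per-sequence block table, rather than in a contiguous max-length buffer. A toy sketch of that bookkeeping, assuming nothing about vLLM's actual classes (all names here are illustrative):

```python
class PagedKVCache:
    """Toy block-table allocator in the spirit of PagedAttention:
    logical token positions map to (physical block, offset) slots,
    and blocks are allocated on demand from a shared free pool."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free = list(range(num_blocks))  # shared pool of physical blocks
        self.tables = {}                     # seq_id -> list of block ids
        self.lengths = {}                    # seq_id -> tokens stored

    def append(self, seq_id):
        """Reserve a slot for one more token; return (block, offset)."""
        table = self.tables.setdefault(seq_id, [])
        n = self.lengths.get(seq_id, 0)
        if n % self.block_size == 0:         # current block is full (or none yet)
            table.append(self.free.pop())
        self.lengths[seq_id] = n + 1
        return table[n // self.block_size], n % self.block_size

    def release(self, seq_id):
        """Return a finished sequence's blocks to the shared pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

Because blocks are recycled the moment a request finishes, many concurrent sequences can share one GPU-sized pool with little internal fragmentation, which is where the high throughput comes from.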
Run a parallel command inside a split tmux window
Transformer-related optimizations, including BERT and GPT
Exploring the Design Space of Page Management for Multi-Tiered Memory Systems (USENIX ATC '21)
Fast and Efficient Model Serving Using Multi-GPUs with Direct-Host-Access (ACM EuroSys '23)
Official code repository for "CoVA: Exploiting Compressed-Domain Analysis to Accelerate Video Analytics [USENIX ATC 22]"
An open-source neumorphism (neomorphism) design framework
Node.js extension host for Vim & Neovim that loads extensions like VSCode and hosts language servers.