[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable

Python 133 9 Updated Sep 21, 2024

Vim plugin for syntax-aware code formatting

Vim Script 1,112 111 Updated Aug 31, 2024

Processing-In-Memory (PIM) Simulator

C++ 145 48 Updated Dec 12, 2024

Official inference framework for 1-bit LLMs

C++ 12,629 881 Updated Dec 20, 2024

Python 4 Updated Nov 19, 2024

A highly-flexible GPU simulator for AMD GPUs.

Go 114 26 Updated Jan 17, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 7,394 713 Updated Jan 19, 2025

minimize caching effects

C 559 54 Updated Jun 10, 2024

Let ChatGPT teach your own chatbot in hours with a single GPU!

Python 3,168 287 Updated Mar 17, 2024

Automatically Discovering Fast Parallelization Strategies for Distributed Deep Neural Network Training

C++ 1,748 234 Updated Jan 18, 2025

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 9,166 1,070 Updated Jan 16, 2025

Development repository for the Triton language and compiler

C++ 14,062 1,715 Updated Jan 19, 2025

Fast and memory-efficient exact attention

Python 15,106 1,428 Updated Jan 18, 2025

How and why you want to make your PyTorch CUDA/CPP extension with a Makefile

Makefile 172 16 Updated Jul 3, 2019

Hackable and optimized Transformers building blocks, supporting a composable construction.

Python 8,920 636 Updated Jan 16, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 33,953 5,214 Updated Jan 19, 2025

Run a parallel command inside a split tmux window

Shell 141 38 Updated Feb 22, 2022

Transformer related optimization, including BERT, GPT

C++ 5,983 894 Updated Mar 27, 2024

Python 37 8 Updated Sep 22, 2021

Exploring the Design Space of Page Management for Multi-Tiered Memory Systems (USENIX ATC '21)

C 43 6 Updated Mar 31, 2022

Fast and Efficient Model Serving Using Multi-GPUs with Direct-Host-Access (ACM EuroSys '23)

C++ 54 8 Updated Mar 29, 2024

C 23 6 Updated Aug 19, 2022

Official code repository for "CoVA: Exploiting Compressed-Domain Analysis to Accelerate Video Analytics [USENIX ATC 22]"

Rust 16 2 Updated Sep 19, 2024

Neomorphism (neumorphism) Design Framework, Open Source

CSS 45 5 Updated Aug 21, 2022

Node.js extension host for Vim & Neovim; loads extensions like VSCode and hosts language servers.

TypeScript 24,620 959 Updated Jan 13, 2025

Collects Naver entertainment news comments

Python 2 2 Updated Dec 8, 2022