Santa Clara, California - https://yzhaiustc.github.io/
Stars
Fully open reproduction of DeepSeek-R1
Puzzles for learning Triton; play them with minimal environment configuration!
Development repository for the Triton language and compiler
A simple pip-installable Python tool to generate your own HTML citation world map from your Google Scholar ID.
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
A fast communication-overlapping library for tensor parallelism on GPUs.
You like pytorch? You like micrograd? You love tinygrad! ❤️
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
A retargetable MLIR-based machine learning compiler and runtime toolkit.
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
Standalone Flash Attention v2 kernel without libtorch dependency
Fast and memory-efficient exact attention
Making large AI models cheaper, faster and more accessible
Specialized Parallel Linear Algebra, providing distributed GEMM functionality for specific matrix distributions with optional GPU acceleration.
SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) systems. It is developed as part of the U.S. Department of Energy's Exascale Computing Project (ECP).
Source code for Twitter's Recommendation Algorithm
C++ and Python support for the CUDA Quantum programming model for heterogeneous quantum-classical workflows
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
Code and documentation to train Stanford's Alpaca models, and generate the data.