
Fully open reproduction of DeepSeek-R1

Python 17,759 1,474 Updated Feb 8, 2025

Puzzles for learning Triton; play them with minimal environment configuration!

Python 218 15 Updated Dec 3, 2024

Development repository for the Triton language and compiler

C++ 14,319 1,778 Updated Feb 9, 2025

A simple pip-installable Python tool to generate your own HTML citation world map from your Google Scholar ID.

Python 498 40 Updated Feb 7, 2025

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

Python 495 29 Updated Jan 25, 2025

A fast communication-overlapping library for tensor parallelism on GPUs.

C++ 289 25 Updated Oct 30, 2024

You like pytorch? You like micrograd? You love tinygrad! ❤️

Python 27,877 3,129 Updated Feb 9, 2025

Grok open release

Python 49,892 8,336 Updated Aug 30, 2024

A PyTorch Native LLM Training Framework

Python 705 38 Updated Dec 27, 2024

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Python 697 56 Updated Sep 4, 2024

A compiler for homomorphic encryption

C++ 387 61 Updated Feb 9, 2025

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 9,349 1,091 Updated Feb 8, 2025

A retargetable MLIR-based machine learning compiler and runtime toolkit.

C++ 2,971 653 Updated Feb 8, 2025

Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity

Cuda 196 16 Updated Sep 24, 2023

Standalone Flash Attention v2 kernel without libtorch dependency

C++ 99 13 Updated Sep 10, 2024

Fast and memory-efficient exact attention

Python 15,359 1,446 Updated Feb 8, 2025

C++ 60 20 Updated Dec 18, 2024

100 Days of RTL

SystemVerilog 349 102 Updated Aug 15, 2024

CUDA on non-NVIDIA GPUs

Rust 10,588 683 Updated Feb 7, 2025

Making large AI models cheaper, faster and more accessible

Python 39,062 4,366 Updated Feb 6, 2025

Specialized Parallel Linear Algebra, providing distributed GEMM functionality for specific matrix distributions with optional GPU acceleration.

C++ 27 7 Updated Jun 26, 2024

SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) systems. It is developed as part of the U.S. Department of Ene…

C++ 106 23 Updated Jan 11, 2025

PostScript 3 Updated Apr 5, 2023

Source code for Twitter's Recommendation Algorithm

Scala 62,875 12,174 Updated Jul 10, 2024

C++ and Python support for the CUDA Quantum programming model for heterogeneous quantum-classical workflows

C++ 600 208 Updated Feb 9, 2025

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Python 15,457 2,655 Updated Dec 18, 2024

Code and documentation to train Stanford's Alpaca models, and generate the data.

Python 29,790 4,058 Updated Jul 17, 2024

LLM inference in C/C++

C++ 73,543 10,605 Updated Feb 8, 2025