Starred repositories

Backward compatible ML compute opset inspired by HLO/MHLO

MLIR 436 122 Updated Jan 27, 2025

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

Python 31,078 2,873 Updated Jan 27, 2025

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Python 2,687 226 Updated Jan 10, 2025

ZMK Config repository for the Huiben Lab ALU40

CMake 5 66 Updated Dec 21, 2024

A sequence of Jupyter notebooks featuring the "12 Steps to Navier-Stokes" http://lorenabarba.com/

Jupyter Notebook 3,529 1,199 Updated Mar 19, 2024

LLM serving cluster simulator

Jupyter Notebook 90 8 Updated Apr 25, 2024

Running BERT without Padding

C++ 468 54 Updated Mar 18, 2022

Modern C++ Programming Course (C++03/11/14/17/20/23/26)

HTML 12,739 863 Updated Jan 22, 2025

Inference code for Llama models

Python 57,352 9,673 Updated Jan 26, 2025

Root Mean Square Layer Normalization

Python 223 12 Updated Mar 28, 2023

A CUDA tutorial to make people learn CUDA program from 0

Cuda 202 55 Updated Jul 9, 2024

GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…

C++ 1,208 525 Updated Aug 21, 2024

📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).

Cuda 2,149 228 Updated Jan 27, 2025

API capture-replay tool for Vulkan, OpenCL, Intel oneAPI Level Zero and OpenGL

C++ 41 8 Updated Jan 23, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 34,992 5,334 Updated Jan 27, 2025

The OpenCL Conformance Tests

C++ 197 205 Updated Jan 22, 2025

PyTorch Implementation of OpenAI's Image GPT

Python 255 34 Updated Oct 3, 2023

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

Jupyter Notebook 27,107 3,413 Updated Jul 23, 2024

Python 2,048 386 Updated Apr 29, 2022

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

C++ 11,102 2,152 Updated Dec 13, 2024

Some CUDA design patterns and a bit of template magic for CUDA

C++ 148 6 Updated Jun 3, 2023

CUDA on non-NVIDIA GPUs

Rust 10,455 677 Updated Jan 27, 2025

Simple OpenCL examples for exploiting GPU computing

Objective-C++ 201 72 Updated Aug 1, 2024

Frame profiler

C++ 10,646 722 Updated Jan 27, 2025

Android GPU Inspector

Go 974 142 Updated Jan 20, 2025

Vulkan Profiles Tools

C++ 121 45 Updated Jan 24, 2025

Graphics API Capture and Replay Tools for Reconstructing Graphics Application Behavior

C++ 426 126 Updated Jan 27, 2025

A conformant OpenGL ES implementation for Windows, Mac, Linux, iOS and Android.

C++ 3,554 625 Updated Jan 27, 2025

Tutorials for writing high-performance GPU operators in AI frameworks.

Cuda 127 16 Updated Aug 12, 2023