- University of Michigan
- Ann Arbor, MI
- 13:43 (UTC -05:00)
- in/zianglih
- https://typst.app/project/rmQLaVybzmnoQ_Q4dQxDqc
Highlights
- Pro
Starred repositories
Backward compatible ML compute opset inspired by HLO/MHLO
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
ZMK Config repository for the Huiben Lab ALU40
A sequence of Jupyter notebooks featuring the "12 Steps to Navier-Stokes" http://lorenabarba.com/
Modern C++ Programming Course (C++03/11/14/17/20/23/26)
A CUDA tutorial for learning CUDA programming from scratch
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…
📚200+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
API capture-replay tool for Vulkan, OpenCL, Intel oneAPI Level Zero and OpenGL
A high-throughput and memory-efficient inference and serving engine for LLMs
PyTorch Implementation of OpenAI's Image GPT
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
Some CUDA design patterns and a bit of template magic for CUDA
Simple OpenCL examples for exploiting GPU computing
Graphics API Capture and Replay Tools for Reconstructing Graphics Application Behavior
A conformant OpenGL ES implementation for Windows, Mac, Linux, iOS and Android.
Tutorials for writing high-performance GPU operators in AI frameworks.