Starred repositories
A Python-embedded modeling language for convex optimization problems.
The most Obsidian-native PDF annotation, viewing & editing tool ever. Comes with optional Vim keybindings.
DLRover: An Automatic Distributed Deep Learning System
Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal Fast Fourier Transform library
Decoding Attention is specially optimized for multi-head attention (MHA) using CUDA cores for the decoding stage of LLM inference.
A minimal, header-only modern C++ library for terminal goodies 💄✨
FlashInfer: Kernel Library for LLM Serving
Examples of CUDA implementations using CUTLASS CuTe
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
Standalone Flash Attention v2 kernel without libtorch dependency
An acceleration library that supports arbitrary bit-width combinatorial quantization operations
FlagGems is an operator library for large language models implemented in Triton Language.
A fast communication-overlapping library for tensor parallelism on GPUs.
Open deep learning compiler stack for Kendryte AI accelerators ✨
Fast inference from large language models via speculative decoding