Starred repositories

Quantized Attention on GPU

Python 16 Updated Nov 1, 2024

A Python-embedded modeling language for convex optimization problems.

C++ 5,437 1,066 Updated Nov 1, 2024

MLIR For Beginners tutorial

C++ 804 66 Updated Sep 30, 2024

The most Obsidian-native PDF annotation, viewing & editing tool ever. Comes with optional Vim keybindings.

TypeScript 788 15 Updated Oct 22, 2024

DLRover: An Automatic Distributed Deep Learning System

Python 1,262 163 Updated Nov 1, 2024

Next-Token Prediction is All You Need

Python 1,741 64 Updated Oct 24, 2024

Organize Your GitHub Stars With Ease

PHP 3,213 143 Updated Aug 11, 2024
Cuda 32 13 Updated May 21, 2021

Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal Fast Fourier Transform library

C++ 1,540 92 Updated Sep 29, 2024

compilerbook

48 28 Updated Apr 25, 2021

Decoding Attention is specially optimized for multi-head attention (MHA) using CUDA cores for the decoding stage of LLM inference.

C++ 22 1 Updated Oct 12, 2024

A minimal, header-only modern C++ library for terminal goodies 💄✨

C++ 1,498 143 Updated Jul 23, 2024

FlashInfer: Kernel Library for LLM Serving

Cuda 1,375 126 Updated Nov 2, 2024

Examples of CUDA implementations by Cutlass CuTe

Makefile 80 12 Updated Oct 31, 2024
C++ 282 29 Updated Oct 29, 2024

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda 551 110 Updated Oct 30, 2024

Quantized Attention that achieves speedups of 2.1x and 2.7x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

Python 351 14 Updated Oct 24, 2024

Standalone Flash Attention v2 kernel without libtorch dependency

C++ 97 13 Updated Sep 10, 2024

Bloaty: a size profiler for binaries

C++ 4,769 345 Updated Oct 1, 2024

Puzzles for learning Triton

Jupyter Notebook 1,055 72 Updated Sep 25, 2024

my cs notes

Jupyter Notebook 26 2 Updated Oct 14, 2024

PyTorch bindings for CUTLASS grouped GEMM.

Cuda 49 38 Updated Oct 31, 2024

An acceleration library that supports arbitrary bit-width combinatorial quantization operations

C++ 221 23 Updated Sep 30, 2024

FlagGems is an operator library for large language models implemented in Triton Language.

Python 320 33 Updated Nov 2, 2024

A fast communication-overlapping library for tensor parallelism on GPUs.

C++ 211 16 Updated Oct 30, 2024

Open deep learning compiler stack for Kendryte AI accelerators ✨

C# 748 181 Updated Nov 1, 2024

Fast inference from large language models via speculative decoding

Python 553 57 Updated Aug 22, 2024

Tile primitives for speedy kernels

Cuda 1,610 62 Updated Nov 1, 2024