Stars
A collection of libraries to optimise AI model performance
🦄 🦄 🦄 Core smart contracts of Uniswap v3
High-performance distributed framework for training deep learning recommendation models based on PyTorch.
An implementation of model-parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
F-Stack is a high-performance user-space network development kit based on DPDK, the FreeBSD TCP/IP stack, and a coroutine API.
Making large AI models cheaper, faster and more accessible
ComScribe is a tool to identify communication among all GPU-GPU and CPU-GPU pairs in a single-node multi-GPU system.
A cloud-native vector database, storage for next-generation AI applications
Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloaded from https://developer.nvidia.com/nvcomp.
l-nic / chipyard
Forked from ucb-bar/chipyard
An Agile Chisel-Based SoC Design Framework
Slicing a PyTorch Tensor Into Parallel Shards
A benchmark for testing PCIe and host/device memory bandwidth and communication contention on multi-GPU and multi-CPU systems.
The X86 Encoder Decoder (XED) is a software library for encoding and decoding X86 (IA32 and Intel64) instructions
A 128 bit unsigned integer class for CUDA
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
Triton Model Analyzer is a CLI tool that helps users better understand the compute and memory requirements of models served by Triton Inference Server.
Automatically generate a C++ header file including CUDA device-specific parameters
A GPU-powered real-time analytics storage and query engine.
Virtual Kubelet is an open source Kubernetes kubelet implementation.
brpc is an industrial-grade RPC framework written in C++, often used in high-performance systems such as Search, Storage, Machine learning, Advertisement, Recommendation etc. "brpc" mea…