A collection of libraries to optimise AI model performance

Python 8,374 639 Updated Jul 22, 2024

NVIDIA container runtime library

C 841 205 Updated Nov 9, 2024

🦄 🦄 🦄 Core smart contracts of Uniswap v3

TypeScript 4,412 2,719 Updated Nov 3, 2024

High performance distributed framework for training deep learning recommendation models based on PyTorch.

Rust 396 51 Updated Nov 6, 2024

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries

Python 6,938 1,010 Updated Nov 12, 2024

F-Stack is a high-performance user-space network development kit based on DPDK, the FreeBSD TCP/IP stack, and a coroutine API.

C 3,862 898 Updated Oct 18, 2024

Making large AI models cheaper, faster and more accessible

Python 38,786 4,344 Updated Nov 13, 2024

ComScribe is a tool to identify communication among all GPU-GPU and CPU-GPU pairs in a single-node multi-GPU system.

C++ 24 4 Updated Jul 6, 2023

A cloud-native vector database, storage for next generation AI applications

Go 30,408 2,917 Updated Nov 13, 2024

Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloaded from https://developer.nvidia.com/nvcomp.

C++ 560 78 Updated Sep 11, 2024

An Agile Chisel-Based SoC Design Framework

Scala 26 2 Updated Dec 29, 2021

Hot coffee

JavaScript 188 9 Updated Feb 2, 2023

Slicing a PyTorch Tensor Into Parallel Shards

Python 296 15 Updated Jul 27, 2021

A benchmark for testing PCIe and host/device memory bandwidth and communication contention on multi-GPU and multi-CPU systems.

C++ 9 1 Updated Jun 9, 2016

The X86 Encoder Decoder (XED) is a software library for encoding and decoding X86 (IA32 and Intel64) instructions

Python 1,412 148 Updated Nov 5, 2024

BLAS-like Library Instantiation Software Framework

C 2,304 367 Updated Nov 4, 2024

A 128 bit unsigned integer class for CUDA

C++ 43 15 Updated Nov 10, 2021

The Ceph Benchmarking Tool

Python 269 142 Updated Oct 9, 2024

ONNX-TensorRT: TensorRT backend for ONNX

C++ 2,950 544 Updated Nov 5, 2024

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ 1,206 500 Updated Nov 13, 2024

Tensorflow Backend for ONNX

Python 1,285 296 Updated Mar 28, 2024

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

C 6,421 1,820 Updated Jul 26, 2024

Triton Model Analyzer is a CLI tool to help with better understanding of the compute and memory requirements of the Triton Inference Server models.

Python 427 74 Updated Nov 13, 2024

Automatically generate a C++ header file including CUDA device-specific parameters

C++ 3 Updated Jul 1, 2020

A GPU-powered real-time analytics storage and query engine.

Go 3,031 233 Updated Jul 13, 2024

Rodinia benchmark

C 169 87 Updated Apr 14, 2023

Running BERT without Padding

C++ 460 52 Updated Mar 18, 2022

Virtual Kubelet is an open source Kubernetes kubelet implementation.

Go 4,210 625 Updated Nov 11, 2024

brpc is an industrial-grade RPC framework written in C++, often used in high-performance systems such as search, storage, machine learning, advertisement, and recommendation. "brpc" mea…

C++ 16,537 3,970 Updated Nov 9, 2024