Repository for nvCOMP docs and examples. nvCOMP is a library for fast lossless compression/decompression on the GPU that can be downloaded from https://developer.nvidia.com/nvcomp.

C++ 560 78 Updated Sep 11, 2024

l-nic / chipyard

Forked from ucb-bar/chipyard

An Agile Chisel-Based SoC Design Framework

Scala 26 2 Updated Dec 29, 2021

sohutv / hotcaffeine

Hot coffee (热咖啡)

JavaScript 188 9 Updated Feb 2, 2023

kaiyuyue / torchshard

Slicing a PyTorch Tensor Into Parallel Shards

Python 296 15 Updated Jul 27, 2021

gabaker / TARUC_Bench

A benchmark for testing PCIe and host/device memory bandwidth and communication contention on multi-GPU and multi-CPU systems.

C++ 9 1 Updated Jun 9, 2016

intelxed / xed

The X86 Encoder Decoder (XED) is a software library for encoding and decoding X86 (IA32 and Intel64) instructions.

Python 1,412 148 Updated Nov 5, 2024

flame / blis

BLAS-like Library Instantiation Software Framework

C 2,304 367 Updated Nov 4, 2024

curtisseizert / CUDA-uint128

A 128-bit unsigned integer class for CUDA

C++ 43 15 Updated Nov 10, 2021

ceph / cbt

The Ceph Benchmarking Tool

Python 269 142 Updated Oct 9, 2024

ververica / flink-sql-benchmark

Java 103 51 Updated Jul 20, 2023

onnx / onnx-tensorrt

ONNX-TensorRT: TensorRT backend for ONNX

C++ 2,950 544 Updated Nov 5, 2024

pytorch / FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ 1,206 500 Updated Nov 13, 2024

onnx / onnx-tensorflow

TensorFlow Backend for ONNX

Python 1,285 296 Updated Mar 28, 2024

NVIDIA / cuda-samples

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

C 6,421 1,820 Updated Jul 26, 2024

triton-inference-server / model_analyzer

Triton Model Analyzer is a CLI tool that helps users understand the compute and memory requirements of models served by Triton Inference Server.

Python 427 74 Updated Nov 13, 2024

BDHU / CUDA_Device_Attribute_Generation

Automatically generates a C++ header file containing CUDA device-specific parameters

C++ 3 Updated Jul 1, 2020

uber / aresdb

A GPU-powered real-time analytics storage and query engine.

Go 3,031 233 Updated Jul 13, 2024

yuhc / gpu-rodinia

Rodinia benchmark

C 169 87 Updated Apr 14, 2023

bytedance / effective_transformer

Running BERT without Padding

C++ 460 52 Updated Mar 18, 2022

virtual-kubelet / virtual-kubelet

Virtual Kubelet is an open source Kubernetes kubelet implementation.

Go 4,210 625 Updated Nov 11, 2024

apache / brpc

brpc is an industrial-grade RPC framework written in C++. It is often used in high-performance systems such as search, storage, machine learning, advertising and recommendation. "brpc" mea…

C++ 16,537 3,970 Updated Nov 9, 2024

bbshocking

Stars

nebuly-ai / optimate

NVIDIA / libnvidia-container

Uniswap / v3-core

PersiaML / PERSIA

EleutherAI / gpt-neox

F-Stack / f-stack

hpcaitech / ColossalAI

ParCoreLab / ComScribe

milvus-io / milvus

NVIDIA / nvcomp