Distributed training (multi-node) of a Transformer model
Updated Apr 10, 2024 - Python
Messaging and state layer for distributed serverless applications
Summary of call graphs and data structures of NVIDIA Collective Communication Library (NCCL)
Blink+: Increase GPU group bandwidth by utilizing cross-tenant NVLink.
Collectives library for UPC++
Interactive web visualization for understanding collective communication algorithms (as used in NCCL, RCCL, MPI). Learn how AllReduce, Broadcast, Reduce, AllGather and more work step by step.
TileXR (eXtreme Rendezvous for Asynchronous Tile Communication) is a data-centric asynchronous communication runtime for Huawei Ascend NPUs. TileXR is a communication library designed to be AI-native.
Simple, quick test to benchmark your PyTorch + NCCL/NCCLX setup
This repository contains simple example programs for MPI_Bcast, MPI_Reduce, MPI_Scatter and MPI_Gather. Download the repository and test them yourself.
Modelling of MPI collective operations latencies: Broadcast and Reduce operations. UniTS, SDIC, 2023-2024
A reduction algorithm for MPI using only peer-to-peer communication
Practice assignments on parallel programming for an HPC course
Summary of call graphs and data structures of collective communication plugin in NVIDIA TensorRT-LLM
MPI laboratory project demonstrating collective communication primitives to perform distributed numerical computations on a vector. Implements broadcast, scatter, gather, reduce, and scan operations while managing vector segments across multiple processes (Introduction to Parallel Computing, UNIWA).
Audit GPU cluster communication schedules from NCCL logs. Zero dependencies. CI-ready.
Develop high-performance parallel applications in C++ using the Partitioned Global Address Space model and asynchronous communication primitives.