Distributed Keras Engine: make Keras faster with only one line of code.
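The advertised pattern, as a hedged sketch: wrap an existing Keras model in the engine's one-line wrapper and use the usual Keras API afterwards. The `dKeras` wrapper name and its keyword arguments follow the project's README but should be treated as assumptions:

```python
import numpy as np
from tensorflow.keras.applications import ResNet50
from dkeras import dKeras  # wrapper name per the project's README (assumption)

# The advertised "one line": wrap the model to distribute it over workers.
model = dKeras(ResNet50, wait_for_workers=True, n_workers=4)

data = np.random.uniform(-1, 1, (100, 224, 224, 3))
preds = model.predict(data)  # same Keras-style API afterwards
```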
Chimera: bidirectional pipeline parallelism for efficiently training large-scale models.
sensAI: ConvNets Decomposition via Class Parallelism for Fast Inference on Live Data
SHADE: Enable Fundamental Cacheability for Distributed Deep Learning Training
🚨 Prediction of the Resource Consumption of Distributed Deep Learning Systems
Ok-Topk is a scheme for distributed training with sparse gradients. It integrates a novel sparse allreduce algorithm (less than 6k communication volume, which is asymptotically optimal) with the decentralized parallel Stochastic Gradient Descent (SGD) optimizer, and its convergence is proven both theoretically and empirically.
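A minimal sketch of the general Top-k sparsification idea, using a naive allgather in place of Ok-Topk's optimized sparse allreduce (the function below is illustrative, not the paper's algorithm):

```python
import torch
import torch.distributed as dist

def topk_sparse_allreduce(grad: torch.Tensor, k: int) -> torch.Tensor:
    """Naive allgather-based Top-k gradient exchange (illustration only;
    Ok-Topk's actual sparse allreduce is far more communication-efficient)."""
    flat = grad.flatten()
    # Keep only the k largest-magnitude entries locally.
    _, idx = torch.topk(flat.abs(), k)
    val = flat[idx]
    world = dist.get_world_size()
    # Every worker collects every other worker's (index, value) pairs.
    all_idx = [torch.empty_like(idx) for _ in range(world)]
    all_val = [torch.empty_like(val) for _ in range(world)]
    dist.all_gather(all_idx, idx)
    dist.all_gather(all_val, val)
    # Sum the sparse contributions into a dense buffer and average.
    out = torch.zeros_like(flat)
    for i, v in zip(all_idx, all_val):
        out.index_add_(0, i, v)
    return (out / world).view_as(grad)
```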
Decentralized Asynchronous Training on Heterogeneous Devices
Eager-SGD is a decentralized asynchronous SGD variant. It uses novel partial collective operations to accumulate gradients across all processes.
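A loose, non-authoritative illustration of the "never wait on the collective" idea in plain torch.distributed; the class below is hypothetical and much simpler than the paper's solo/majority partial collectives:

```python
import torch
import torch.distributed as dist

class NonBlockingAllreduce:
    """Toy stand-in for eager-SGD's partial collectives: never block on
    the gradient allreduce; if it has not finished, reuse the most recent
    completed result (a possibly stale averaged gradient)."""

    def __init__(self, shape):
        self.last = torch.zeros(shape)  # last completed averaged gradient
        self.buf = None
        self.pending = None

    def submit(self, grad: torch.Tensor):
        self.buf = grad.clone()
        self.pending = dist.all_reduce(self.buf, async_op=True)

    def fetch(self) -> torch.Tensor:
        # Harvest the allreduce only if it already finished; never wait.
        if self.pending is not None and self.pending.is_completed():
            self.last = self.buf / dist.get_world_size()
            self.pending = None
        return self.last
```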
WAGMA-SGD is a decentralized asynchronous SGD based on wait-avoiding group model averaging. Synchronization is relaxed by making the collectives externally triggerable: a collective can be initiated without requiring that all processes enter it. It partially reduces the data within non-overlapping groups of processes, improving the…
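A simplified sketch of group model averaging with standard torch.distributed process groups (assumes the world size divides evenly into groups; the externally-triggerable, wait-avoiding part of WAGMA-SGD is not reproduced here):

```python
import torch
import torch.distributed as dist

def build_groups(group_size: int):
    """Partition ranks into non-overlapping groups. Every process must
    call dist.new_group for every group, which this loop does."""
    world = dist.get_world_size()
    return [dist.new_group(list(range(s, s + group_size)))
            for s in range(0, world, group_size)]

def group_model_average(model: torch.nn.Module, groups, group_size: int):
    # Average the model only within this rank's group (a partial
    # reduction over the whole world, unlike a global allreduce).
    my_group = groups[dist.get_rank() // group_size]
    with torch.no_grad():
        for p in model.parameters():
            dist.all_reduce(p.data, group=my_group)
            p.data.div_(group_size)
```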
Distributed deep learning framework based on PyTorch, Numba, NCCL, and ZeroMQ.
Collection of resources for automatic deployment of distributed deep learning jobs on a Kubernetes cluster
Horovod tutorial for PyTorch using NVIDIA Docker.
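For orientation, the core Horovod-for-PyTorch pattern such a tutorial covers looks roughly like this (the model and hyperparameters are placeholders):

```python
import torch
import horovod.torch as hvd

hvd.init()
torch.cuda.set_device(hvd.local_rank())

model = torch.nn.Linear(784, 10).cuda()            # placeholder model
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.01 * hvd.size())  # scale LR with workers

# Average gradients across workers and start everyone from rank 0's state.
optimizer = hvd.DistributedOptimizer(
    optimizer, named_parameters=model.named_parameters())
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)
```

A job like this is typically launched with something like `horovodrun -np 4 python train.py` inside the container.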
An implementation of a distributed ResNet model for classifying CIFAR-10 and MNIST datasets.
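A minimal sketch of what such a setup typically involves in PyTorch: DistributedDataParallel plus a DistributedSampler so each rank trains on a disjoint shard (resnet18 and all hyperparameters here are assumptions, not the repo's actual code):

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler
import torchvision
import torchvision.transforms as T

dist.init_process_group("nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor())
sampler = DistributedSampler(train_set)  # disjoint shard per rank
loader = DataLoader(train_set, batch_size=128, sampler=sampler)

model = torchvision.models.resnet18(num_classes=10).cuda()
model = DDP(model, device_ids=[local_rank])  # gradients sync in backward()
```

Each epoch should call `sampler.set_epoch(epoch)` so shuffling differs across epochs.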
A foundational repository for setting up distributed training jobs using Kubeflow and PyTorch FSDP.
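A minimal FSDP sketch, assuming the job is launched with torchrun or an equivalent Kubeflow PyTorchJob so that `LOCAL_RANK` is set; the toy model is a placeholder:

```python
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

dist.init_process_group("nccl")  # rendezvous comes from the launcher's env vars
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Placeholder model; a real job would build the actual network here.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 1024),
).cuda()

# FSDP shards parameters, gradients, and optimizer state across ranks,
# materializing full parameters only around each unit's forward/backward.
model = FSDP(model)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
```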
Auto-Tuned Scheduler Prototype for Heterogeneous GPU Clusters
A blockchain-based neural architecture search project.
Training strategies that mitigate bottlenecks and improve training speed while maintaining the quality of our GANs.
Simultaneous Multi-Party Learning Framework