Commit 8a33ea1
chore: update readme fused optimizer link
NOBLES5E authored Nov 4, 2021
1 parent 7838bec commit 8a33ea1
Showing 1 changed file with 1 addition and 1 deletion.
README.md: 2 changes (1 addition & 1 deletion)
@@ -21,7 +21,7 @@ Bagua is a deep learning training acceleration framework for PyTorch developed b
- Asynchronous Communication (e.g. [Async Model Average](https://tutorials.baguasys.com/algorithms/async-model-average))
- [**TCP Communication Acceleration (Bagua-Net)**](https://tutorials.baguasys.com/more-optimizations/bagua-net): Bagua-Net is a low-level communication acceleration feature provided by Bagua. It can greatly improve the throughput of AllReduce on TCP networks. You can enable the Bagua-Net optimization on any distributed training job that uses NCCL for GPU communication (this includes PyTorch-DDP, Horovod, DeepSpeed, and more).
- [**Performance Autotuning**](https://tutorials.baguasys.com/performance-autotuning/): Bagua can automatically tune system parameters to achieve the highest throughput.
- - [**Generic Fused Optimizer**](https://bagua.readthedocs.io/en/latest/autoapi/bagua/torch_api/contrib/index.html#bagua.torch_api.contrib.fuse_optimizer): Bagua provides a generic fused optimizer which improves the performance of optimizers by fusing the optimizer `.step()` operation on multiple layers. It can be applied to arbitrary PyTorch optimizers, in contrast to [NVIDIA Apex](https://nvidia.github.io/apex/optimizers.html)'s approach, where only some specific optimizers are implemented.
+ - [**Generic Fused Optimizer**](https://tutorials.baguasys.com/more-optimizations/generic-fused-optimizer): Bagua provides a generic fused optimizer which improves the performance of optimizers by fusing the optimizer `.step()` operation on multiple layers. It can be applied to arbitrary PyTorch optimizers, in contrast to [NVIDIA Apex](https://nvidia.github.io/apex/optimizers.html)'s approach, where only some specific optimizers are implemented.
- [**Load Balanced Data Loader**](https://bagua.readthedocs.io/en/latest/autoapi/bagua/torch_api/contrib/load_balancing_data_loader/index.html): When the computational complexity of samples in the training data varies, for example in NLP and speech tasks where each sample has a different length, distributed training throughput can be greatly improved by using Bagua's load balanced data loader, which distributes samples so that each worker's workload is similar.

Its effectiveness has been evaluated in various scenarios, including VGG and ResNet on ImageNet, BERT Large and many industrial applications at Kuaishou.
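
For reference, the generic fused optimizer whose link this commit updates can be used roughly as follows. This is a minimal sketch based on the linked documentation; the exact signatures of `fuse_optimizer` and `fuse_step` should be verified against the Bagua API reference, and the model, data, and hyperparameters here are placeholders.

```python
# Sketch: wrapping a stock PyTorch optimizer with Bagua's generic fused optimizer.
# Assumes bagua is installed and the fuse_optimizer/fuse_step API from the linked docs.
import torch
from bagua.torch_api.contrib import fuse_optimizer

model = torch.nn.Linear(128, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Works with arbitrary PyTorch optimizers: parameters are flattened so that
# the per-layer .step() updates can be fused into fewer kernel launches.
optimizer = fuse_optimizer(optimizer, do_flatten=True)

for inputs, targets in train_loader:  # train_loader is a placeholder
    loss = torch.nn.functional.cross_entropy(model(inputs.cuda()), targets.cuda())
    optimizer.zero_grad()
    loss.backward()
    optimizer.fuse_step()  # fused replacement for optimizer.step()
```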
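Similarly, the load balanced data loader mentioned in the diff is exposed through a distributed sampler in `bagua.torch_api.contrib`. The sketch below assumes a `LoadBalancingDistributedSampler` class with a `complexity_fn` argument, as suggested by the linked API page; the class name, argument, and toy dataset are assumptions to be checked against the documentation.

```python
# Sketch: load balanced data loading with a per-sample complexity function.
# Assumes a distributed process group is already initialized (e.g. via Bagua's init),
# and that LoadBalancingDistributedSampler/complexity_fn exist as named here.
import torch
from torch.utils.data import DataLoader, TensorDataset
from bagua.torch_api.contrib import LoadBalancingDistributedSampler

# Toy dataset where each sample's cost is proportional to its stored "length".
lengths = torch.randint(10, 500, (1024,))
dataset = TensorDataset(lengths)

# Rank samples by estimated cost so every worker receives a similar workload.
sampler = LoadBalancingDistributedSampler(
    dataset,
    complexity_fn=lambda item: item[0].item(),  # cost proxy: sequence length
)
train_loader = DataLoader(dataset, batch_size=32, sampler=sampler)
```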
