Showing 6 changed files with 102 additions and 15 deletions.
**README.md**
[![docs](https://img.shields.io/badge/docs-latest-brightgreen.svg)](https://torchacc.readthedocs.io/en/latest/)
[![CI](https://github.com/alibabapai/torchacc/actions/workflows/unit_test.yml/badge.svg)](https://github.com/alibabapai/torchacc/actions)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/alibabapai/torchacc/blob/main/LICENSE)

# TorchAcc

**TorchAcc** is an AI training acceleration framework developed by Alibaba Cloud's PAI.

TorchAcc is built on [PyTorch/XLA](https://github.com/pytorch/xla) and provides an easy-to-use interface to accelerate the training of PyTorch models. At the same time, TorchAcc implements extensive GPU-specific optimizations for distributed training, memory management, and computation, achieving improved ease of use, better GPU training performance, and enhanced scalability for distributed training.

## Highlighted Features

* Rich distributed Parallelism
  * Data Parallelism
  * Fully Sharded Data Parallelism
  * Tensor Parallelism
  * Pipeline Parallelism
  * Context Parallelism
    * [Ulysses](https://arxiv.org/abs/2309.14509)
    * [Ring Attention](https://arxiv.org/abs/2310.01889)
    * FlashSequence (2D Sequence Parallelism)
* Low Memory Cost
* High Performance
* Easy-to-use API

You can accelerate your transformer models with just a few lines of code using TorchAcc.

<p align="center">
  <img width="80%" src=docs/figures/api.gif />
</p>
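As a concrete illustration of the one-line change shown in the animation above, here is a minimal sketch. It assumes the package imports as `torchacc` and exposes the `accelerate` entry point seen in the animation; treat it as orientation rather than a verified snippet, and consult the [docs](https://torchacc.readthedocs.io/en/latest/) for the exact signature and configuration options.

```python
import torch
import torch.nn as nn
import torchacc as ta  # assumed import name

# A tiny stand-in network; any torch.nn.Module would do.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

# The one-line change: TorchAcc places the model on an XLA device and
# applies the configured compiler/distributed optimizations.
model = ta.accelerate(model)  # assumed entry point, per the animation

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
device = next(model.parameters()).device  # whichever device TorchAcc chose

# The training loop itself stays plain PyTorch.
for _ in range(3):
    x = torch.randn(8, 128, device=device)
    y = torch.randint(0, 10, (8,), device=device)
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
```

Creating the optimizer after `accelerate` keeps it pointing at the (possibly sharded or relocated) parameters.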
## Architecture Overview

The main goal of TorchAcc is to provide a high-performance AI training framework. It utilizes IR abstractions at different layers and employs static graph compilation optimization such as XLA and dynamic graph compilation optimization such as BladeDISC, together with distributed optimization techniques, to offer comprehensive end-to-end optimization from the underlying operators up to the model level.

<p align="center">
  <img width="80%" src=docs/figures/arch.png />
</p>
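To make the compilation flow concrete: TorchAcc's static-graph path builds on PyTorch/XLA's lazy tensors, where operations are traced into an IR graph and compiled by XLA when the step is cut. The sketch below uses only standard torch_xla calls and assumes a working torch_xla installation with a visible device; it is independent of TorchAcc itself.

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()  # an XLA device (a GPU in TorchAcc's case)

x = torch.randn(4, 4, device=device)
y = torch.randn(4, 4, device=device)

# These ops are recorded lazily into an IR graph rather than run eagerly.
z = (x @ y).relu().sum()

# mark_step() cuts the trace: the accumulated graph is handed to the XLA
# compiler, optimized, and executed on the device.
xm.mark_step()
print(z.item())
```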
## Installation

### Docker

```shell
sudo docker run --gpus all --net host --ipc host --shm-size 10G -it --rm --cap-add=SYS_PTRACE registry.cn-hangzhou.aliyuncs.com/pai-dlc/acc:r2.3.0-cuda12.1.0-py3.10 bash
```
### Build from source

See the [contribution guide](docs/source/contributing.md).
## LLMs training examples

### Getting Started

We present a straightforward example of training a Transformer model with TorchAcc that illustrates the usage of the TorchAcc API. You can quickly start training by running the following command:

```shell
torchrun --nproc_per_node=4 benchmarks/transformer.py --bf16 --acc --disable_loss_print --fsdp_size=4 --gc
```
### Utilizing HuggingFace Transformers

If you are familiar with the HuggingFace Transformers Trainer, you can easily accelerate a Transformer model with TorchAcc; see the [huggingface transformers](docs/source/tutorials/hf_transformers.md) tutorial.
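For orientation, here is a plain HuggingFace Trainer setup of the kind the tutorial starts from. It uses only standard `transformers`/`datasets` APIs, and the model and dataset choices are illustrative; how TorchAcc attaches to this Trainer is specified in the linked tutorial and not guessed at here.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

# A small causal LM and dataset purely for illustration.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")

def tokenize(batch):
    out = tokenizer(batch["text"], truncation=True, max_length=128,
                    padding="max_length")
    out["labels"] = out["input_ids"].copy()  # causal LM: labels mirror inputs
    return out

dataset = dataset.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=4,
    num_train_epochs=1,
    bf16=True,
)

# A standard Trainer loop; the TorchAcc tutorial describes how to accelerate it.
Trainer(model=model, args=args, train_dataset=dataset).train()
```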
### LLMs training acceleration with FlashModels

If you want to try the latest features of TorchAcc, or to use the TorchAcc interface more flexibly for model acceleration, you can use our LLM acceleration library, FlashModels. It integrates distributed implementations of commonly used open-source LLMs and provides a wealth of examples:

https://github.com/AlibabaPAI/FlashModels

### SFT using modelscope/swift

Coming soon.

## Contributing

See the [contribution guide](docs/source/contributing.md).

## License

[Apache License 2.0](LICENSE)
**docs/source/contributing.md**
# Contribute To TorchAcc

TorchAcc is built on top of PyTorch/XLA, and it requires a specific version of PyTorch/XLA to ensure GPU compatibility and performance. We highly recommend using our prebuilt Docker image to start your development work.

## Building from source

If you want to build from source, you first need to build PyTorch and torch_xla from source.

1. Build PyTorch
```shell
git clone --recursive -b v2.3.0 git@github.com:AlibabaPAI/pytorch.git
cd pytorch
python setup.py develop
```

2. Build torch_xla
```shell
git clone --recursive -b acc git@github.com:AlibabaPAI/xla.git
cd xla
USE_CUDA=1 XLA_CUDA=1 python setup.py develop
```

3. Build TorchAcc
```shell
# in the TorchAcc repository root
python setup.py develop
```

4. Run the unit tests
```shell
sh tests/run_ut.sh
```
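Before running the full test suite, a quick smoke test can confirm that the build produced a usable XLA device. This is a minimal sketch: the torch_xla calls are standard, while the `torchacc` import name is assumed from the package layout.

```python
import torch
import torch_xla.core.xla_model as xm
import torchacc  # assumed top-level package name

print("torch:", torch.__version__)  # should match the v2.3.0 source build
device = xm.xla_device()            # raises if no XLA device is available
t = torch.ones(2, 2, device=device)
xm.mark_step()                      # force compilation and execution
print("xla device ok:", device, t.sum().item())
```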