update readme (#19)
baoleai committed Sep 12, 2024
1 parent e9a7cf8 commit a4f7db9
Showing 6 changed files with 102 additions and 15 deletions.
75 changes: 61 additions & 14 deletions README.md
@@ -1,37 +1,84 @@
[![docs](https://img.shields.io/badge/docs-latest-brightgreen.svg)](https://torchacc.readthedocs.io/en/latest/)
[![CI](https://github.com/alibabapai/torchacc/actions/workflows/unit_test.yml/badge.svg)](https://github.com/alibabapai/torchacc/actions)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://github.com/alibabapai/torchacc/blob/main/LICENSE)

# TorchAcc

**TorchAcc** is an AI training acceleration framework developed by Alibaba Cloud's PAI.

TorchAcc is built on [PyTorch/XLA](https://github.com/pytorch/xla) and provides an easy-to-use interface for accelerating the training of PyTorch models. It also implements extensive GPU-specific optimizations for distributed training, memory management, and computation, achieving better ease of use, higher GPU training performance, and greater scalability for distributed training.

## Highlighted Features

* Rich distributed Parallelism
    * Data Parallelism
    * Fully Sharded Data Parallelism
    * Tensor Parallelism
    * Pipeline Parallelism
    * Context Parallelism
        * [Ulysses](https://arxiv.org/abs/2309.14509)
        * [Ring Attention](https://arxiv.org/abs/2310.01889)
        * FlashSequence (2D Sequence Parallelism)
* Low Memory Cost
* High Performance
* Easy-to-use API

You can accelerate your transformer models with just a few lines of code using TorchAcc.

<p align="center">
<img width="80%" src=docs/figures/api.gif />
</p>
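As a flavor of the API, here is a minimal, hypothetical sketch; the animation above and the [documentation](https://torchacc.readthedocs.io/en/latest/) show the authoritative interface, and the `accelerate()` entry point and its signature are assumptions here:

```python
# Hypothetical sketch only: torchacc.accelerate() is assumed to wrap a
# torch.nn.Module for XLA-compiled, distributed execution. Consult the docs
# for the real entry point and its options.
import torch
import torchacc

model = torch.nn.Linear(1024, 1024)   # stand-in for a real transformer model
model = torchacc.accelerate(model)    # assumed one-line acceleration step

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
for _ in range(3):
    # keep inputs on the same device as the (possibly XLA) model
    x = torch.randn(8, 1024).to(next(model.parameters()).device)
    loss = model(x).pow(2).mean()     # dummy loss for illustration
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```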


## Architecture Overview
The main goal of TorchAcc is to provide a high-performance AI training framework.
It uses IR abstractions at multiple layers, combining static graph compilation optimization (e.g., XLA), dynamic graph compilation optimization (e.g., BladeDISC), and distributed optimization techniques to deliver comprehensive end-to-end optimization, from low-level operators to upper-level models.


<p align="center">
<img width="80%" src=docs/figures/arch.png />
</p>


## Installation

### Docker
```
sudo docker run --gpus all --net host --ipc host --shm-size 10G -it --rm --cap-add=SYS_PTRACE registry.cn-hangzhou.aliyuncs.com/pai-dlc/acc:r2.3.0-cuda12.1.0-py3.10 bash
```

### Build from source

See the [contribution guide](docs/source/contributing.md).


## LLM training examples

### Getting Started

We present a straightforward example of training a Transformer model with TorchAcc to illustrate the usage of the TorchAcc API. You can quickly start training by executing the following command:
``` shell
torchrun --nproc_per_node=4 benchmarks/transformer.py --bf16 --acc --disable_loss_print --fsdp_size=4 --gc
```

### Utilizing HuggingFace Transformers

If you are familiar with HuggingFace Transformers' Trainer, you can easily accelerate a Transformer model with TorchAcc; see the [HuggingFace Transformers tutorial](docs/source/tutorials/hf_transformers.md).
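As a rough flavor of the idea (a hypothetical sketch, not the documented integration; the `torchacc.accelerate()` wrapping step and its placement are assumptions, so follow the linked tutorial for the actual steps), the goal is to keep the familiar Trainer workflow unchanged:

```python
# Hypothetical sketch; see docs/source/tutorials/hf_transformers.md for the
# real integration. torchacc.accelerate() is assumed to wrap the model before
# it is handed to the standard HuggingFace Trainer.
import torch
from torch.utils.data import Dataset
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments
import torchacc

class ToyDataset(Dataset):
    """Tiny stand-in dataset of random token ids, for the sketch only."""
    def __len__(self):
        return 8
    def __getitem__(self, idx):
        ids = torch.randint(0, 50257, (16,))
        return {"input_ids": ids, "labels": ids.clone()}

model = AutoModelForCausalLM.from_pretrained("gpt2")
model = torchacc.accelerate(model)   # assumed wrapping step

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", bf16=True, num_train_epochs=1),
    train_dataset=ToyDataset(),
)
trainer.train()
```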

### LLM training acceleration with FlashModels

If you want to try the latest features of TorchAcc or use the TorchAcc interface more flexibly for model acceleration, you can use our LLM acceleration library, FlashModels. It integrates distributed implementations of commonly used open-source LLMs and provides a wealth of examples:

https://github.com/AlibabaPAI/FlashModels

### SFT using modelscope/swift
Coming soon.

## Contributing
See the [contribution guide](docs/source/contributing.md).


## License
[Apache License 2.0](LICENSE)
Binary file added docs/figures/api.gif
Binary file added docs/figures/arch.png
34 changes: 34 additions & 0 deletions docs/source/contributing.md
@@ -0,0 +1,34 @@
# Contributing to TorchAcc

TorchAcc is built on top of PyTorch/XLA and requires a specific version of PyTorch/XLA to ensure GPU compatibility and performance.
We highly recommend using our prebuilt Docker image to start your development work.

## Building from source
To build TorchAcc from source, you first need to build PyTorch and torch_xla from source.

1. Build PyTorch
```shell
git clone --recursive -b v2.3.0 git@github.com:AlibabaPAI/pytorch.git
cd pytorch
python setup.py develop
```


2. Build torch_xla
```shell
git clone --recursive -b acc git@github.com:AlibabaPAI/xla.git
cd xla
# USE_CUDA/XLA_CUDA build torch_xla with GPU (CUDA) support
USE_CUDA=1 XLA_CUDA=1 python setup.py develop
```

3. Build TorchAcc
```shell
# run from the root of the TorchAcc repository
python setup.py develop
```

4. Run the unit tests
```shell
sh tests/run_ut.sh
```
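After step 4, a quick smoke test like the following can confirm that everything imports and executes. This is our suggestion rather than part of the official steps, and it assumes torch_xla was built with CUDA support as above:

```python
# Optional smoke test for a source build.
import torch
import torch_xla.core.xla_model as xm
import torchacc  # noqa: F401 -- verifies that torchacc imports cleanly

device = xm.xla_device()              # the XLA (GPU) device
x = torch.randn(4, 4, device=device)
print((x @ x).sum())                  # forces compilation and execution
```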
6 changes: 6 additions & 0 deletions docs/source/index.rst
@@ -47,6 +47,12 @@ Welcome to PAI-TorchAcc's documentation!

apis/modules

.. toctree::
:maxdepth: 2
:caption: CONTRIBUTING

contributing

Indices and tables
==================

2 changes: 1 addition & 1 deletion docs/source/install.md
@@ -5,7 +5,7 @@
It is recommended to use the existing release image directly. The image address is:

```bash
registry.<region>.aliyuncs.com/pai-dlc/acc:r2.3.0-cuda12.1.0-py3.10
```

Replace `<region>` with one of the following as needed:
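For example, with `<region>` set to `cn-hangzhou` (the region used in the README's Docker command), the image address becomes:

```bash
registry.cn-hangzhou.aliyuncs.com/pai-dlc/acc:r2.3.0-cuda12.1.0-py3.10
```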
