Fix PiPPy README typos for inference #834

Merged 1 commit on Jul 6, 2023
12 changes: 6 additions & 6 deletions examples/inference/README.md
@@ -1,27 +1,27 @@
# PiPPy (Pipline Parallelism for PyTorch) Distributed Inference for Large Models

PiPPy helps to run very large models for inference by splitting the model into mutliple stages running on multiple GPUs.
-PiPPy make this easier by providing a auto split API that automates this process for user.
+PiPPy make this easier by providing an auto split API that automates this process for user.

## How It Works

-PiPPy splits your model into multiple stages, each stage loaded on one gpu then the input batch will be furhter divided into micro-batches and run through the splits from
-rank0..rankN. Results are being returned to rank0 as its runing the PipelineDriver. Please read more on pipleines [here](https://github.com/pytorch/tau/blob/main/README.md)
+PiPPy splits your model into multiple stages, each stage loaded on one gpu then the input batch will be further divided into micro-batches and run through the splits from
+rank0..rankN. Results are returned to rank0 as rank 0 is running the PipelineDriver. Please read more on pipleines [here](https://github.com/pytorch/tau/blob/main/README.md)

The flowchart below helps to visualize the process in high level as well.

<img src="https://user-images.githubusercontent.com/9162336/207237303-86dc02fe-dae0-4335-8d23-c56d31ecdb87.png" alt="drawing" width="400"/>

## PiPPy Supports Arbitary Model Partitioning

-Unlike most of the available solutions that they need to know the model architecture beforehand, PiPPy supports arbitary PyTorch models.
+Unlike most of the available solutions that need to know the model architecture beforehand, PiPPy supports arbitary PyTorch models.
* PiPPy supports both manual splitting and auto split.
* Auto split uses `split_policy` and support both `equal_size` and `threshod` policies, the name are self-explanatory.
* PiPPy use FX to trace and split the model.

## Settings To Care About

-* **world_size** specifies your availble number of gpus for paritioning your model
+* **world_size** specifies your availble number of gpus for partitioning your model

* **split_policy** it can be either `equal_size`, `split_into_equal_size(number_of_workers)` or `threshod`, `split_on_size_threshold(#some number)`

@@ -151,4 +151,4 @@ git clone https://huggingface.co/bigscience/bloom-7b1
torchrun --nproc_per_node 4 hf_generate.py --world_size 4 --model_name ./bloom-7b1 --index_filename bloom-7b1/pytorch_model.bin.index.json
```

-In this case, each rank will only load a part of the model.
+In this case, each rank will only load a part of the model.
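
For readers who want to try the auto-split API that this README describes, the sketch below shows roughly how the `split_policy` setting is wired up. It is a minimal, hypothetical example assuming the pre-release PiPPy (pytorch/tau) Python API; the exact import paths, the `Pipe.from_tracing` signature, and the toy model are assumptions of this sketch, not part of the PR.

```python
# Minimal sketch of PiPPy auto split, assuming the pytorch/tau-era API.
# Import paths, signatures, and attribute names below are assumptions and
# may differ between PiPPy releases.
import torch
from pippy import split_into_equal_size   # assumed helper: equal-size split policy
from pippy.IR import Pipe                 # assumed FX-based tracing front end

world_size = 4  # number of GPUs, i.e. number of pipeline stages

# Any torch.nn.Module works; a toy model keeps the sketch self-contained.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 10),
)

# Auto split: PiPPy traces the model with FX and cuts it into `world_size`
# roughly equal-size stages, so no manual pipe_split() calls are needed.
pipe = Pipe.from_tracing(model, split_policy=split_into_equal_size(world_size))

# Each pipeline stage ends up as a submodule of the split graph module
# (attribute name assumed here).
print(pipe.split_gm)
```

In an actual multi-GPU run, each rank would then drive its own stage (rank 0 running the PipelineDriver mentioned above) and the job would be launched with `torchrun`, as in the bloom-7b1 command shown in the diff.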