
Commit

document how training may slow down.
lxuechen committed Mar 16, 2023
1 parent 61a3b43 commit 7f08532
Showing 1 changed file, README.md, with 10 additions and 0 deletions.
@@ -12,6 +12,7 @@ This is the repo for the Stanford Alpaca project, which aims to build and share
- A [**web demo**](https://crfm.stanford.edu/alpaca/) to interact with our Alpaca model
- The [52K data](#data-release) used for fine-tuning the model
- The code for [generating the data](#data-generation-process)
- The code for [fine-tuning the model](#fine-tuning)

## Overview

@@ -139,6 +140,15 @@ torchrun --nproc_per_node=4 --master_port=<your_random_port> train.py \
--tf32 True
```

### Warning
`fsdp_transformer_layer_cls_to_wrap` must be set to the name of the specific decoder layer.
The LLaMA Hugging Face PR is not stable.
Earlier commits of the PR used the name `LLaMADecoderLayer` for the decoder layer (the commit our code is based on uses this name).
More recent commits use `LlamaDecoderLayer` (note the difference in capitalization).
Not setting `fsdp_transformer_layer_cls_to_wrap` to the correct name will lead to drastic slowdowns in training.
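
If you are unsure which name your installed `transformers` build uses, a quick check like the sketch below can help; it assumes the LLaMA code lives under `transformers.models.llama.modeling_llama`, which may not hold for every commit of the PR.

```bash
# Sketch: print whichever decoder-layer class name this transformers install exposes,
# so that --fsdp_transformer_layer_cls_to_wrap can be set to match.
# Assumes the module path transformers.models.llama.modeling_llama; adjust it if your
# commit of the LLaMA PR places the model code elsewhere.
python -c "import transformers.models.llama.modeling_llama as m; print([n for n in dir(m) if n.lower() == 'llamadecoderlayer'])"
```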

### Side notes

The same script also works for OPT fine-tuning. Here's an example for fine-tuning OPT-6.7B:

```bash
# (OPT-6.7B fine-tuning command collapsed in this diff view)
```
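
Since the full OPT command is not shown here, below is a rough sketch (not the verbatim README command) of the main changes relative to the LLaMA example: the model path and the FSDP wrap class. `facebook/opt-6.7b` and `OPTDecoderLayer` are the Hugging Face model ID and decoder-layer class for OPT; the remaining flags are assumed to follow the LLaMA command above.

```bash
# Rough sketch, not the verbatim README command: swap in the OPT checkpoint and the
# matching FSDP wrap class; other hyperparameter flags mirror the LLaMA example above.
torchrun --nproc_per_node=4 --master_port=<your_random_port> train.py \
    --model_name_or_path "facebook/opt-6.7b" \
    --data_path ./alpaca_data.json \
    --output_dir <your_output_dir> \
    --fsdp "full_shard auto_wrap" \
    --fsdp_transformer_layer_cls_to_wrap 'OPTDecoderLayer' \
    --tf32 True
```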
