
Commit fdeb0f7: Fix typo re: what to do after training

Muennighoff authored Jul 2, 2024
1 parent 56976bb

Showing 1 changed file with 1 addition and 1 deletion.

README.md (2 changes: 1 addition & 1 deletion)
@@ -388,7 +388,7 @@ torchrun --nproc_per_node 1 \
   --attn cccc
 ```
 
-All arguments are explained in `training/arguments.py` or the [HF TrainingArguments documentation](https://hf.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments) except for `nproc_per_node` which is the number of GPUs per node. For our actual training runs, we use accelerate to easily use multiple nodes and GPUs as well as slightly different settings (e.g. `--attn bbcc`). The scripts are all in `scripts/training`, for example `scripts/training/train_gritlm_8x7b.sh` was used for GritLM-8x7B. For models from the ablations, you can check their folder on the huggingface hub which contains a `training_args.bin` file with the arguments. You can also check all their arguments on the WandB: https://wandb.ai/muennighoff/gritlm. After training, you may first have to run `python scripts/reformat_statedict.py path_to_statedict` to remove the `model.` prefix from the checkpoints and then you can shard the checkpoint via `python scripts/shard.py path_to_model_folder` for easier usage.
+All arguments are explained in `training/arguments.py` or the [HF TrainingArguments documentation](https://hf.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments) except for `nproc_per_node` which is the number of GPUs per node. For our actual training runs, we use accelerate to easily use multiple nodes and GPUs as well as slightly different settings (e.g. `--attn bbcc`). The scripts are all in `scripts/training`, for example `scripts/training/train_gritlm_8x7b.sh` was used for GritLM-8x7B. For models from the ablations, you can check their folder on the huggingface hub which contains a `training_args.bin` file with the arguments. You can also check all their arguments on the WandB: https://wandb.ai/muennighoff/gritlm. After training, you may first have to run `python scripts/reformat_statedict.py path_to_statedict` to remove the `model.` prefix from the checkpoint, and then you can shard the checkpoint via `python scripts/shard.py path_to_model_folder` for easier usage.
 
 #### Alignment

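For reference, the prefix-removal step described in the corrected line could look roughly like the sketch below. This is a minimal illustration assuming a plain PyTorch state dict saved with `torch.save`, not the repository's actual `scripts/reformat_statedict.py`:

```python
# Minimal sketch of stripping a "model." prefix from checkpoint keys.
# Assumes a plain PyTorch state dict saved with torch.save; the real
# scripts/reformat_statedict.py in the gritlm repo may differ.
import sys

import torch


def strip_model_prefix(state_dict_path: str) -> None:
    state_dict = torch.load(state_dict_path, map_location="cpu")
    # Drop the leading "model." from any key that has it; keep others as-is.
    cleaned = {
        (key[len("model."):] if key.startswith("model.") else key): value
        for key, value in state_dict.items()
    }
    torch.save(cleaned, state_dict_path)


if __name__ == "__main__":
    # Usage: python strip_prefix.py path_to_statedict
    strip_model_prefix(sys.argv[1])
```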
