
Commit fdeb0f7: Fix typo re: what to do after training

Muennighoff authored Jul 2, 2024
1 parent 56976bb

Showing 1 changed file with 1 addition and 1 deletion.

README.md (2 changes: 1 addition & 1 deletion)
@@ -388,7 +388,7 @@ torchrun --nproc_per_node 1 \
   --attn cccc
 ```
 
-All arguments are explained in `training/arguments.py` or the [HF TrainingArguments documentation](https://hf.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments) except for `nproc_per_node` which is the number of GPUs per node. For our actual training runs, we use accelerate to easily use multiple nodes and GPUs as well as slightly different settings (e.g. `--attn bbcc`). The scripts are all in `scripts/training`, for example `scripts/training/train_gritlm_8x7b.sh` was used for GritLM-8x7B. For models from the ablations, you can check their folder on the huggingface hub which contains a `training_args.bin` file with the arguments. You can also check all their arguments on the WandB: https://wandb.ai/muennighoff/gritlm. After training, you may first have to run `python scripts/reformat_statedict.py path_to_statedict` to remove the `model.` prefix from the checkpoints and then you can shard the checkpoint via `python scripts/shard.py path_to_model_folder` for easier usage.
+All arguments are explained in `training/arguments.py` or the [HF TrainingArguments documentation](https://hf.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments) except for `nproc_per_node` which is the number of GPUs per node. For our actual training runs, we use accelerate to easily use multiple nodes and GPUs as well as slightly different settings (e.g. `--attn bbcc`). The scripts are all in `scripts/training`, for example `scripts/training/train_gritlm_8x7b.sh` was used for GritLM-8x7B. For models from the ablations, you can check their folder on the huggingface hub which contains a `training_args.bin` file with the arguments. You can also check all their arguments on the WandB: https://wandb.ai/muennighoff/gritlm. After training, you may first have to run `python scripts/reformat_statedict.py path_to_statedict` to remove the `model.` prefix from the checkpoint, and then you can shard the checkpoint via `python scripts/shard.py path_to_model_folder` for easier usage.
 
 #### Alignment

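For reference, the prefix-removal step described in the corrected line could look roughly like the sketch below. This is a minimal illustration assuming a plain PyTorch state dict saved with `torch.save`, not the repository's actual `scripts/reformat_statedict.py`:

```python
# Minimal sketch of stripping a "model." prefix from checkpoint keys.
# Assumes a plain PyTorch state dict saved with torch.save; the real
# scripts/reformat_statedict.py in the gritlm repo may differ.
import sys

import torch


def strip_model_prefix(state_dict_path: str) -> None:
    state_dict = torch.load(state_dict_path, map_location="cpu")
    # Drop the leading "model." from any key that has it; keep others as-is.
    cleaned = {
        (key[len("model."):] if key.startswith("model.") else key): value
        for key, value in state_dict.items()
    }
    torch.save(cleaned, state_dict_path)


if __name__ == "__main__":
    # Usage: python strip_prefix.py path_to_statedict
    strip_model_prefix(sys.argv[1])
```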
