1 parent cc58ee3, commit a50b66b
ML tips/NLP/README.md
@@ -899,4 +899,5 @@ https://github.com/OpenLMLab/MOSS-RLHF/tree/main
- reward model: https://github.com/OpenLMLab/MOSS-RLHF/blob/main/train_ppo.py#L113
- note: you need to train your own reward model, but can use the HF Trainer or something similar, using the above as a guide
+ - see `reward_model` folder
- PPO train https://github.com/OpenLMLab/MOSS-RLHF/blob/main/run_en.sh
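
For the reward-model note above, a minimal sketch of the pairwise ranking loss commonly used to train RLHF reward models may help. This is an assumption about the general technique, not a copy of the MOSS-RLHF code or of the `reward_model` folder; the function name `pairwise_reward_loss` and its list-of-floats inputs are hypothetical, chosen so the example runs with the standard library only.

```python
import math


def pairwise_reward_loss(chosen_rewards, rejected_rewards):
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over preference pairs.

    Hypothetical helper: each input is a list of scalar rewards, where
    chosen_rewards[i] and rejected_rewards[i] score the preferred and the
    dispreferred response to the same prompt.
    """
    if len(chosen_rewards) != len(rejected_rewards):
        raise ValueError("inputs must have the same length")
    if not chosen_rewards:
        raise ValueError("inputs must not be empty")

    total = 0.0
    for r_chosen, r_rejected in zip(chosen_rewards, rejected_rewards):
        margin = r_chosen - r_rejected
        # -log(sigmoid(m)) == log(1 + exp(-m)); written in a numerically
        # stable form so large negative margins do not overflow exp().
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(chosen_rewards)


if __name__ == "__main__":
    # The loss shrinks as the chosen response is scored further above the rejected one.
    print(pairwise_reward_loss([2.0, 1.5], [0.0, 0.5]))
    print(pairwise_reward_loss([0.0, 0.5], [2.0, 1.5]))
```

In an actual training loop the rewards would come from a model with a scalar head, and the same expression would be computed on tensors so gradients can flow; a custom loss like this can be plugged into a trainer by overriding its loss computation.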