Stars
rl
2 repositories
Train transformer language models with reinforcement learning.
Implementation of ChatGPT RLHF (Reinforcement Learning from Human Feedback) on any generation model in Hugging Face's transformers library (bloomz-176B/bloom/gpt/bart/T5/MetaICL).