Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Topics: reinforcement-learning, transformers, transformer, safety, llama, gpt, datasets, beaver, alpaca, ai-safety, safe-reinforcement-learning, vicuna, deepspeed, large-language-models, llm, llms, rlhf, reinforcement-learning-from-human-feedback, safe-rlhf, safe-reinforcement-learning-from-human-feedback
Updated Jun 13, 2024 · Python