RL training environments with verifiable rewards for coding agents. Works with TRL, Unsloth, verl, OpenRLHF.
python machine-learning reinforcement-learning deep-learning sandbox evaluation rl code-execution ai-agents daytona llm unsloth coding-agents grpo verifiable-rewards openrlhf reward-function grpo-training
Updated Apr 1, 2026 - Python