reward.py provides only binary_task_success and has no extension points. RL training use cases need custom per-task verifiers and a way to compose multiple rewards.
The current reward is hardcoded in rollout_collector.py:140.
Proposed design:
- Implement a reward function protocol matching TRL's reward_funcs pattern (a list of callables)
- Support a TaskVerifierRegistry for registering task-specific verification functions
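A minimal sketch of how the two pieces could fit together, under several assumptions. Each reward function takes a batch of completions plus per-sample keyword columns and returns one float per completion, in the spirit of TRL's callable reward functions; the exact signature is assumed here, not copied from TRL. The verifier signature (completion, reference) -> bool, the column names task_type and reference, and the helpers verifier_reward and combine_rewards are all hypothetical. The existing binary_task_success could presumably be wrapped as one entry in such a list.

```python
from typing import Any, Callable, Dict, List, Optional, Protocol, Sequence


class RewardFunc(Protocol):
    """Assumed TRL-style shape: a batch of completions in, one float per completion out."""

    def __call__(self, completions: List[str], **kwargs: Any) -> List[float]: ...


# Hypothetical verifier signature: (completion, reference) -> passed?
Verifier = Callable[[str, str], bool]


class TaskVerifierRegistry:
    """Maps a task type name to its task-specific verification function."""

    def __init__(self) -> None:
        self._verifiers: Dict[str, Verifier] = {}

    def register(self, task_type: str) -> Callable[[Verifier], Verifier]:
        # Used as a decorator; refuses to silently overwrite an existing verifier.
        def decorator(fn: Verifier) -> Verifier:
            if task_type in self._verifiers:
                raise ValueError(f"verifier already registered for {task_type!r}")
            self._verifiers[task_type] = fn
            return fn

        return decorator

    def get(self, task_type: str) -> Verifier:
        try:
            return self._verifiers[task_type]
        except KeyError:
            raise KeyError(f"no verifier registered for {task_type!r}") from None


def verifier_reward(registry: TaskVerifierRegistry) -> RewardFunc:
    """Binary reward: 1.0 if the sample's task verifier passes, else 0.0."""

    def reward(completions: List[str], **kwargs: Any) -> List[float]:
        # Hypothetical per-sample columns passed as keyword lists.
        task_types: List[str] = kwargs["task_type"]
        references: List[str] = kwargs["reference"]
        return [
            1.0 if registry.get(t)(c, r) else 0.0
            for c, t, r in zip(completions, task_types, references)
        ]

    return reward


def combine_rewards(
    funcs: Sequence[RewardFunc], weights: Optional[Sequence[float]] = None
) -> RewardFunc:
    """Reduce a list of reward functions to one weighted sum per completion."""
    ws = list(weights) if weights is not None else [1.0] * len(funcs)
    if len(ws) != len(funcs):
        raise ValueError("need exactly one weight per reward function")

    def combined(completions: List[str], **kwargs: Any) -> List[float]:
        totals = [0.0] * len(completions)
        for fn, w in zip(funcs, ws):
            scores = fn(completions=completions, **kwargs)
            if len(scores) != len(completions):
                raise ValueError("reward function returned the wrong number of scores")
            for i, s in enumerate(scores):
                totals[i] += w * s
        return totals

    return combined


# Usage: one registered verifier plus a simple format reward.
registry = TaskVerifierRegistry()


@registry.register("exact_match")
def exact_match(completion: str, reference: str) -> bool:
    return completion.strip() == reference.strip()


def digits_only(completions: List[str], **kwargs: Any) -> List[float]:
    return [1.0 if c.strip().isdigit() else 0.0 for c in completions]


reward_fn = combine_rewards([verifier_reward(registry), digits_only], weights=[1.0, 0.5])
scores = reward_fn(
    completions=["42", "41", "forty-two"],
    task_type=["exact_match"] * 3,
    reference=["42"] * 3,
)
print(scores)  # → [1.5, 0.5, 0.0]
```

Keeping the reward functions as a plain list leaves the TRL-style interface intact. combine_rewards is only needed where a single scalar per rollout is required, such as the hardcoded call site in rollout_collector.py. The weighted sum is one assumed reduction; others (for example gating the format reward on verifier success) would slot in the same way.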