Summary
I want to propose RFC 006 to standardize multi-component rewards while keeping the existing scalar reward contract intact.
Problem
Today we optimize on a single scalar reward. That works for training, but it makes rewards harder to design and debug in complex agentic tasks (e.g., coding with safety or process constraints). Teams can manually fold these pieces into one score, but there is no common schema for inspecting and comparing components across environments.
Proposal
Introduce a standard component representation (e.g., success/progress/penalty/shaping/binary) emitted in observation metadata, while preserving observation.reward as the scalar optimization target.
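A minimal sketch of the idea in Python. Every name here is hypothetical and not taken from the draft RFC: the RewardComponent type, the "reward_components" metadata key, the attach_components and weighted_sum helpers, and the Observation stand-in. The point is only that components travel in metadata for inspection while reward remains the single scalar the trainer optimizes.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Optional

# Hypothetical component kinds, mirroring the examples listed in the proposal.
ALLOWED_KINDS = {"success", "progress", "penalty", "shaping", "binary"}


@dataclass
class RewardComponent:
    name: str
    kind: str
    value: float
    weight: float = 1.0  # hypothetical: only used if an env derives the scalar from components

    def __post_init__(self) -> None:
        # Reject kinds outside the shared vocabulary so components stay comparable.
        if self.kind not in ALLOWED_KINDS:
            raise ValueError(f"unknown component kind: {self.kind!r}")


@dataclass
class Observation:
    # Hypothetical stand-in for an environment observation.
    reward: Optional[float] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


def attach_components(obs: Observation, components: List[RewardComponent]) -> Observation:
    """Record components in metadata; obs.reward is deliberately left untouched."""
    obs.metadata["reward_components"] = [asdict(c) for c in components]
    return obs


def weighted_sum(components: List[RewardComponent]) -> float:
    """One possible (assumed) way an environment could derive the scalar reward."""
    return sum(c.weight * c.value for c in components)
```

Keeping aggregation out of attach_components means an environment that already computes its own scalar can add components purely for observability, without changing what gets optimized.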
Why now
This is a core extension of the reward contract that touches environment authoring, observability, and training diagnostics. It should be agreed on through the RFC process before implementation.
Draft RFC
A full draft has been prepared in rfcs/006-multi-component-rewards.md and submitted via PR for review.
Related RFCs
- RFC 001 (abstractions / boundaries)
- RFC 002 (environment-computed rewards)
- RFC 004 (rubrics)
- RFC 005 (harness boundary and reward opacity)