Conversation

@yurekami
Contributor

Summary

  • Add logging of extra/intermediate rewards to training metrics
  • This improves training monitoring by providing visibility into component reward values

Motivation

As discussed in #2279, logging extra rewards like format_reward and acc_reward is useful for training monitoring. This enables users to track intermediate reward components in addition to the final combined score.

Changes

Added logging of extra reward metrics after the actor update:

# Log extra reward metrics (e.g., format_reward, acc_reward) for training monitoring
if reward_extra_infos_dict:
    for key, values in reward_extra_infos_dict.items():
        if key != "score" and len(values) > 0:
            metrics[f"critic/rewards/{key}"] = np.mean(values)

Usage

Reward functions can return extra info via the reward_extra_info dict:

return {
    'score': final_reward,
    'format_reward': format_score,
    'acc_reward': accuracy_score,
    # ... other intermediate rewards
}

These will be logged as:

  • critic/rewards/format_reward
  • critic/rewards/acc_reward
  • etc.
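
For illustration only, a minimal custom reward function that produces these keys could look like the sketch below. The signature and the tag-based checks are hypothetical placeholders, not part of this change:

def compute_score(solution_str, ground_truth):
    # Hypothetical format check: the answer must be wrapped in <answer> tags.
    text = solution_str.strip()
    format_score = 1.0 if text.startswith("<answer>") and text.endswith("</answer>") else 0.0

    # Hypothetical accuracy check: exact string match against the ground truth.
    answer = text.removeprefix("<answer>").removesuffix("</answer>").strip()
    acc_score = 1.0 if answer == ground_truth else 0.0

    # Final combined score plus intermediate components for logging.
    return {
        "score": 0.1 * format_score + 0.9 * acc_score,
        "format_reward": format_score,
        "acc_reward": acc_score,
    }

# compute_score("<answer>42</answer>", "42") -> {"score": 1.0, "format_reward": 1.0, "acc_reward": 1.0}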

Closes #4545

Test plan

  • Verify extra rewards appear in wandb/tensorboard when using a reward function with extra info
  • Confirm metrics are logged correctly with multiple extra reward keys

🤖 Generated with Claude Code

Add logging of extra/intermediate rewards (e.g., format_reward,
acc_reward) to the training metrics. This improves training
monitoring by providing visibility into component reward values
in addition to the final combined score.

The extra rewards are logged under the "critic/rewards/{key}" namespace,
matching the pattern used for other critic metrics.

Reward functions can return extra info via the "reward_extra_info" dict:
{'score': reward, 'format_reward': value_1, 'acc_reward': value_2}

Closes volcengine#4545

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@gemini-code-assist bot left a comment


Code Review

This pull request adds logging for extra reward metrics from custom reward functions. The change is straightforward, but I've identified a potential robustness issue where non-numeric extra reward information could crash the training loop. I've suggested a fix to handle this gracefully by adding a try-except block.

if reward_extra_infos_dict:
    for key, values in reward_extra_infos_dict.items():
        if key != "score" and len(values) > 0:
            metrics[f"critic/rewards/{key}"] = np.mean(values)
Severity: high

The current implementation assumes that all values in reward_extra_infos_dict are numeric and can be averaged by np.mean. However, custom reward functions can return non-numeric extra information (e.g., strings, dictionaries), which would cause np.mean to raise a TypeError or ValueError, crashing the training loop. To make this more robust, it's better to wrap the np.mean call in a try-except block to gracefully handle non-numeric metrics.

Suggested change
metrics[f"critic/rewards/{key}"] = np.mean(values)
try:
metrics[f"critic/rewards/{key}"] = np.mean(values)
except (TypeError, ValueError):
# Not all extra reward info may be numeric, so we skip what can't be averaged.
pass

Collaborator


Could you rename this to training/reward/xxx?
