No tests exist for the most complex parts of the GRPO training pipeline:
- _training_step
- _compute_rollout_loss
- _compute_ref_log_probs
- _make_agent_fn
- collect_group
Add mock-based tests that feed fake rollouts through these functions and verify:
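- Gradient flow: the loss backpropagates to the policy parameters
- Loss computation correctness: loss values match hand-computed results on small inputs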
- Edge cases (empty rollouts, zero advantages, all-same rewards)
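A minimal sketch of what these tests could look like, using only the standard library so it runs without a GPU or model. Everything here is hypothetical: `FakeRollout`, `group_advantages`, `reference_loss` and `collect_fake_group` are illustrative stand-ins, not the pipeline's real signatures. The loss is a simplified group-normalized policy-gradient surrogate (no ratio clipping, no KL term), so the real `_compute_rollout_loss` will differ; the idea is to compare it against a hand-checkable reference like this one. Gradient flow is checked with central finite differences because the sketch avoids an autograd library. In a PyTorch codebase the same check would typically call `loss.backward()` and assert that each policy parameter's `.grad` is populated and non-zero.

```python
import math
import unittest
from dataclasses import dataclass
from typing import Callable, List, Sequence
from unittest import mock


@dataclass
class FakeRollout:
    """Hypothetical stand-in for a rollout: a scalar reward plus per-token log-probs."""
    reward: float
    token_logprobs: List[float]


def group_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantages (r - mean) / (std + eps), using the population std."""
    if not rewards:
        return []
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]


def reference_loss(rollouts: Sequence[FakeRollout]) -> float:
    """Simplified surrogate: -mean_i(A_i * sum_t logp_{i,t}); no clipping, no KL.
    Assumption: an empty group yields 0.0 so the training step can skip it."""
    if not rollouts:
        return 0.0
    advs = group_advantages([r.reward for r in rollouts])
    terms = [a * sum(r.token_logprobs) for a, r in zip(advs, rollouts)]
    return -sum(terms) / len(rollouts)


def collect_fake_group(agent_fn: Callable[[str], FakeRollout], prompt: str,
                       group_size: int) -> List[FakeRollout]:
    """Hypothetical stand-in for collect_group: sample group_size rollouts for one prompt."""
    return [agent_fn(prompt) for _ in range(group_size)]


class RolloutLossTests(unittest.TestCase):
    def setUp(self):
        # Rewards [1, 0] -> mean 0.5, std 0.5 -> advantages ~[+1, -1].
        self.group = [FakeRollout(1.0, [-0.5, -0.5]), FakeRollout(0.0, [-1.0, -2.0])]

    def test_loss_matches_hand_computation(self):
        # Summed log-probs are [-1, -3]: loss = -(1 * -1 + -1 * -3) / 2 = -1.0
        self.assertAlmostEqual(reference_loss(self.group), -1.0, places=5)

    def test_gradient_reaches_every_rollout(self):
        # The loss is linear in the log-probs, so dL/dlogp_i should be -A_i / n.
        h = 1e-4
        advs = group_advantages([r.reward for r in self.group])
        for i, rollout in enumerate(self.group):
            original = rollout.token_logprobs[0]
            rollout.token_logprobs[0] = original + h
            up = reference_loss(self.group)
            rollout.token_logprobs[0] = original - h
            down = reference_loss(self.group)
            rollout.token_logprobs[0] = original  # restore the fixture
            grad = (up - down) / (2 * h)
            self.assertNotEqual(grad, 0.0)
            self.assertAlmostEqual(grad, -advs[i] / len(self.group), places=6)

    def test_all_same_rewards_give_zero_advantages_and_loss(self):
        # Exactly representable rewards keep mean == reward, so advantages are exactly 0.
        group = [FakeRollout(1.0, [-1.0]), FakeRollout(1.0, [-2.0]), FakeRollout(1.0, [-3.0])]
        self.assertEqual(group_advantages([r.reward for r in group]), [0.0, 0.0, 0.0])
        self.assertEqual(reference_loss(group), 0.0)

    def test_empty_group(self):
        self.assertEqual(group_advantages([]), [])
        self.assertEqual(reference_loss([]), 0.0)

    def test_mocked_agent_feeds_loss(self):
        # A Mock with a list side_effect returns the canned rollouts one call at a time.
        agent_fn = mock.Mock(side_effect=self.group)
        group = collect_fake_group(agent_fn, "2+2=?", group_size=2)
        self.assertEqual(agent_fn.call_count, 2)
        agent_fn.assert_called_with("2+2=?")
        self.assertAlmostEqual(reference_loss(group), -1.0, places=5)
```

The file has no `__main__` guard, so it can be collected by `python -m pytest` or `python -m unittest`. Returning 0.0 for an empty group is one possible convention; if the real pipeline raises or skips instead, the empty-group test should pin down that behavior.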