Fix key error in GRPOTrainer #1818

le-big-mac · 2025-02-24T16:33:32Z

Currently, GRPOTrainer raises a KeyError during loss computation because of a missing [mode] key. This PR adds the mode key back.

danielhanchen · 2025-02-25T03:25:38Z

Thanks! Oh, this is for the nightly release of TRL, right?
Would it be possible to first check whether self._metrics["completion_length"] exists, and/or whether self._metrics["train"] / self._metrics["eval"] exist, so that older versions of TRL keep working? Also, it looks like I might need to update the notebook logging.

le-big-mac · 2025-02-25T14:21:22Z

I've added a check for "train" as a key in self._metrics. From looking at the history of TRL's GRPOTrainer, this should be enough to distinguish between the versions. It used to be self._metrics = defaultdict(list), and now it's self._metrics = {"train": defaultdict(list), "eval": defaultdict(list)}. I can't see evidence of a "train" key ever being used in the old version.
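
For readers following along, here is a minimal sketch of the kind of check described above. It is not the code from this PR: the helper name record_metric and the sample values are hypothetical, and only the two self._metrics layouts are taken from the discussion.

```python
from collections import defaultdict


def record_metric(metrics, name, value, mode="train"):
    """Append `value` under `name`, whichever layout `metrics` uses.

    Hypothetical helper for illustration; not part of TRL or Unsloth.
    """
    if "train" in metrics:
        # Newer layout: {"train": defaultdict(list), "eval": defaultdict(list)}
        metrics[mode][name].append(value)
    else:
        # Older layout: a flat defaultdict(list) with no mode level
        metrics[name].append(value)


# The two layouts mentioned in the comment above.
old_metrics = defaultdict(list)
new_metrics = {"train": defaultdict(list), "eval": defaultdict(list)}

record_metric(old_metrics, "completion_length", 12.0)
record_metric(new_metrics, "completion_length", 12.0)
record_metric(new_metrics, "completion_length", 8.0, mode="eval")
```

The membership test uses `in` rather than indexing, so it does not create a "train" entry as a side effect on the older defaultdict layout.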

danielhanchen · 2025-02-25T23:22:32Z

Very good work thanks!

fix keyerror in GRPOTrainer

4ffddad

check for train in _metrics

efeb7ca

danielhanchen merged commit 2c0f501 into unslothai:main Feb 25, 2025
