Fix key error in GRPOTrainer #1818

Merged (2 commits) on Feb 25, 2025

Conversation

le-big-mac
Contributor

Fixes #1807

Currently, GRPOTrainer raises a KeyError during the loss computation because the metrics lookup is missing the [mode] key. This PR adds the mode key back.
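As a rough illustration of the failure, here is a minimal standalone sketch. It assumes the newer metrics layout described later in this thread (a plain dict with "train" and "eval" entries); the helper names `record_without_mode` and `record_with_mode` are hypothetical and are not taken from TRL or from this PR's diff.

```python
from collections import defaultdict

# Assumed newer layout: a plain dict keyed by mode, each holding a defaultdict(list).
metrics = {"train": defaultdict(list), "eval": defaultdict(list)}


def record_without_mode(store, value):
    # Hypothetical old-style access: indexes the metric name directly.
    store["completion_length"].append(value)


def record_with_mode(store, mode, value):
    # Hypothetical new-style access: indexes the mode first, then the metric name.
    store[mode]["completion_length"].append(value)


try:
    record_without_mode(metrics, 12.0)
    raised = False
except KeyError as err:
    # The outer dict is a plain dict, so the missing metric name raises KeyError.
    raised = True
    missing_key = err.args[0]

record_with_mode(metrics, "train", 12.0)
```

The direct lookup fails because only the inner dicts are `defaultdict`s; the outer dict has no entry for the metric name.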

@danielhanchen
Contributor

Thanks! Oh, this is for the nightly release of TRL, right?
Would it be possible to first check whether self._metrics["completion_length"] exists, and/or whether self._metrics["train"] / self._metrics["eval"] exist, so that older versions of TRL keep working? Also, it looks like I might need to update the notebook logging.

@le-big-mac
Contributor Author

I've added a check for "train" as a key in self._metrics. From looking at the history of TRL's GRPOTrainer, this should be enough to distinguish between the versions: it used to be self._metrics = defaultdict(list), and now it's self._metrics = {"train": defaultdict(list), "eval": defaultdict(list)}. I can't see evidence of a "train" key ever being used in the old version.
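The check described above could look roughly like the sketch below. This is an illustration under stated assumptions, not the code merged in this PR: `select_metrics` is a hypothetical helper, and the two layouts are the ones quoted in this comment.

```python
from collections import defaultdict


def select_metrics(metrics, mode="train"):
    """Return the dict that metric lists should be appended to.

    Hypothetical helper: picks the per-mode dict in the newer layout
    and falls back to the flat dict in the older one.
    """
    # Newer layout: {"train": defaultdict(list), "eval": defaultdict(list)}
    if "train" in metrics:
        return metrics[mode]
    # Older layout: a flat defaultdict(list) keyed by metric name
    return metrics


old_style = defaultdict(list)
new_style = {"train": defaultdict(list), "eval": defaultdict(list)}

select_metrics(old_style)["completion_length"].append(12.0)
select_metrics(new_style, "eval")["completion_length"].append(7.0)
```

Note that the `in` test does not insert a "train" key into the old-style `defaultdict`, since membership checks do not trigger the default factory.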

@danielhanchen
Contributor

Very good work, thanks!

@danielhanchen merged commit 2c0f501 into unslothai:main on Feb 25, 2025
Development

Successfully merging this pull request may close these issues.

KeyError: 'completion_length' in GRPO trainer