[GRPO] Reorganize letter counting configs #1570

Merged: 10 commits, Mar 26, 2025

wizeng23
Contributor

Description

  • Changed the directory structure in preparation for a future PR that adds letter counting evaluation.
  • Changed the model for GRPO letter counting to DeepSeek-R1-Distill-Qwen-1.5B, as reasoning models should perform better.
  • Removed some unnecessary shard_for_eval params for smaller models.
  • Fixed a broken documentation link.

Related issues

Towards OPE-1122

Before submitting

  • This PR only changes documentation. (You can ignore the following checks in that case)
  • Did you read the contributor guideline Pull Request guidelines?
  • Did you link the issue(s) related to this PR in the section above?
  • Did you add / update tests where needed?

@@ -13,7 +13,7 @@
# - Other training configs: configs/**/pretraining/, configs/**/sft/, configs/**/dpo/

 model:
-  model_name: "Qwen/Qwen2-0.5B-Instruct"
+  model_name: "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
Collaborator

Does this model train without errors? How much slower is it compared to the 0.5B model?

Contributor Author

There was an unrelated error in model training, which I'm pretty sure didn't exist when I submitted my last PR. After fixing that, it trains. It's a bit slower: 10 min for the first 5 steps instead of 6 min with the 0.5B model. The training speed seems rather variable, though; the 1.5B model goes from 5 steps after 10 min to 200 steps after 43 min. It's not just the first step that's slow (as with compilation), but the first couple of steps.
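
For reference, the timings quoted above can be turned into rough per-step figures. This is only a back-of-the-envelope sketch: it assumes "200 steps after 43 min" is cumulative wall-clock time from the start of the same run, and the helper name `seconds_per_step` is made up for illustration.

```python
def seconds_per_step(minutes: float, steps: int) -> float:
    """Average wall-clock seconds per training step over an interval."""
    return minutes * 60 / steps


# First 5 steps, as reported in the comment above.
warmup_small = seconds_per_step(6, 5)    # 0.5B model
warmup_large = seconds_per_step(10, 5)   # 1.5B model

# Steps 6..200 of the 1.5B run, assuming 43 min is cumulative time.
steady_large = seconds_per_step(43 - 10, 200 - 5)

print(f"0.5B, first 5 steps: {warmup_small:.0f} s/step")
print(f"1.5B, first 5 steps: {warmup_large:.0f} s/step")
print(f"1.5B, steps 6-200:   {steady_large:.1f} s/step")
print(f"1.5B vs 0.5B over first 5 steps: {warmup_large / warmup_small:.2f}x")
```

Under that assumption, the first five steps average about 72 s/step (0.5B) and 120 s/step (1.5B), while the remaining 1.5B steps average roughly 10 s/step, so the early steps dominate the short-run comparison.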

@wizeng23 wizeng23 merged commit a550278 into main Mar 26, 2025
2 checks passed
@wizeng23 wizeng23 deleted the wizeng/o1122-refactor-configs branch March 26, 2025 19:58