Step Loss Trend Issue in Training #23

Hey, is anyone else seeing this issue? After plotting the step loss, I don't think any effective learning is happening. However, the win rate does increase when evaluating on 500 prompts from the pickscorev2 dataset, which matches what the authors report in their paper. I am seeing the same issue with DSPO, a newer paper that builds on this one.

<img width="986" alt="Image" src="https://github.com/user-attachments/assets/7a1ac533-e9ad-442e-accc-ab5037fd165a" />
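One possible check, offered only as a sketch: raw per-step loss can be noisy enough to hide a slow trend, so smoothing the logged values before plotting may show whether the curve is truly flat. The snippet below assumes the step losses are available as a plain list of floats. The function name `ema_smooth` and the `alpha` value are hypothetical and are not taken from this repository.

```python
import random


def ema_smooth(values, alpha=0.02):
    """Bias-corrected exponential moving average of a sequence of floats.

    Hypothetical helper for illustration; `alpha` controls how quickly
    the average follows new values (smaller means smoother).
    """
    smoothed = []
    avg = 0.0
    for i, v in enumerate(values, start=1):
        avg = (1 - alpha) * avg + alpha * v
        # Divide by (1 - (1 - alpha)^i) so early values are not biased toward 0.
        smoothed.append(avg / (1 - (1 - alpha) ** i))
    return smoothed


if __name__ == "__main__":
    # Synthetic stand-in for logged step losses: a slow downward drift
    # buried under much larger per-step noise. Not real training data.
    rng = random.Random(0)
    steps = 2000
    raw = [0.70 - 0.05 * (s / steps) + rng.gauss(0.0, 0.2) for s in range(steps)]

    smooth = ema_smooth(raw)
    print("raw    first/last 100-step mean:",
          sum(raw[:100]) / 100, sum(raw[-100:]) / 100)
    print("smooth first/last value:", smooth[99], smooth[-1])
```

If the smoothed curve is still flat, the mismatch between loss and win rate is probably not just a plotting artifact and deserves a closer look at what the logged loss actually measures.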
