Question: How is the SFT training data processed in Open R1? #671

Hi, thanks for the impressive work on Open R1!

I had a question about the data processing for SFT. From what I understand, the training uses the default dataset. Since this dataset contains multiple responses per question, I'm curious how the final SFT training data was constructed.

Were all the responses used during training, or was there a filtering step to select only the correct or high-quality answers? Any clarification on this would be greatly appreciated.

<img width="658" alt="Image" src="https://github.com/user-attachments/assets/01475c9a-fbc0-43c7-955d-917121c2dcfb" />
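To make the two options concrete, here is a rough sketch of what I mean. It assumes each row holds a problem, a list of sampled responses, and a per-response correctness flag. The field names `problem`, `generations`, and `is_correct` and both helper functions are hypothetical and are not taken from the Open R1 code.

```python
from typing import Any, Dict, List, Optional

# Hypothetical row layout: one problem, several sampled responses, and one
# correctness flag per response. Field names are illustrative only.
Record = Dict[str, Any]
Example = Dict[str, str]


def expand_all(records: List[Record]) -> List[Example]:
    """Option 1: every response becomes its own SFT example."""
    return [
        {"prompt": r["problem"], "completion": gen}
        for r in records
        for gen in r["generations"]
    ]


def keep_correct(records: List[Record], max_per_problem: Optional[int] = None) -> List[Example]:
    """Option 2: keep only responses flagged as correct, optionally capped per problem."""
    examples: List[Example] = []
    for r in records:
        correct = [g for g, ok in zip(r["generations"], r["is_correct"]) if ok]
        if max_per_problem is not None:
            correct = correct[:max_per_problem]
        examples.extend({"prompt": r["problem"], "completion": g} for g in correct)
    return examples


# Toy data, made up for illustration.
records = [
    {"problem": "1+1?", "generations": ["2", "3"], "is_correct": [True, False]},
    {"problem": "2*3?", "generations": ["6", "6", "5"], "is_correct": [True, True, False]},
]

print(len(expand_all(records)))                       # 5
print(len(keep_correct(records)))                     # 3
print(len(keep_correct(records, max_per_problem=1)))  # 2
```

Basically I'm asking whether the SFT data looks more like `expand_all`, like `keep_correct`, or something else entirely.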
