Hi, thanks for the impressive work on Open R1!
I had a question about the data processing for SFT. From what I understand, the training uses the default dataset. Since this dataset contains multiple responses per question, I'm curious how the final SFT training data was constructed.
Were all the responses used during training, or was there a filtering step to select only the correct or high-quality answers? Any clarification on this would be greatly appreciated.
