Issue Reproducing s1.1-32B Training Loss (Observed vs. WandB) #108

Open
@dzh19990407

Description

Introduction:
First, thank you for the excellent paper and codebase.

Goal:
I am attempting to reproduce the training results for the s1.1-32B experiment as reported.

My Setup:

  • Model: `Qwen/Qwen2.5-32B-Instruct`
  • Key parameter: `block_size=20000`
  • Hardware: 16 × A100 80GB GPUs
  • Script: I used `train/sft_multinode.sh` for training and did not modify any other code. (A sanity-check sketch of the effective batch math follows this list.)
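
For context, here is the effective-batch arithmetic I am assuming when comparing setups; `per_device_batch_size` and `grad_accum` are placeholders for whatever `train/sft_multinode.sh` ultimately passes to the trainer, since I have not read those values out of the script:

```python
# Hypothetical sanity check: sequences and token budget per optimizer step
# under the setup above. per_device_batch_size and grad_accum are assumptions,
# not values confirmed from the repo.
num_gpus = 16
block_size = 20_000        # max packed/truncated sequence length
per_device_batch_size = 1  # assumption
grad_accum = 1             # assumption

effective_batch = num_gpus * per_device_batch_size * grad_accum
print(f"sequences per optimizer step: {effective_batch}")
print(f"token budget per step (upper bound): {effective_batch * block_size:,}")
```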

Problem:
The training loss curve I observe in my reproduction run does not match the curve in your primary WandB report for this experiment.

  • Expected behavior (based on the WandB report): training loss decreases steadily and settles around 0.4.
  • Actual behavior: my loss curve instead resembles the example curve in the paper's Figure 9: it decreases in step-like drops and eventually settles below 0.1. (A sketch for overlaying the two curves follows this list.)
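
To make the comparison concrete, this is roughly how I am pulling the reference curve for an overlay; the run path `ENTITY/PROJECT/RUN_ID` and the metric name `train/loss` are placeholders, since I don't know the exact values your report uses:

```python
# Sketch: fetch the reference run's loss history via the public WandB API and
# overlay it on a local curve. Run path and metric name are assumptions.
import wandb
import matplotlib.pyplot as plt

api = wandb.Api()
ref = api.run("ENTITY/PROJECT/RUN_ID")   # placeholder path to the s1.1-32B run
hist = ref.history(keys=["train/loss"])  # DataFrame including a "_step" column

plt.plot(hist["_step"], hist["train/loss"], label="reference (WandB report)")
# plt.plot(my_steps, my_losses, label="my run")  # local curve goes here
plt.xlabel("step")
plt.ylabel("training loss")
plt.legend()
plt.show()
```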

Request:
Could you clarify whether there are known differences in configuration or setup between the run shown in the WandB report (which reaches ~0.4 loss) and the conditions that produce the curve shape shown in the appendix (Figure 9)? Any guidance on replicating the ~0.4 loss result would be appreciated.

Thank you for your time and assistance.
