
Question about Training Epochs for Harder Tasks (Countdown 5 and 3SAT-9) in Diffusion Baselines #9

@xiangxdai


Hi HKUNLP team,

Thank you for the great work on MGDM! I'm particularly interested in the diffusion baselines using the GPT-2 transformer architecture for tasks like Countdown and 3SAT, as mentioned in the paper/experiments.

In the setup description, the training parameters for all diffusion baselines match those of MGDM: 600 epochs, learning rate 3e-4, and batch size 1024. However, for the more challenging tasks like Countdown 5 and 3SAT-9, I'm wondering if 600 epochs is sufficient for convergence, especially since these require extensive planning and multi-step reasoning.

From my own reproduction attempts, Countdown 5 in particular seems to underperform or not fully converge after 600 epochs (e.g., accuracy plateaus below the expected level). Could you confirm:

  • Is 600 epochs empirically sufficient for these harder tasks in the baselines, or did you observe longer training being beneficial?
  • If more epochs are needed, what would be a recommended extension (e.g., 1000+ epochs) without risking overfitting?
  • Any other hyperparameters (e.g., scheduler adjustments) that helped stabilize training for Countdown 5 or 3SAT-9?
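For context, here is a minimal sketch of the schedule I'm using in my reproduction. The epoch count, learning rate, and batch size are the values stated in the paper's setup; the linear warmup and cosine decay are my own assumptions, since the scheduler isn't specified:

```python
import math

# Values from the paper's stated setup for all diffusion baselines.
CONFIG = {
    "epochs": 600,
    "learning_rate": 3e-4,
    "batch_size": 1024,
    "warmup_epochs": 10,  # my assumption, not from the paper
}

def lr_at_epoch(epoch: int, cfg: dict = CONFIG) -> float:
    """Linear warmup followed by cosine decay (my assumed scheduler)."""
    base = cfg["learning_rate"]
    if epoch < cfg["warmup_epochs"]:
        # Ramp linearly from base/warmup_epochs up to base.
        return base * (epoch + 1) / cfg["warmup_epochs"]
    progress = (epoch - cfg["warmup_epochs"]) / (cfg["epochs"] - cfg["warmup_epochs"])
    return 0.5 * base * (1.0 + math.cos(math.pi * progress))
```

If you used a different schedule (e.g., constant LR or a longer warmup) for Countdown 5 / 3SAT-9, that alone might explain the gap I'm seeing.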

This would be super helpful for reproducing your results accurately. Happy to share more details on my setup if needed!

Best regards,
Xiangxiang

MSRA & CUHK
