Hi HKUNLP team,
Thank you for the great work on MGDM! I'm particularly interested in the diffusion baselines that use the GPT-2 transformer architecture for tasks like Countdown and 3SAT, as described in the paper's experiments.
In the setup description, the training parameters for all diffusion baselines match those of MGDM: 600 epochs, learning rate 3e-4, and batch size 1024. However, for the harder tasks such as Countdown 5 and 3SAT-9, I'm wondering whether 600 epochs is sufficient for convergence, since these tasks require extensive planning and multi-step reasoning.
In my own reproduction attempts, Countdown 5 in particular seems to underperform or not fully converge after 600 epochs (e.g., accuracy plateaus below the expected level). Could you confirm:
- Is 600 epochs empirically sufficient for these harder tasks in the baselines, or did you find longer training to be beneficial?
- If more epochs are needed, what would be a recommended extension (e.g., 1000+ epochs) without risking overfitting?
- Any other hyperparameters (e.g., scheduler adjustments) that helped stabilize training for Countdown 5 or 3SAT-9?
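For concreteness, here is a minimal sketch of the kind of adjustment I have in mind for the third question: extending the run (the 1200-epoch figure is purely my guess, not a value from the paper) and swapping in cosine annealing with a short linear warmup. This is NOT necessarily what you used; it is just one concrete stabilizer I could try, written as a plain schedule function so the intent is unambiguous.

```python
import math

# Paper-reported baseline setup (from the issue above) and a
# hypothetical extended-training variant -- the 1200-epoch value
# is my own guess, not something stated in the paper.
BASELINE = {"epochs": 600, "lr": 3e-4, "batch_size": 1024}
EXTENDED = {"epochs": 1200, "lr": 3e-4, "batch_size": 1024}

def cosine_lr(epoch: int, total_epochs: int, base_lr: float,
              warmup_epochs: int = 0, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate with optional linear warmup.

    A common stabilizer for long runs; offered here as a question
    ("did something like this help?"), not as the authors' setup.
    """
    if warmup_epochs and epoch < warmup_epochs:
        # Linear ramp from ~0 up to base_lr over the warmup window.
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# LR ramps up during warmup, hits base_lr when warmup ends,
# then decays smoothly toward min_lr by the final epoch.
total = EXTENDED["epochs"]
print(cosine_lr(0, total, EXTENDED["lr"], warmup_epochs=60))      # early warmup: tiny LR
print(cosine_lr(60, total, EXTENDED["lr"], warmup_epochs=60))     # end of warmup: 3e-4
print(cosine_lr(total, total, EXTENDED["lr"], warmup_epochs=60))  # final epoch: ~0
```

If you used a different schedule (constant, linear decay, etc.) or different warmup, that detail alone would already help me a lot with reproduction.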
This would be super helpful for reproducing your results accurately. Happy to share more details on my setup if needed!
Best regards,
Xiangxiang
MSRA & CUHK