Hi HKUNLP team,
Thank you for the great work on MGDM! I'm particularly interested in the diffusion baselines that use the GPT-2 transformer architecture for tasks like Countdown and 3SAT, as described in the paper's experiments.
In the setup description, the training parameters for all diffusion baselines match those of MGDM: 600 epochs, learning rate 3e-4, and batch size 1024. However, for the harder tasks such as Countdown 5 and 3SAT-9, I'm wondering whether 600 epochs is sufficient for convergence, since these tasks require extensive planning and multi-step reasoning.
In my own reproduction attempts, Countdown 5 in particular seems to underperform or not fully converge after 600 epochs (e.g., accuracy plateaus below the expected level). Could you confirm:
- Is 600 epochs empirically sufficient for these harder tasks in the baselines, or did you find longer training to be beneficial?
- If more epochs are needed, what would be a recommended extension (e.g., 1000+ epochs) without risking overfitting?
- Any other hyperparameters (e.g., scheduler adjustments) that helped stabilize training for Countdown 5 or 3SAT-9?
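For concreteness, here is a minimal sketch of the kind of adjustment I have in mind for the third question: extending the run (the 1200-epoch figure is purely my guess, not a value from the paper) and swapping in cosine annealing with a short linear warmup. This is NOT necessarily what you used; it is just one concrete stabilizer I could try, written as a plain schedule function so the intent is unambiguous.

```python
import math

# Paper-reported baseline setup (from the issue above) and a
# hypothetical extended-training variant -- the 1200-epoch value
# is my own guess, not something stated in the paper.
BASELINE = {"epochs": 600, "lr": 3e-4, "batch_size": 1024}
EXTENDED = {"epochs": 1200, "lr": 3e-4, "batch_size": 1024}

def cosine_lr(epoch: int, total_epochs: int, base_lr: float,
              warmup_epochs: int = 0, min_lr: float = 0.0) -> float:
    """Cosine-annealed learning rate with optional linear warmup.

    A common stabilizer for long runs; offered here as a question
    ("did something like this help?"), not as the authors' setup.
    """
    if warmup_epochs and epoch < warmup_epochs:
        # Linear ramp from ~0 up to base_lr over the warmup window.
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# LR ramps up during warmup, hits base_lr when warmup ends,
# then decays smoothly toward min_lr by the final epoch.
total = EXTENDED["epochs"]
print(cosine_lr(0, total, EXTENDED["lr"], warmup_epochs=60))      # early warmup: tiny LR
print(cosine_lr(60, total, EXTENDED["lr"], warmup_epochs=60))     # end of warmup: 3e-4
print(cosine_lr(total, total, EXTENDED["lr"], warmup_epochs=60))  # final epoch: ~0
```

If you used a different schedule (constant, linear decay, etc.) or different warmup, that detail alone would already help me a lot with reproduction.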
This would be super helpful for reproducing your results accurately. Happy to share more details on my setup if needed!
Best regards,
Xiangxiang
MSRA & CUHK