Stochastic Muzero performance was not as expected. #309
Question 1 and Question 2

As you mentioned, our previous experiments were also limited to around 2M environment steps, and we did not conduct longer training sessions. Your preliminary experimental results align with ours. Regarding the lack of further improvement in later stages, we suspect it may be related to the 2048 environment settings. Currently, the code sets a maximum
To address this issue, we suggest the following improvements:

Implementing these methods could significantly improve performance in later stages.

Question 3

Regarding multi-GPU acceleration and environment optimization, we recommend focusing on the following two aspects:

By optimizing these two aspects, you can improve both training efficiency and overall performance.

Question 4

For Question 4, we found that if you uncomment this line, your script will execute correctly, and you'll be able to see the game being rendered in real time. We'll fix this bug in a future update.

We plan to start working on efficiency and performance optimizations in the coming weeks. If you're interested, you can explore these optimizations locally in advance and submit any improvements or questions via a PR or issue. We deeply appreciate your contributions and look forward to seeing your optimization results! Once again, thank you for supporting the LightZero project!
Hi, thanks for your detailed reply. If there is any progress, I will update you accordingly.
Hi @puyuan1996, sorry for the late response; the training time of Stochastic MuZero on the game 2048 is excessively long.
I'd like to discuss some experimental results and questions with you.
1 Stochastic MuZero performance was not as expected.
The model reached an episode reward mean of around 50,000 at 2 million environment steps, but then oscillated between 2 million and 14 million steps without significant improvement. This was the case in both the collect stage and the evaluate stage.
2 Question about expected performance.
The performance of Stochastic MuZero reported in the original paper is as follows:
It seems the model reaches an episode reward mean of around 250k at 1 billion environment steps. Could you share your experimental results with me?
3 Question about the training time.
Based on the original config in game-2048-stochstic-muzero in LightZero, the model took 5 days to reach 14 million environment steps. I'd like to ask:
3.1 What is the approximate training duration for your models?
3.2 How long would it take to train for 10 billion environment steps, as stated in the paper?
3.3 Are there any alternative approaches to further reduce the training time?
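For a sense of scale on 3.2, here is a rough extrapolation from the numbers above. It is only a sketch: it assumes throughput stays constant at the observed rate (14 million steps in 5 days), which will not hold exactly in practice. Both 1 billion and 10 billion steps are mentioned in this issue, so both targets are shown. The function name `days_to_reach` is just an illustrative helper.

```python
# Back-of-the-envelope extrapolation from the throughput reported in this issue.
observed_steps = 14_000_000   # environment steps reached (from the issue)
observed_days = 5             # wall-clock days taken (from the issue)

# Observed throughput: 2.8 million environment steps per day.
steps_per_day = observed_steps / observed_days


def days_to_reach(target_steps: float) -> float:
    """Days needed at constant throughput (an assumption; real throughput varies)."""
    return target_steps / steps_per_day


for target in (1_000_000_000, 10_000_000_000):
    days = days_to_reach(target)
    print(f"{target:.0e} steps: {days:,.0f} days (~{days / 365:.1f} years)")
```

At the observed rate this gives roughly 357 days for 1 billion steps and roughly 3,571 days (close to 10 years) for 10 billion steps, which is why 3.3 matters.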
4 Bug: game-2048 does not render properly on the screen when mode="image_realtime_mode" is set.
When the mode of game-2048 is set to image_realtime_mode, nothing appears on the screen. You can try it to reproduce the problem.