Stochastic Muzero performance was not as expected. #309

Open
walkacross opened this issue Dec 20, 2024 · 2 comments

walkacross commented Dec 20, 2024

Hi @puyuan1996, sorry for the late response; the training time of Stochastic MuZero on game 2048 seems excessively long.
I’d like to discuss some experimental results and questions with you.

1 Stochastic MuZero performance was not as expected.

The model reached an episode reward mean of around 50,000 at 2 million environment steps, but then oscillated between 2 million and 14 million steps without significant improvement, in both the collect stage and the evaluate stage.
[Screenshots: collect and evaluate episode reward mean curves]

2 Question about the expected performance.

The performance of Stochastic MuZero reported in the original paper is as follows:
[Screenshot: Stochastic MuZero 2048 results figure from the paper]

It seems the model reaches an episode reward mean of around 250k at 1 billion environment steps. Could you share your experimental results with me?

3 Question about the training time.

Based on the default config for game-2048 Stochastic MuZero in LightZero (stochastic_muzero_2048_config.py):

env_id = 'game_2048'
action_space_size = 4
use_ture_chance_label_in_chance_encoder = True
collector_env_num = 8
n_episode = 8
evaluator_env_num = 3
num_simulations = 100
update_per_collect = 200
batch_size = 512
max_env_step = int(1e9)
reanalyze_ratio = 0.
num_of_possible_chance_tile = 2
chance_space_size = 16 * num_of_possible_chance_tile

The model took 5 days to reach 14 million environment steps. I’d like to ask:

3.1 What is the approximate training duration for your models?
3.2 How long would it take to train for 10 billion environment steps, as stated in the paper? (a rough extrapolation is sketched after these questions)
3.3 Are there any alternative approaches to further reduce the training time?
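
For reference, a naive back-of-the-envelope extrapolation from the numbers above, assuming throughput stays roughly constant over the whole run:

# Rough linear extrapolation from the observed throughput (my own estimate).
days_observed = 5
steps_observed = 14e6                   # 14 million env steps in 5 days
days_per_step = days_observed / steps_observed

print(days_per_step * 1e9)              # ~357 days for 1 billion env steps
print(days_per_step * 1e10)             # ~3570 days for 10 billion env steps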

4 Bug: game-2048 cannot render properly on the screen when render_mode is set to "image_realtime_mode".

When the render_mode of game-2048 is set to "image_realtime_mode", nothing appears on the screen; you can reproduce it with the script below.

import numpy as np

from game_2048_env import Game2048Env

# Use the default config, only switching the render mode.
cfg = Game2048Env.default_config()
cfg.render_mode = "image_realtime_mode"
print(cfg)

env = Game2048Env(cfg=cfg)
obs = env.reset()

# Play random actions; with render_mode="image_realtime_mode" I expected the
# board to be rendered on screen automatically, but nothing is shown.
for i in range(10000):
    # env.render(mode="image_realtime_mode")
    action = np.random.choice([0, 1, 2, 3])
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
puyuan1996 added the bug and efficiency optimization labels on Dec 23, 2024
puyuan1996 (Collaborator) commented
Question 1 and Question 2

As you mentioned, our previous experiments were also limited to around 2M environment steps, and we did not conduct longer training sessions. Based on your preliminary experimental results, they align with ours. Regarding the lack of further improvement in later stages, we suspect it may be related to the 2048 environment settings. Currently, the code sets a maximum tile_num (see specific code: game_2048_env.py#L116), which might restrict the highest score achievable in a single game. Additionally, the existing configuration file (stochastic_muzero_2048_config.py) is still in its initial version and has not been extensively optimized for performance.

To address this issue, we suggest the following improvements:

  1. Enhance exploration mechanisms.
  2. Optimize hyperparameter tuning.
  3. Reward normalization: Introduce techniques like value rescale or symlog to normalize reward values, reduce the dynamic range of rewards, and improve training stability (a minimal sketch is given after this list).

Implementing these methods could significantly improve performance in later stages.
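
As a concrete illustration of point 3, here is a minimal sketch of the two transforms mentioned above (the value rescale used for MuZero-style value/reward targets, and the symlog squashing popularized by DreamerV3). This is illustrative code, not the exact LightZero implementation:

import numpy as np

def value_rescale(x, eps=1e-3):
    # h(x) = sign(x) * (sqrt(|x| + 1) - 1) + eps * x
    return np.sign(x) * (np.sqrt(np.abs(x) + 1.0) - 1.0) + eps * x

def inverse_value_rescale(x, eps=1e-3):
    # Closed-form inverse of value_rescale.
    return np.sign(x) * (
        ((np.sqrt(1.0 + 4.0 * eps * (np.abs(x) + 1.0 + eps)) - 1.0) / (2.0 * eps)) ** 2 - 1.0
    )

def symlog(x):
    # symlog(x) = sign(x) * ln(|x| + 1)
    return np.sign(x) * np.log(np.abs(x) + 1.0)

def symexp(x):
    # Inverse of symlog.
    return np.sign(x) * (np.exp(np.abs(x)) - 1.0)

# Example: a raw 2048 episode return of 50,000 becomes a much smaller learning target.
print(value_rescale(50000.0))  # ~272.6
print(symlog(50000.0))         # ~10.8

Squashing targets this way keeps the value and reward heads in a numerically comfortable range even as 2048 scores grow by orders of magnitude, which is relevant to the later-stage stagnation described above.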


Question 3

Regarding multi-GPU acceleration and environment optimization, we recommend focusing on the following two aspects:

  1. Multi-GPU distributed training:

    • Refer to the multi-GPU DDP configuration file for the Atari environment (atari_muzero_multigpu_ddp_config.py) and adapt the 2048 environment to the multi-GPU training framework (a rough sketch of the entry point is given after this list).
    • In theory, distributed training with multiple GPUs should achieve nearly linear speedup.
  2. Environment and configuration optimization:

    • Environment optimization: Carefully analyze the game_2048_env.py code logic to eliminate unnecessary computations (e.g., potential redundancies in state encoding or rendering) and improve interaction efficiency.
    • Configuration optimization: Adjust parameters in stochastic_muzero_2048_config.py, such as reducing num_simulations or fine-tuning batch_size, to lower computational overhead while maintaining performance.

By optimizing these two aspects, you can improve both training efficiency and overall performance.
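
For concreteness, here is a rough sketch of what a DDP launch script for the 2048 config could look like, modeled on the Atari DDP config mentioned above. The import paths and config fields below are assumptions and should be checked against atari_muzero_multigpu_ddp_config.py:

# Sketch only: adapt stochastic_muzero_2048_config to multi-GPU DDP training.
# Import paths / config fields are assumptions based on the Atari DDP config.
from ding.utils import DDPContext
from lzero.entry import train_muzero

from zoo.game_2048.config.stochastic_muzero_2048_config import main_config, create_config

if __name__ == "__main__":
    main_config.policy.multi_gpu = True  # enable gradient all-reduce across GPUs
    with DDPContext():
        train_muzero([main_config, create_config], seed=0, max_env_step=int(1e9))

# Launched with something like:
#   python -m torch.distributed.launch --nproc_per_node=2 train_2048_ddp.py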


Question 4

For Question 4, we found that if you uncomment the env.render(mode="image_realtime_mode") line in your script, it will execute correctly, and you’ll be able to see the game being rendered in real time. We’ll fix this bug in a future update.
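
Concretely, the inner loop from your script with the render call restored (mirroring your commented-out line) would look like this:

for i in range(10000):
    # Rendering currently has to be triggered explicitly each step.
    env.render(mode="image_realtime_mode")
    action = np.random.choice([0, 1, 2, 3])
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()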


We plan to start working on efficiency and performance optimizations in the coming weeks. If you’re interested, you can explore these optimizations locally in advance and submit any improvements or questions via a PR or issue. We deeply appreciate your contributions and look forward to seeing your optimization results!

Once again, thank you for supporting the LightZero project!

walkacross (Author) commented
Hi, thanks for your detailed reply. If there is any progress, I will update you accordingly.
