MiniGrid and/or Maze environment support for AlphaZero #312

Open
YuriyPryyma opened this issue Dec 25, 2024 · 1 comment

Thank you for open-sourcing such a fantastic library!

I am trying to train an agent on my own environment using the LightZero library.
My environment has the following characteristics:

  • Discrete action space
  • Single-player
  • Deterministic dynamics (with a simulator available)
  • Sparse rewards
  • Tensor-shaped state space

I believe the environments most similar to mine are MiniGrid and Maze. In LightZero, only MuZero-like algorithms support these environments. However, these algorithms learn the environment dynamics, which is not ideal for me since I already have a deterministic simulator. Skipping dynamics learning should be more sample-efficient, so I want to use the AlphaZero approach.
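To make the simulator point concrete, here is a minimal sketch of planning directly against a true, deterministic step function instead of a learned model. Everything in it is hypothetical and for illustration only: the toy `GRID`, the `step` and `lookahead` functions, and the tuple-based state are not LightZero code, and the exhaustive depth-limited lookahead is a simple stand-in for the MCTS that AlphaZero actually uses.

```python
# Hypothetical toy maze, NOT a LightZero environment.
# 0 = free cell, 1 = wall. Reward is 1.0 only on reaching the goal (sparse).
GRID = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
GOAL = (2, 0)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """True deterministic simulator: returns (next_state, reward, done)."""
    row, col = state
    d_row, d_col = MOVES[action]
    r, c = row + d_row, col + d_col
    inside = 0 <= r < len(GRID) and 0 <= c < len(GRID[0])
    if not inside or GRID[r][c] == 1:
        r, c = row, col  # blocked moves leave the agent in place
    done = (r, c) == GOAL
    return (r, c), (1.0 if done else 0.0), done


def lookahead(state, depth):
    """Exhaustive depth-limited search over step().

    Returns (best_return, best_first_action). No dynamics are learned:
    every transition comes from the real simulator.
    """
    if depth == 0:
        return 0.0, None
    best_value, best_action = -1.0, None
    for action in range(len(MOVES)):
        next_state, reward, done = step(state, action)
        future = 0.0 if done else lookahead(next_state, depth - 1)[0]
        if reward + future > best_value:
            best_value, best_action = reward + future, action
    return best_value, best_action


value, action = lookahead((0, 0), depth=6)
print(value, action)
```

In this toy maze the goal is six moves away from `(0, 0)`, so a search depth of 6 finds the reward while a depth of 5 does not, which also illustrates why sparse rewards make shallow search uninformative without a learned value function.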

I noticed that the LightZero paper states that support for MiniGrid or Maze environments with AlphaZero is under development.

My questions:

  1. Is there an intermediate, unmerged implementation of MiniGrid or Maze environments with AlphaZero that I could use as an example?
  2. How can I generally utilize LightZero's AlphaZero implementation for single-player games?

YuriyPryyma commented Dec 25, 2024

I noticed a recent MR, #245, that adds "single_player_mode".

However, it looks like AlphaZero is still very much focused on board games, since the policy code operates on a "board" variable.
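For context, my understanding of the main algorithmic difference between single-player and two-player tree search is the value backup: in a two-player zero-sum game the value changes sign at each level, while in a single-player setting it does not. The sketch below shows only that idea. `Node`, `backup`, and the `single_player` flag are hypothetical names, not LightZero's API, and I have not checked how #245 implements this. Intermediate rewards and discounting are left out for brevity.

```python
class Node:
    """Hypothetical search-tree node; names are illustrative only."""

    def __init__(self, parent=None):
        self.parent = parent
        self.visit_count = 0
        self.value_sum = 0.0

    @property
    def value(self):
        # Mean of all values backed up through this node.
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def backup(leaf, leaf_value, single_player):
    """Propagate a leaf evaluation from the leaf up to the root."""
    node, value = leaf, leaf_value
    while node is not None:
        node.visit_count += 1
        node.value_sum += value
        if not single_player:
            # Two-player zero-sum: the parent belongs to the opponent,
            # so a good outcome for one side is a bad one for the other.
            value = -value
        node = node.parent


root = Node()
child = Node(parent=root)
leaf = Node(parent=child)
backup(leaf, 1.0, single_player=True)
print(root.value, child.value, leaf.value)
```

With `single_player=True` every node on the path receives the same value; with `single_player=False` the sign alternates along the path.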
