How can I initialize a MiniGrid environment with a custom max_steps value?
My issue is that the default max_steps value for the MultiRoom family of environments seems a bit low (120 for the six-room environment).
I am running some experiments with a slightly modified recurrent Q-learning approach (similar to the R2D2 paper). I have been able to solve harder (I assume) environments such as ObstructedMaze-2Dlhb and KeyCorridorS4R3 with my approach, but my agent is unable to learn anything in MultiRoom-N6, simply because episodes end very quickly and there are no episodes with non-zero reward in the replay buffer.
Any help is appreciated.
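In case it helps, a possible stop-gap I'm aware of is to overwrite the max_steps attribute on the unwrapped environment after construction. This is only a sketch, not an officially supported API: the attribute name matches what MiniGridEnv uses internally to truncate episodes, but the exact import names and wrapper behavior depend on your installed version.

```python
import gym
import gym_minigrid  # registers the MiniGrid-* env IDs (newer releases: `import minigrid`)

# Hypothetical stop-gap, not an official API: MiniGridEnv truncates an
# episode once step_count >= max_steps, so raising the attribute on the
# unwrapped env lengthens the horizon beyond the MultiRoom-N6 default of 120.
env = gym.make("MiniGrid-MultiRoom-N6-v0")
env.unwrapped.max_steps = 500

obs = env.reset()
```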
I am assessing the effectiveness of a model-augmented recurrent Q-learning approach against a vanilla recurrent Q-learning approach (R2D2), and I am testing my approach on all the environments in the MiniGrid family. (So far I've seen some big improvements, especially in the ObstructedMaze and KeyCorridor families.)
esalehi1996 changed the title from [Question] Question title to [Question] Custom max_steps on Oct 15, 2022
Hi @esalehi1996, thank you for bringing this up. We are working on a PR to add this as a feature: #265. You'll be able to initialize any MiniGrid environment with the max_steps argument to set the maximum number of steps per episode.
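Once that PR is merged, usage should look roughly like the sketch below. The env ID and value are illustrative, and the import names assume the current package layout (older installs use gym + gym_minigrid); the max_steps keyword is simply forwarded to the env constructor by gym.make.

```python
import gymnasium as gym
import minigrid  # registers the MiniGrid-* env IDs

# Sketch of the upcoming feature from #265: gym.make forwards extra kwargs
# to the env constructor, so max_steps sets the per-episode step budget.
env = gym.make("MiniGrid-MultiRoom-N6-v0", max_steps=500)

obs, info = env.reset(seed=0)
```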