This is an RL agent based on the Double Deep Q-Network (DDQN) algorithm, built to play Super Mario Bros.
The program uses OpenAI Gym and the nes-py emulator to play the game.
To make the AI learn more efficiently, we preprocess the data given to it. We reduce the action space available to the AI to 7 simple movement controls instead of all possible key combinations. We convert the RGB frames to greyscale and normalise the pixel values to the range 0 to 1 to improve learning performance. As not every consecutive frame is needed for efficient learning, our algorithm only returns every 4th frame.
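The greyscale conversion, normalisation and frame skipping described above can be sketched roughly as follows. This is a minimal illustration using only NumPy, not the project's actual code: the names `to_greyscale` and `SkipFrames` are hypothetical, the luminance weights are the common ITU-R BT.601 values (an assumption, since the original does not say how greyscale is computed), and the wrapped `env` is assumed to follow the classic Gym convention where `step(action)` returns `(obs, reward, done, info)`. Summing the rewards of the skipped frames is also an assumption.

```python
import numpy as np


def to_greyscale(frame_rgb):
    """Convert an (H, W, 3) uint8 RGB frame to an (H, W) float32 frame in [0, 1]."""
    # Assumed luminance weights (ITU-R BT.601); the original does not specify them.
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    grey = frame_rgb.astype(np.float32) @ weights
    # Scale from [0, 255] to [0, 1]; clip guards against float rounding.
    return np.clip(grey / 255.0, 0.0, 1.0)


class SkipFrames:
    """Repeat each action for `skip` steps and return only the last frame.

    Hypothetical wrapper: `env` only needs a classic Gym-style
    step(action) -> (obs, reward, done, info) method.
    """

    def __init__(self, env, skip=4):
        self.env = env
        self.skip = skip

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward  # assumption: rewards of skipped frames are summed
            if done:
                break  # stop early if the episode ended mid-skip
        return to_greyscale(obs), total_reward, done, info
```

With `skip=4`, the agent sees one preprocessed frame for every four emulator steps, which cuts the number of network evaluations per episode by roughly a factor of four.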
The AI uses a Double Deep Q-Network. It implements remember, recall and experience_replay functions, and updates its Q-value estimates with the Bellman equation and the Q-update rule. We also maintain two tables, dq1 and dq2, for Double Q-learning.
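As a rough sketch of how these pieces fit together, the example below shows a small replay memory with `remember` and `recall` methods and the standard Double DQN target computation, in which one estimator picks the best next action and the other evaluates it. It is an illustration under assumptions rather than the project's implementation: the class `ReplayMemory`, the function `double_dqn_targets`, the default `gamma=0.9` and the buffer capacity are all hypothetical, and it assumes dq1 plays the "online" role (action selection) while dq2 plays the "target" role (action evaluation).

```python
import random
from collections import deque

import numpy as np


class ReplayMemory:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity=10000):
        # A bounded deque drops the oldest transitions once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def remember(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def recall(self, batch_size):
        # Sample a random minibatch without replacement.
        batch = random.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return (np.array(states), np.array(actions),
                np.array(rewards, dtype=np.float32),
                np.array(next_states),
                np.array(dones, dtype=np.float32))


def double_dqn_targets(q_online_next, q_target_next, rewards, dones, gamma=0.9):
    """Compute y = r + gamma * Q_target(s', argmax_a Q_online(s', a)) * (1 - done).

    q_online_next, q_target_next: (batch, n_actions) Q-values for the next states,
    assumed here to come from dq1 and dq2 respectively.
    """
    # The online estimator selects the best next action...
    best_actions = np.argmax(q_online_next, axis=1)
    # ...and the target estimator evaluates that action.
    next_values = q_target_next[np.arange(len(best_actions)), best_actions]
    # Terminal transitions (done == 1) contribute only the immediate reward.
    return rewards + gamma * next_values * (1.0 - dones)
```

Separating action selection from action evaluation is what distinguishes Double Q-learning from plain Q-learning, where a single estimator does both and tends to overestimate action values.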
OpenAI Gym offers several versions of Mario for us to use. Here we compare how quickly the agent learns on v0 and v3. Super_mario_bros_v0 is the standard Mario, whereas v3 is the rectangular, pixelated version.
We see that the reward increases as the number of episodes increases, and that the v3 version of Mario reached higher rewards in fewer episodes than the v0 version.
(Training for more episodes would give better insight into how the two versions compare, but longer training runs proved difficult because of the time they took and the runtime limits in Google Colab.)