
A modular implementation of Proximal Policy Optimization (PPO) in TensorFlow 2, using eager execution, for the Super Mario Bros environment.



Requirements:

  • TensorFlow 2
  • OpenCV
  • OpenAI Gym
  • Super Mario Bros NES environment (gym-super-mario-bros), developed by Kautenja


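Clone the repository, then change the working directory to the cloned repository, for example from Python:

import os
os.chdir("path/to/cloned/repository")  # placeholder: use the path of your local clone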

For training, run:

python -c 'from Main import train; train(True)'

The boolean argument of train controls whether the weights of a previously trained model are loaded before training.

For testing the model:

python -c 'from Main import test; test(10,0)'

The first argument of test is the number of episodes to run, and the second is the index of the environment to test in.

The environments available in the code are:

  • 0: SuperMarioBros-1-1-v0, the first level of the first world
  • 1: SuperMarioBros-1-2-v0, the second level of the first world
  • 2: SuperMarioBros-1-3-v0, the third level of the first world
  • 3: SuperMarioBros-2-2-v0, the second level of the second world

To change the environments, modify the file.
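The second argument of test selects one of the four levels listed above by index. A minimal sketch of that lookup, assuming a plain dictionary; the names LEVELS and level_id are hypothetical and not taken from the repository:

```python
# Hypothetical mapping from the environment index used by test(episodes, env_index)
# to the corresponding gym-super-mario-bros environment ID.
LEVELS = {
    0: "SuperMarioBros-1-1-v0",  # world 1, level 1
    1: "SuperMarioBros-1-2-v0",  # world 1, level 2
    2: "SuperMarioBros-1-3-v0",  # world 1, level 3
    3: "SuperMarioBros-2-2-v0",  # world 2, level 2
}


def level_id(index: int) -> str:
    """Return the environment ID for an index, failing loudly on unknown indices."""
    try:
        return LEVELS[index]
    except KeyError:
        raise ValueError(
            f"unknown environment index {index}; expected one of {sorted(LEVELS)}"
        ) from None


# With gym-super-mario-bros installed, the ID would typically be passed to
# gym_super_mario_bros.make(level_id(index)) before the preprocessing wrappers.
print(level_id(0))  # SuperMarioBros-1-1-v0
```

Keeping the mapping in one place means adding a level only requires a new dictionary entry.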

Eight actors were trained on the first level of Super Mario Bros, and this is how the agent learned to finish it:


The plots show how the average reward evolved over the time steps. The model was trained in four separate runs because of connection interruptions. The reward is not the raw output of Kautenja's implementation: it was scaled for this model, and all the data preprocessing is in the file.
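The reward shaping behind this scaling is described in the list of files (a penalty of 50 for dying, a bonus of 100 for reaching the flag, and a 0.05 scale factor). A minimal sketch, assuming the bonuses are added before scaling; the function and constant names are hypothetical, and the repository may order these steps differently:

```python
# Hypothetical constants taken from the file description.
DEATH_PENALTY = -50.0  # added when Mario dies
FLAG_BONUS = 100.0     # added when Mario reaches the flag
REWARD_SCALE = 0.05    # applied to the shaped reward


def shape_reward(raw_reward: float, died: bool, got_flag: bool) -> float:
    """Add terminal bonuses to the raw environment reward, then scale the total.

    Assumption: bonuses are applied before scaling.
    """
    reward = raw_reward
    if died:
        reward += DEATH_PENALTY
    if got_flag:
        reward += FLAG_BONUS
    return reward * REWARD_SCALE
```

Because every shaped reward is multiplied by 0.05, the average-reward curves sit on a much smaller scale than the raw gym-super-mario-bros rewards.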


The logs directory contains two more plots: the average X_position and the Max_X_position.

Testing in environments not seen during training:


About the files in the repository:

  • The file contains the train and test functions for the model.

    1. The train function saves the model weights every 1000 time steps and writes summary files for visualizing the average total reward, the average x position and the maximum x position.

    2. The test function loads the model weights and runs the agent on the selected level with deterministic actions, whereas training uses stochastic actions to avoid converging to a local optimum. It also records an MP4 video of each test episode, as many as the number of episodes requested.

  • The file contains all the parameters needed to tune the algorithm, passes them to the other files, and calls the file that creates the environment.

  • The file defines the environments for the four Super Mario Bros levels and calls the preprocessing functions.

  • The file defines several classes that:

    1. Reset the environment when Mario dies, adding a penalty of 50 (a reward of -50).

    2. Reset the environment when Mario reaches the flag and completes the level, adding a reward of +100.

    3. Scale the reward by a factor of 0.05.

    4. Resize and grayscale the frames so the neural network runs faster.

    5. Skip frames stochastically, based on [2], to add randomness to the environment.

    6. Stack frames to give a sense of movement, based on DeepMind's Atari implementation.

    7. Divide the pixel values by 255 to scale them to the range [0, 1].

  • The file contains common utility functions used across the program, such as saving and loading models.

  • The file creates a callable that uses multiple processes to run several actors in parallel, and also computes the advantage estimator defined in [1].

  • The file contains TensorFlow functions that compute the total loss defined in [1] and apply the gradients using TensorFlow 2 eager execution.

  • The file contains two model classes, one for the actor and one for the critic.
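The frame preprocessing steps listed above (grayscale, resize, scaling to [0, 1] and frame stacking) can be sketched with NumPy alone. This is an illustration, not the repository's classes: the 84x84 frame size and the 4-frame stack follow the common DeepMind Atari setup and are assumptions, nearest-neighbour indexing stands in for OpenCV (which the repository lists as a dependency), and stochastic frame skipping is left out.

```python
from collections import deque

import numpy as np

# Assumed sizes, not confirmed by the repository.
FRAME_SIZE = 84
STACK_SIZE = 4


def preprocess(frame: np.ndarray, size: int = FRAME_SIZE) -> np.ndarray:
    """Convert an RGB frame (H, W, 3) to a size x size grayscale image in [0, 1]."""
    # ITU-R BT.601 luminance weights, the same ones OpenCV uses for RGB -> gray.
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = frame.astype(np.float32) @ weights  # shape (H, W)
    # Nearest-neighbour resize by sampling evenly spaced rows and columns.
    rows = np.arange(size) * gray.shape[0] // size
    cols = np.arange(size) * gray.shape[1] // size
    small = gray[rows[:, None], cols[None, :]]
    # Scale pixel values from [0, 255] to [0, 1].
    return small / 255.0


class FrameStack:
    """Keep the last k preprocessed frames so the network can perceive motion."""

    def __init__(self, k: int = STACK_SIZE):
        self.frames = deque(maxlen=k)

    def reset(self, frame: np.ndarray) -> np.ndarray:
        # After a reset, fill the stack with copies of the first frame.
        processed = preprocess(frame)
        for _ in range(self.frames.maxlen):
            self.frames.append(processed)
        return self.observation()

    def step(self, frame: np.ndarray) -> np.ndarray:
        # Append the newest frame; deque(maxlen=k) drops the oldest one.
        self.frames.append(preprocess(frame))
        return self.observation()

    def observation(self) -> np.ndarray:
        # Channels-last array of shape (size, size, k).
        return np.stack(list(self.frames), axis=-1)
```

A channels-last stack matches the default data format of Keras convolution layers, so the observation can be fed to a Conv2D-based actor or critic without transposing.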
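The advantage estimator and the total loss mentioned above can be sketched as follows, assuming [1] refers to the standard PPO formulation: truncated generalized advantage estimation, and a clipped surrogate objective combined with a value-function loss and an entropy bonus. This NumPy version is not the repository's TensorFlow code; the function names and the defaults (gamma=0.99, lam=0.95, clip_eps=0.2, c1=0.5, c2=0.01) are illustrative assumptions.

```python
import numpy as np


def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Truncated generalized advantage estimation over one rollout of length T.

    rewards, values, dones have shape (T,); last_value is V(s_T), used to
    bootstrap the final step. Returns (advantages, returns), each of shape (T,).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        not_done = 1.0 - dones[t]
        # TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t), no bootstrap past a terminal step.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # A_t = delta_t + gamma * lambda * A_{t+1}, reset at episode boundaries.
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    returns = advantages + values  # targets for the critic
    return advantages, returns


def ppo_total_loss(new_log_probs, old_log_probs, advantages, values, returns,
                   entropy, clip_eps=0.2, c1=0.5, c2=0.01):
    """Negative of L_CLIP - c1 * L_VF + c2 * S, so that it can be minimized."""
    ratio = np.exp(np.asarray(new_log_probs) - np.asarray(old_log_probs))
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    policy_objective = np.mean(np.minimum(unclipped, clipped))
    value_loss = np.mean((np.asarray(returns) - np.asarray(values)) ** 2)
    return -(policy_objective - c1 * value_loss + c2 * np.mean(entropy))
```

In a TensorFlow 2 version, the same expressions would be written with TensorFlow operations inside a tf.GradientTape, so that the optimizer can differentiate the loss with respect to the actor and critic weights.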

This code was inspired by:

What to do now?

  • Implement meta-learning and train in multiple environments to obtain a more general actor.

