This is an OpenAI Gym-compatible reinforcement learning environment, built as the final project for Deep Learning, Spring 2021.
In this environment you specify a number of atomic functions (with code) that you wish an agent to consider placing into an experiment.
Starting from these atomic functions and an input dataset, the agent begins experimenting and adds higher-level functions to its function space as it explores and evolves to solve a defined problem.
Currently these functions are PyTorch layers.
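For concreteness, here is a minimal sketch of what an atomic function palette might look like; the names and the registration mechanism are hypothetical, not the repository's actual API:

```python
import torch.nn as nn

# Hypothetical registry of atomic functions; the real registration
# mechanism in this repository may differ. Each entry builds a
# PyTorch layer given the incoming feature width.
atomic_functions = {
    "linear_16": lambda in_features: nn.Linear(in_features, 16),
    "relu": lambda in_features: nn.ReLU(),
    "layer_norm": lambda in_features: nn.LayerNorm(in_features),
}
```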
The environment presents a complex state space (sketched after this list) composed of:
- The Experiment Space,
- The Function Space,
- Metrics associated with the Function Space.
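A hypothetical shape for this composite observation; the field names and metric keys are illustrative, not the environment's actual schema:

```python
# Sketch of one observation; all keys and values are assumptions.
observation = {
    # Nodes currently placed: (function_id, location) pairs.
    "experiment_space": [(0, (0.0, 0.0)), (1, (1.0, 1.0))],
    # Palette of function ids the agent may place.
    "function_space": [0, 1, 2, 3],
    # Metrics tracked per palette function (names hypothetical).
    "function_metrics": {2: {"times_placed": 4, "mean_reward": 0.7}},
}
```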
The Experiment Space is a dataset of experiment nodes: an episode-length iterable where each element contains:
- The function id of a function that has been added to the Experiment Space, and
- The location at which that function node exists in the Experiment Space.
When the episode starts this consists solely of the source and sink nodes.
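A minimal sketch of the Experiment Space at reset, assuming a simple node record; the ids and coordinates are illustrative:

```python
from dataclasses import dataclass

@dataclass
class ExperimentNode:
    function_id: int    # index into the Function Space
    location: tuple     # coordinates within the Experiment Space

# At reset the Experiment Space holds only the source and sink;
# the ids and coordinates here are assumptions for illustration.
SOURCE_ID, SINK_ID = 0, 1
experiment_space = [
    ExperimentNode(SOURCE_ID, (0.0, 0.0)),
    ExperimentNode(SINK_ID, (1.0, 1.0)),
]
```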
The Function Space is a dataset representing a function palette from which the agent may choose functions to insert into the Experiment Space. It contains:
- Atomic functions, and
- Composed functions.
The size of the palette is defined at runtime when the gym is created.
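One way the palette might be assembled when the gym is created; the random selection policy here is an assumption, not the repository's actual behavior:

```python
import random

def build_palette(atomic_ids, composed_ids, palette_size, rng=random):
    """Assemble a fixed-size palette from the available function ids.
    Random sampling is an assumed selection policy; a sketch only."""
    pool = list(atomic_ids) + list(composed_ids)
    return rng.sample(pool, min(palette_size, len(pool)))

# Example: 5 atomic ids, 3 composed ids, palette capped at 6 entries.
palette = build_palette(range(5), range(100, 103), palette_size=6)
```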
Each turn, the agent picks an action by selecting a function index, a location, and a radius.
At every time step the state is updated by inserting the selected function at the given location with the given radius.
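A sketch of the action structure just described; the field names are hypothetical:

```python
from typing import NamedTuple

class Action(NamedTuple):
    function_index: int   # index into the Function Space palette
    location: tuple       # where the new node is created
    radius: float         # nodes within this distance become inputs

# Example action: place palette entry 2 at (0.5, 0.5) with radius 0.4.
action = Action(function_index=2, location=(0.5, 0.5), radius=0.4)
```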
Placing an atomic function takes all non-sink nodes within the radius and uses them as inputs to a new node created at the location (see the sketch below).
Placing a composed function recreates that composed function exactly.
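A sketch of the input-gathering step for an atomic placement, assuming Euclidean distance and the ExperimentNode record from the earlier sketch:

```python
import math

def in_radius_inputs(experiment_space, location, radius, sink_id):
    """Collect the non-sink nodes within `radius` of `location`.
    Euclidean distance is an assumption; the repository may use
    a different metric."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return [node for node in experiment_space
            if node.function_id != sink_id
            and dist(node.location, location) <= radius]
```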
There are a few reward functions available in the repository (combined in the sketch after this list):
- A small monotonic reward proportional to proximity to the sink,
- A slightly larger constant value, C, awarded when the agent connects a node, and
- A modest constant value, N, awarded when the agent connects a path from input to sink.
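A combined sketch of how these terms might form a single step reward; the constants, the proximity scaling, and the assumption that the terms are additive are all mine, not the repository's:

```python
def step_reward(distance_to_sink, connected_node, connected_input_to_sink,
                C=1.0, N=10.0, scale=0.1):
    """Sum the three reward terms described above. The default
    magnitudes of C and N and the 1/(1 + d) proximity shape are
    assumptions for illustration."""
    r = scale / (1.0 + distance_to_sink)  # small, monotonic in proximity
    if connected_node:
        r += C                            # connected a new node
    if connected_input_to_sink:
        r += N                            # completed an input-to-sink path
    return r
```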
In a single Experiment, an agent may run many trials. In a trial (an episode), the agent places n nodes before the trial terminates. Keeping n small is likely to produce more generalizable results.
When an agent places a node, it modifies the network in place by adding a new node at that location and aggregating all the in-radius nodes in the manner specified by the action. Where inputs can be simplified easily, they are (this is negotiable: it would add a layer of complexity, but would likely reduce computational burden).
At the end of an episode, the network structure the agent created is saved as a single new action in the Action Space.
By default all actions are stored, though actions that have gone stale (i.e., they have not appeared on the leaderboard for some time and no layer on the leaderboard relies on them) will be pruned, along with all of their descendants (see the sketch below).
This helps limit the breadth of available actions.
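A sketch of the staleness pruning just described; every data structure here (a set of action ids, a `depends_on` map from an action to the actions it was composed from, an `age` map of steps since each action last appeared on the leaderboard, and the threshold) is an assumption:

```python
def prune_stale_actions(action_space, leaderboard, depends_on, age,
                        stale_after):
    """Remove actions that have been off the leaderboard for more than
    `stale_after` steps and that nothing on the leaderboard relies on,
    then remove their descendants as well. A sketch only."""
    def needed(a):
        return a in leaderboard or any(
            a in depends_on.get(b, set()) for b in leaderboard)

    stale = {a for a in action_space
             if age.get(a, 0) > stale_after and not needed(a)}
    # Propagate removal to every action built on a stale action.
    changed = True
    while changed:
        changed = False
        for a in list(action_space):
            if a in stale or depends_on.get(a, set()) & stale:
                action_space.discard(a)
                stale.add(a)
                changed = True
    return action_space
```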