
Commit

Fix docs and typos
kennyderek committed Jul 16, 2021
1 parent a355fbe commit 5edf23c
Showing 4 changed files with 22 additions and 14 deletions.
16 changes: 11 additions & 5 deletions README.md
@@ -4,12 +4,18 @@
We want this project to be accessible to everyone. We train our models on CPUs, using the RLLib framework. In the paper, we used 3 cores, and experiments ran in under 24 hours.

## Set-up (<5 min)
In a virtual environment (we recommend using conda or miniconda e.g. by ```conda create -n adapvenv python=3.8```), install the python module containing Farmworld, Markov Soccer, and the Multi-Goal experiment. This module is called ```adapenvs``` and can be installed by:
In your favorite project directory
```
git clone git@github.com:kennyderek/adap.git
cd adap
```

Then, in a virtual environment (we recommend using conda or miniconda e.g. by ```conda create -n adapvenv python=3.8```), install the python module containing environment code (for Farmworld, Gym wrappers, etc.). This module is called ```adapenvs``` and can be installed by:
```
cd adaptation_envs
pip install -e .
```
Now, we can install the python module containing the ADAP policy code (written for RLLib) and contained in the module ```adap```. This will also install dependencies such as pytorch, and ray[rllib].
Now, we can install the python module containing the ADAP policy code (written for RLLib) and contained in the module ```adap```. This will also install dependencies such as pytorch, tensorflow, and ray[rllib].
```
cd ..
cd adap_policies
@@ -25,7 +31,7 @@ python run.py --conf ../configs/cartpole/train/adap.yaml --exp-name cartpole
```
```cartpole/adap.yaml``` is just one possible configuration file, containing information about 1) the training environment and 2) algorithm hyperparameters. Feel free to make new configuration files by modifying hyperparameters as you wish! RLLib will automatically start training and checkpointing the experiment in the directory ```~/ray_results/cartpole/[CONFIG_FILE + TIME]```. By default, it will save a checkpoint every 100 epochs and at the end of training.
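For reference, here is a minimal sketch (not part of the repo) of inspecting one of these YAML files from Python before editing it. Since ```run.py``` already imports ```yaml```, PyYAML is available; the exact keys printed depend on whatever the file actually contains.
```python
# Illustrative only: load and inspect a training config before launching a run.
import yaml

# Path taken from the README commands above.
with open("../configs/cartpole/train/adap.yaml") as f:
    conf = yaml.safe_load(f)

# Top-level sections describe the training environment and algorithm hyperparameters.
print("top-level sections:", list(conf.keys()))
```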

### Visualizing Traing Results
### Visualizing Training Results

Make sure you are using your virtual environment and that the ADAP Python modules are installed in it. Visualization should open a PyGlet window and render CartPole.

@@ -44,11 +50,11 @@ What if we want to search for ADAP policies (via latent distribution optimization)
```
python run.py --conf ../configs/cartpole/train/adap.yaml --restore ~/ray_results/cartpole/[CONFIG_FILE + TIME]/checkpoint_000025/checkpoint-25 --evaluate ../configs/cartpole/ablations/move_right.yaml --evolve
```
The ```--evaluate``` argument specifies a new environment configuration to use, which replaces the training environment configuration. Here, we have provided ```move_right.yaml```, which modifies the reward function to be r(t) = -x-axis position of the cartpole. The ```--evolve``` flag tells ```run.py``` to
The ```--evaluate``` argument specifies a new environment configuration to use, which replaces the training environment configuration. Here, we have provided ```move_right.yaml```, which modifies the reward function to be r(t) = -x-axis position of the cartpole. The ```--evolve``` flag tells ```run.py``` to run latent optimization on the new environment dynamics.

For CartPole, we optimize the latent space for 30 steps, which is enough to recover policies from our policy space that consistently move left or right.
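As a rough illustration of the idea (this is not the ```adap``` implementation), latent optimization can be pictured as a simple search loop: sample candidate latent vectors, score each one by the episode return the conditioned policy obtains under the new reward, and refine around the best candidates. Every name and the toy scoring function in the sketch below are hypothetical stand-ins.
```python
# Illustrative sketch of latent-space search; NOT the adap implementation.
import numpy as np

LATENT_DIM = 3    # assumed latent size, purely for illustration
POPULATION = 32   # candidates sampled per step
STEPS = 30        # the README optimizes the latent space for 30 steps


def evaluate_latent(z: np.ndarray) -> float:
    """Hypothetical stand-in: in practice this would roll out the trained policy
    conditioned on z in the evaluation environment (e.g. move_right.yaml) and
    return the mean episode reward. A toy quadratic keeps the sketch runnable."""
    target = np.array([1.0, -0.5, 0.25])
    return -float(np.sum((z - target) ** 2))


def latent_search(steps: int = STEPS) -> np.ndarray:
    rng = np.random.default_rng(0)
    mean, std = np.zeros(LATENT_DIM), 1.0
    best_z, best_score = None, -np.inf
    for _ in range(steps):
        candidates = rng.normal(mean, std, size=(POPULATION, LATENT_DIM))
        scores = np.array([evaluate_latent(z) for z in candidates])
        if scores.max() > best_score:
            best_score, best_z = scores.max(), candidates[scores.argmax()]
        elite = candidates[np.argsort(scores)[-POPULATION // 4:]]  # keep the top quarter
        mean, std = elite.mean(axis=0), float(elite.std(axis=0).mean()) + 1e-3
    return best_z


if __name__ == "__main__":
    print("best latent found:", latent_search())
```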

Awesome work! You've completed training and latent optimization of a policy space for CartPole!
Awesome work! You've completed training and latent optimization of a policy space for CartPole! If you'd like, try out getting the CartPole to move on the left side of the screen, with ```move_left.yaml```.

## FAQs

1 change: 1 addition & 0 deletions adap_policies/adap.egg-info/requires.txt
@@ -4,3 +4,4 @@ ray[rllib]
adapenvs
torch
tensorflow
pyglet
2 changes: 1 addition & 1 deletion adap_policies/setup.py
@@ -2,5 +2,5 @@

setup(name='adap',
version='0.0.2',
install_requires=['gym', 'ray', 'ray[rllib]', 'adapenvs', 'torch', 'tensorflow'] #And any other dependencies required
install_requires=['gym', 'ray', 'ray[rllib]', 'adapenvs', 'torch', 'tensorflow', 'pyglet'] #And any other dependencies required
)
17 changes: 9 additions & 8 deletions scripts/run.py
@@ -2,8 +2,6 @@
import argparse
import yaml

from ray import tune

from common import get_env_and_callbacks, get_name_creator, get_trainer, build_trainer_config, get_name_creator

import copy
@@ -13,12 +11,12 @@
parser.add_argument('--exp-name', type=str, default="context_exp")
parser.add_argument('--local-dir', type=str, default="~/ray_results")

parser.add_argument('--restore', type=str, default="") # path to restore the game
parser.add_argument('--evaluate', type=str, default="") # path to restore the game
parser.add_argument('--evolve', action="store_true") # path to restore the game
parser.add_argument("--conf", type=str, help="path to the config file containing ADAP hyperparameters and environment settings")

parser.add_argument("--train", action="store_true")
parser.add_argument("--conf", type=str)
parser.add_argument('--restore', type=str, default="", help="")
parser.add_argument('--evaluate', type=str, default="", help="path of the config file on which to evaluate a model")
parser.add_argument('--evolve', action="store_true", help="whether to perform latent optimization")
parser.add_argument("--train", action="store_true", help="used to continue training a restored model")


if __name__ == "__main__":
@@ -45,10 +43,11 @@
stop = {
"timesteps_total": training_conf['timesteps_total'],
"training_iteration": training_conf['training_iteration'],
# "episode_reward_mean": 34 # this would mean 35/40 agents have survived on average, and is probably a good stop condition
}

if args.restore == "":
from ray import tune

tune.run(trainer_cls,
config=trainer_conf,
stop=stop,
@@ -59,6 +58,8 @@
trial_dirname_creator=get_name_creator(path), # the name after ~/ray_results/context_exp
)
elif args.train:
from ray import tune

# pick up where we left off training, using a checkpoint
tune.run(trainer_cls,
config=trainer_conf,
