
Conversation

@qyccc qyccc commented Apr 10, 2018

What do these changes do?

Add the algorithm ddpg in rllib.
It runs with the standard training command, e.g. `python ray/python/ray/rllib/train.py --run DDPG --env MountainCarContinuous-v0`.

Related issue number

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/4779/
Test FAILed.

@richardliaw richardliaw changed the title add ddpg in rllib [rllib] Adding DDPG Apr 10, 2018
@richardliaw
Contributor

Thanks for contributing this! Do you have any performance numbers/charts?

Contributor

@richardliaw richardliaw left a comment

Thanks a bunch for contributing this! Do you mind running flake8 on rllib/ddpg/ and fixing the errors that show up?

 def _register_all():
     for key in ["PPO", "ES", "DQN", "APEX", "A3C", "BC", "PG", "__fake",
-                "__sigmoid_fake_data", "__parameter_tuning"]:
+                "__sigmoid_fake_data", "__parameter_tuning", "DDPG"]:
Contributor

do you mind moving this before the private keys?


def _build_q_network(self, registry, inputs, state_space, ac_space, act_t, config):
    x = inputs
    x = tf.layers.dense(x, 64)
Contributor

Can you use tf.slim instead? It would be great to avoid mixing higher-level libraries.

import os

import numpy as np
import tensorflow as tf
Contributor

Would it be possible to move all of the TF code out of this file? We would like to keep the main algorithm framework agnostic

Contributor

@ericl ericl left a comment

Thanks for contributing this! I think this is good to merge after we add it to the test files mentioned below, and check that it passes those.

@@ -0,0 +1,56 @@
# --------------------------------------
Contributor

Shall we move this into rllib/utils ?

from ray.rllib.ddpg.ou_noise import AdaptiveParamNoiseSpec


def _huber_loss(x, delta=1.0):
Contributor

Some of this seems duplicated with DQN, we should probably move it to a util file.
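For reference, the `_huber_loss` shared by DQN and DDPG computes the standard Huber loss; here is a NumPy sketch of the same function (illustrative only, not the PR's TensorFlow code):

```python
import numpy as np

def huber_loss(x, delta=1.0):
    # Quadratic for |x| <= delta, linear beyond: bounds the gradient
    # magnitude so large TD errors don't destabilize training.
    abs_x = np.abs(x)
    return np.where(abs_x <= delta,
                    0.5 * np.square(x),
                    delta * (abs_x - 0.5 * delta))
```

Factoring one such implementation into rllib/utils would let both DQN and DDPG import it rather than each carrying a copy.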


"""The base DDPG Evaluator that does not include the replay buffer.
TODO(rliaw): Support observation/reward filters?"""
Contributor

This comment seems out of date and could be removed.


return result

def _populate_replay_buffer(self):
Contributor

Unused function.

        return _ParameterTuningAgent
    elif alg == "DDPG":
        from ray.rllib import ddpg
        return ddpg.DDPGAgent
Contributor

@ericl ericl Apr 11, 2018

Along with registering it here, could you also register DDPG in these automated tests?

  1. The generic test for supported obs/action spaces:
    https://github.com/ray-project/ray/blob/master/python/ray/rllib/test/test_supported_spaces.py#L132

  2. The regression tests folder: https://github.com/ray-project/ray/tree/master/python/ray/rllib/tuned_examples/regression_tests

(probably MountainCarContinuous-v0 is the right env for this test)

  3. The checkpoint/restore test https://github.com/ray-project/ray/blob/master/python/ray/rllib/test/test_checkpoint_restore.py

You should be able to basically copy paste an existing entry for e.g., DQN in each of these cases. The tests can be run locally (and will also be run by travis/jenkins once pushed here).
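In each of those files, registering typically amounts to adding the algorithm name to an existing list of tested algorithms; a hypothetical sketch (the list name and entries here are illustrative, not the actual test code):

```python
# Illustrative only: the real lists live in the linked test files.
TESTED_ALGS = ["DQN", "PPO", "A3C"]  # hypothetical existing entries

def register_for_tests(algs, new_alg):
    """Append the new algorithm name once, preserving existing entries."""
    return algs if new_alg in algs else algs + [new_alg]

TESTED_ALGS = register_for_tests(TESTED_ALGS, "DDPG")
```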

# Whether to compute priorities on workers.
worker_side_prioritization=False,
# Whether to force evaluator actors to be placed on remote machines.
force_evaluators_remote=False)
Contributor

@ericl ericl Apr 11, 2018

Could we drop this config? It used to be used for Ape-X but is no longer needed due to improvements in Ray's actor scheduling.

@qyccc
Author

qyccc commented Apr 11, 2018

[Screenshot: TensorBoard learning curve for DDPG on MountainCarContinuous-v0]

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/4806/
Test PASSed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/4807/
Test FAILed.

Contributor

@ericl ericl left a comment

Looks close. I just ran ./train.py -f tuned_examples/regression_tests/mountaincarcontinuous-ddpg.yaml and got

Remote function __init__ failed with:

Traceback (most recent call last):
  File "/home/eric/Desktop/ray-private/python/ray/worker.py", line 832, in _process_task
    *arguments)
  File "/home/eric/Desktop/ray-private/python/ray/actor.py", line 212, in actor_method_executor
    method_returns = method(actor, *args)
  File "/home/eric/Desktop/ray-private/python/ray/rllib/agent.py", line 84, in __init__
    Trainable.__init__(self, config, registry, logger_creator)
  File "/home/eric/Desktop/ray-private/python/ray/tune/trainable.py", line 89, in __init__
    self._setup()
  File "/home/eric/Desktop/ray-private/python/ray/rllib/agent.py", line 107, in _setup
    self._init()
  File "/home/eric/Desktop/ray-private/python/ray/rllib/ddpg/ddpg.py", line 105, in _init
    for i in range(self.config["num_workers"])]
  File "/home/eric/Desktop/ray-private/python/ray/rllib/ddpg/ddpg.py", line 105, in <listcomp>
    for i in range(self.config["num_workers"])]
  File "/home/eric/Desktop/ray-private/python/ray/actor.py", line 822, in remote
    dependency=actor_cursor)
  File "/home/eric/Desktop/ray-private/python/ray/actor.py", line 646, in _actor_method_call
    args = signature.extend_args(function_signature, args, kwargs)
  File "/home/eric/Desktop/ray-private/python/ray/signature.py", line 212, in extend_args
    .format(function_name))
Exception: Too many arguments were passed to the function '__init__'

I think this is since the evaluator class doesn't take a logdir argument? Could you make sure that example trains to the -15 reward target?
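The traceback pattern (a remote actor's `__init__` receiving more arguments than its signature declares) can be sketched minimally; the class and argument names below are hypothetical, not the PR's code:

```python
# Hypothetical sketch: the agent passes an extra logdir argument when
# creating remote evaluators, so the evaluator __init__ must accept it.
class DDPGEvaluatorSketch:
    # Before (fails with "Too many arguments were passed"):
    #     def __init__(self, registry, env_creator, config)
    def __init__(self, registry, env_creator, config, logdir=None):
        self.registry = registry
        self.env_creator = env_creator
        self.config = config
        self.logdir = logdir  # accepting logdir fixes the arity mismatch

ev = DDPGEvaluatorSketch(None, None, {}, logdir="/tmp/ddpg")
```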

# Number of env steps to optimize for before returning
timesteps_per_iteration=1000,

exploration_noise=0.2,
Contributor

Comment for exploration_noise and action_noise?
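One possible commented form of those keys (the semantics, and the `action_noise` value, are assumptions based on typical DDPG setups, not taken from the PR):

```python
# Hypothetical documentation sketch for the two noise-related config keys.
NOISE_CONFIG = dict(
    # Scale of temporally correlated (e.g. OU) noise added to actions
    # during rollouts, relative to the action range (assumed semantics).
    exploration_noise=0.2,
    # Std dev of independent Gaussian noise added per action step
    # (hypothetical value; assumed semantics).
    action_noise=0.1,
)
```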

# Smooth the current average reward over this many previous episodes.
smoothing_num_episodes=100,


Contributor

one newline


return result

# def _populate_replay_buffer(self):
Contributor

Remove?

import numpy as np


class OUNoise:
Contributor

I think you forgot to remove this file after moving it to rllib/utils
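For context, the Ornstein-Uhlenbeck process behind an `OUNoise` class produces temporally correlated exploration noise; a generic NumPy sketch (not the PR's implementation):

```python
import numpy as np

class OUNoiseSketch:
    """Ornstein-Uhlenbeck process: dx = theta*(mu - x)*dt + sigma*sqrt(dt)*N(0, 1)."""

    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2, seed=0):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.rng = np.random.default_rng(seed)
        self.state = np.full(dim, mu, dtype=np.float64)

    def reset(self):
        # Return the process to its mean at episode boundaries.
        self.state[:] = self.mu

    def sample(self):
        # Mean-reverting drift plus Gaussian diffusion.
        drift = self.theta * (self.mu - self.state) * self.dt
        diffusion = (self.sigma * np.sqrt(self.dt)
                     * self.rng.standard_normal(self.state.shape))
        self.state = self.state + drift + diffusion
        return self.state
```

The mean reversion (theta term) keeps consecutive action perturbations correlated, which helps exploration in momentum-driven continuous-control tasks like MountainCarContinuous.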


 def _register_all():
-    for key in ["PPO", "ES", "DQN", "APEX", "A3C", "BC", "PG", "__fake",
+    for key in ["PPO", "ES", "DQN", "APEX", "A3C", "BC", "PG", "DDPG", "__fake",
Contributor

Could we rename this to "DDPG_baselines", and change the directory to rllib/ddpg_baselines, in order not to conflict with the other DDPG PR here https://github.com/ray-project/ray/pull/1877/files ?

The main feature of this PR is that it's adapted from the baselines code, so we should probably keep that in the naming (or, could also call it DDPG2 and just leave the details in the README).

@ericl ericl changed the title [rllib] Adding DDPG [rllib] Adding DDPG (adapted from baselines) Apr 11, 2018
@ericl ericl self-assigned this Apr 11, 2018
@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/4817/
Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/4818/
Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/4819/
Test FAILed.

Contributor

@ericl ericl left a comment

Bunch of lint errors here: https://api.travis-ci.org/v3/job/365429239/log.txt

Could you also make sure the HalfCheetah environment trains, and add a tuned example for this env? This one should be fairly fast and so is a good sanity check for DDPG.


     for key in ["PPO", "ES", "DQN", "APEX", "A3C", "BC", "PG", "DDPG",
-                "__fake", "__sigmoid_fake_data", "__parameter_tuning"]:
+                "DDPG_beselines", "__fake", "__sigmoid_fake_data", "__parameter_tuning"]:
Contributor

typo: baselines

self.set_weights(objs["weights"])

def sync_filters(self, new_filters):
"""Changes self's filter to given and rebases any accumulated delta.
Contributor

Indentation here is off

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5286/
Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5289/
Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5292/
Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5293/
Test FAILed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5307/
Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5329/
Test PASSed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5347/
Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5356/
Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5357/
Test FAILed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5366/
Test PASSed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5376/
Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5395/
Test FAILed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/5411/
Test PASSed.

@qyccc
Author

qyccc commented May 17, 2018

Why does continuous-integration/travis-ci/pr always fail? I don't know if my code is wrong.

@robertnishihara
Collaborator

@qyccc unfortunately we have a bunch of flaky tests right now; we really need to address this. Keeping track of them at https://github.com/ray-project/ray/issues?q=is%3Aissue+is%3Aopen+label%3A%22test+failure%22.

@ericl
Contributor

ericl commented Jul 19, 2018

Hm, now that the other two DDPGs have been consolidated (they achieve the exact same performance), it probably makes sense to consolidate this one as well.

@ericl ericl closed this Jul 19, 2018
5 participants