[rllib] Contribute DDPG to RLlib #1877
Conversation
Contribute DDPG and related test configurations to Ray RLlib.
Test PASSed.
cc @vlad17
python/ray/rllib/__init__.py
Outdated
@@ -8,7 +8,7 @@
 def _register_all():
-    for key in ["PPO", "ES", "DQN", "APEX", "A3C", "BC", "PG", "__fake",
+    for key in ["DDPG", "APEX_DDPG", "PPO", "ES", "DQN", "APEX", "A3C", "BC", "PG", "__fake",
Let's call this `DDPG2` and `APEX_DDPG2` for now, since there is a conflicting PR that was merged earlier. We will resolve the differences later to combine the implementations. The package directory should also be moved to `rllib/ddpg2`.
Done.
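For context, `_register_all` boils down to a name-to-class registry. A minimal sketch of that pattern (illustrative names only, not the actual rllib internals):

```python
# Hypothetical sketch of an algorithm-name registry, assuming a simple
# dict-based design; rllib's real _register_all differs in detail.
AGENT_CLASSES = {}


def register_agent(name, cls):
    """Map an algorithm name (e.g. "DDPG2") to its agent class."""
    AGENT_CLASSES[name] = cls


def get_agent_class(name):
    """Look up a registered agent class, failing loudly on typos."""
    if name not in AGENT_CLASSES:
        raise ValueError("Unknown algorithm: {}".format(name))
    return AGENT_CLASSES[name]


class DDPG2Agent(object):
    """Stand-in for the real agent class (hypothetical)."""
    pass


for key, cls in [("DDPG2", DDPG2Agent)]:
    register_agent(key, cls)
```

This is why the renaming above is just a matter of changing the string keys: the registry decouples the CLI/YAML name from the implementing class.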
python/ray/rllib/ddpg/__init__.py
Outdated
from __future__ import print_function

from ray.rllib.ddpg.apex import ApexDDPGAgent
from ray.rllib.ddpg.ddpg import DDPGAgent, DEFAULT_CONFIG
ApexDDPG2Agent, DDPG2Agent
Done.
@@ -0,0 +1,108 @@
"""This file is used for specifying various schedules that evolve over
This file is literally a copy of dqn/schedules.py. Let's move that to common to avoid this code duplication.
To avoid making any changes to dqn, I currently just reuse the schedules from dqn.
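For reference, the dqn/schedules.py-style schedule under discussion is essentially a linear interpolation that saturates at its final value. A minimal sketch (not the rllib source):

```python
class LinearSchedule(object):
    """Linearly interpolate from initial_p to final_p over
    schedule_timesteps, then stay at final_p. Mirrors the kind of
    schedule dqn/schedules.py provides; a sketch, not the real code."""

    def __init__(self, schedule_timesteps, final_p, initial_p=1.0):
        self.schedule_timesteps = schedule_timesteps
        self.final_p = final_p
        self.initial_p = initial_p

    def value(self, t):
        # Fraction of the schedule elapsed, clipped to [0, 1].
        fraction = min(float(t) / self.schedule_timesteps, 1.0)
        return self.initial_p + fraction * (self.final_p - self.initial_p)
```

Because the class is this small and self-contained, importing it from a shared location (or from dqn, as done here) costs nothing at runtime; the duplication concern is purely about maintenance.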
# Override atari default to use the deepmind wrappers.
# TODO(ekl) this logic should be pushed to the catalog.
if is_atari and "custom_preprocessor" not in options:
    return wrap_deepmind(env, random_starts=random_starts)
Since DDPG isn't typically used with discrete action spaces (e.g., atari), how about we remove this wrapper and just use ModelCatalog.get_preprocessor...? This means we can remove this file.
I tried your solution. However, without a complicated assertion on the observation type (say, Box(0.0, 1.0, (80, 80, 1), dtype=np.float32) is allowed while Box(0.0, 1.0, (210, 160, 3), dtype=np.float32) is unsupported), DDPG could NOT pass the test/test_supported_spaces.py test. So I removed this file per your comment, but I directly import and use the wrapper from dqn instead of your proposal. If you have any concerns, please let me know and I will revise accordingly.
I see, that sounds fine then, we can clean up Atari handling later.
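For context, the kind of observation-shape assertion being described could look like the following sketch. The helper name and the thresholds are illustrative assumptions, not rllib code:

```python
def is_deepmind_shape(shape):
    """Hypothetical heuristic: the deepmind wrapper emits small
    single-channel frames such as (80, 80, 1) or (84, 84, 1), while raw
    Atari observations are (210, 160, 3). The cutoffs below are
    assumptions for illustration, not values from rllib."""
    if len(shape) != 3:
        return False
    h, w, c = shape
    return c == 1 and h <= 84 and w <= 84
```

The awkwardness of encoding such shape heuristics is exactly why reusing the existing dqn wrapper was the simpler path here.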
python/ray/rllib/ddpg/ddpg.py
Outdated
smoothing_num_episodes=100,



Two newlines only.
OK, got it; I'll follow your conventions.
print(str(result))
pretty_print(result)

if __name__ == "__main__":
Is it intended for this test to be run as part of the automated tests? If so, consider adding it as an entry in run_multi_node_tests.sh; otherwise, remove it.
Sorry for the mess; I removed these files.
print(mean_100ep_reward)
"""

if __name__ == "__main__":
Is this intended to be an automated test?
Removed.
ob = new_ob

if __name__ == "__main__":
    main()
Is this intended to be an automated test?
Removed.
self.saved_mean_reward = data[3]
self.obs = data[4]
self.global_timestep = data[5]
self.local_timestep = data[6]
I'm a little concerned this file has too much in common with dqn_evaluator.py. However, it's not clear if coupling DQN and DDPG would be a good idea either. @richardliaw any thoughts?
You are right. Both of them follow Q-learning behavior; the differences can be expressed by DQNGraph and DDPGGraph. We could consider distilling a parent class like QAgent/QEvaluator later.
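The parent-class factoring suggested here could be sketched as follows. All class names and the method shapes are hypothetical, not the actual rllib classes:

```python
# Hypothetical sketch: a shared QEvaluator owns the Q-learning plumbing
# (sampling, replay, etc.), and only the graph object differs between
# the two algorithms.
class QEvaluator(object):
    def __init__(self, graph):
        self.graph = graph

    def sample(self, obs):
        # Shared Q-learning behavior: query the graph for an action;
        # replay-buffer bookkeeping would also live here.
        action = self.graph.act(obs)
        return obs, action


class DQNGraph(object):
    """Stand-in: DQN picks a discrete action via argmax over Q-values."""
    def act(self, obs):
        return "discrete-action"


class DDPGGraph(object):
    """Stand-in: DDPG emits a continuous action from the policy net."""
    def act(self, obs):
        return "continuous-action"
```

The appeal of this split is that replay-buffer and exploration-schedule logic would live once in the base class, while each `*Graph` stays algorithm-specific.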
@@ -0,0 +1,15 @@
pendulum-ddpg:
    env: Pendulum-v0
    run: DDPG
`DDPG2` here and elsewhere in the YAML examples. Btw, how long does this usually take to complete?
Done. It takes around 650 to 750 seconds.
Two other requests for tests:
Test PASSed.
@@ -0,0 +1,108 @@
"""This file is used for specifying various schedules that evolve over
This file can be removed.
@@ -0,0 +1,408 @@
from __future__ import absolute_import
from __future__ import division
@vladf can you take a look over the models here?
python/ray/rllib/ddpg2/models.py
Outdated
s_func_vars = _scope_vars(scope.name)

# Actor: P (policy) network
p_scope_name = TOWER_SCOPE_NAME + "/p_func"
We can drop TOWER_SCOPE_NAME; that was only used for the multi-GPU optimizer, which is being refactored to not need it.
Hi teachers,
Test PASSed.
Looks like there is a conflicting file. I think we can merge this once that's fixed, but let's make sure to add results for HalfCheetah experiments afterwards.
Test PASSed.
@joneswong the lint tests are failing in travis. Can you run |
@@ -22,15 +22,20 @@ def get_mean_action(alg, obs):
CONFIGS = {
    "ES": {"episodes_per_batch": 10, "timesteps_per_batch": 100},
    "DQN": {},
    "DDPG2": {"noise_scale": 0.0},
We should add @alvkao58's DDPG to this file too (in a separate PR that is)
@@ -114,6 +114,7 @@ class ModelSupportedSpaces(unittest.TestCase):
def testAll(self):
    ray.init()
    stats = {}
    check_support("DDPG2", {"timesteps_per_iteration": 1}, stats)
Same comment about @alvkao58 (not in this PR).
Test PASSed.
@ericl I formatted the .py files in the ddpg2 folder with yapf by executing the command you provided.
Hm, there seem to be some lint errors still: https://api.travis-ci.org/v3/job/367974621/log.txt (click the travis details -> go to the LINT job)
Test FAILed.
Test PASSed.
Merged, thanks!
Nice work!
Thanks for Eric's patience. I learned some conventions of open-source projects from this PR. I will adhere to the code style in the following PRs.
Thanks for contributing this!
* master: merged latest master into this branch (conflicts resolved in python/ray/rllib/__init__.py and python/ray/rllib/dqn/dqn.py)
* master: merged latest master again to pick up subsequent changes
What do these changes do?
Implemented DDPG (see ./rllib/ddpg) in a style consistent with the DQN implementation.
Validated on Pendulum-v0 and MountainCarContinuous-v0 with LocalSyncReplayOptimizer and ApeXOptimizer:
- With LocalSyncReplayOptimizer on Pendulum-v0 (see ./rllib/tuned_examples/pendulum-ddpg.yaml), the mean 100-episode reward reaches -160 in around 30k to 40k timesteps (learning curve: pendulum-ddpg.png).
- With ApeXOptimizer on Pendulum-v0 (see ./rllib/tuned_examples/pendulum-apex-ddpg.yaml), the mean 100-episode reward reaches -160 within around 15 minutes with 16 workers (learning curve: pendulum-apex-ddpg.png).
- With LocalSyncReplayOptimizer on MountainCarContinuous-v0 (see ./rllib/tuned_examples/mountaincarcontinuous-ddpg.yaml), the mean 100-episode reward reaches 90 in around 20k to 30k timesteps (learning curve: mountaincarcontinuous-ddpg.png).
- With ApeXOptimizer on MountainCarContinuous-v0 (see ./rllib/tuned_examples/mountaincarcontinuous-apex-ddpg.yaml), the mean 100-episode reward reaches 90 within around 15 minutes with 16 workers (learning curve: mountaincarcontinuous-apex-ddpg.png).
Some functionality, e.g., the OU process for generating exploration noise and the schedules, could be refactored into common utilities (i.e., moved into ./rllib/utils). However, we want to keep each pull request clean and focused on one thing.
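For reference, the OU (Ornstein-Uhlenbeck) process mentioned above is a mean-reverting random walk commonly used for DDPG exploration noise. A minimal sketch (class name, parameter names, and defaults are assumptions, not the rllib implementation):

```python
import random


class OrnsteinUhlenbeckNoise(object):
    """Hypothetical sketch of OU exploration noise:
    dx = theta * (mu - x) * dt + sigma * sqrt(dt) * N(0, 1)."""

    def __init__(self, mu=0.0, theta=0.15, sigma=0.2, dt=1.0,
                 x0=None, seed=None):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.x = mu if x0 is None else x0
        self.rng = random.Random(seed)

    def sample(self):
        # Drift term pulls x back toward mu; diffusion term adds
        # temporally correlated Gaussian noise.
        dx = (self.theta * (self.mu - self.x) * self.dt
              + self.sigma * (self.dt ** 0.5) * self.rng.gauss(0.0, 1.0))
        self.x += dx
        return self.x
```

Because successive samples are correlated, OU noise produces smoother exploration trajectories than independent Gaussian noise, which suits continuous-control tasks like Pendulum and MountainCarContinuous.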
Related issue number
#1868