[rllib] Importance Sampling and KL Loss for APPO #5051

Merged
merged 27 commits into from
Jul 29, 2019

michaelzhiluo
Contributor

@michaelzhiluo michaelzhiluo commented Jun 27, 2019

What do these changes do?

There are two improvements to APPO:

  1. Importance sampling with respect to a target network, called old_worker in the code. Applying importance sampling to the surrogate loss helps in continuous environments and makes learning much more stable.
  2. An optional KL loss, as in PPO, which also helps in continuous environments.
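
As a rough illustration of the two changes above (not the code in this PR), the sketch below computes a PPO-style clipped surrogate in which the behaviour-policy probabilities are first corrected toward a target policy, then adds an optional KL penalty. All function and parameter names (`clipped_surrogate_with_target`, `is_clip`, `kl_coeff`, and so on), the truncation of the importance weight, and the direction of the KL term are assumptions made for the example.

```python
import math


def clipped_surrogate_with_target(logp_current, logp_behaviour, logp_target,
                                  advantages, clip_param=0.2, is_clip=2.0):
    """Mean clipped surrogate, importance-weighted against a target policy.

    Each argument is a list with one entry per sampled action. All names and
    default values here are hypothetical.
    """
    total = 0.0
    for lc, lb, lt, adv in zip(logp_current, logp_behaviour, logp_target,
                               advantages):
        # Ratio of behaviour-policy to target-policy probability, truncated
        # at is_clip (an assumed bound) to limit its variance.
        is_ratio = min(math.exp(lb - lt), is_clip)
        # Importance-weighted probability ratio of the current policy.
        ratio = is_ratio * math.exp(lc - lb)
        # PPO-style clipping of the ratio around 1.
        clipped = max(1.0 - clip_param, min(ratio, 1.0 + clip_param))
        total += min(adv * ratio, adv * clipped)
    return total / len(advantages)


def categorical_kl(p, q):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def appo_style_loss(surrogate, target_probs, current_probs, kl_coeff=0.0):
    """Negative surrogate plus an optional KL penalty (kl_coeff=0 disables it)."""
    kls = [categorical_kl(t, c) for t, c in zip(target_probs, current_probs)]
    mean_kl = sum(kls) / len(kls)
    return -surrogate + kl_coeff * mean_kl
```

With `kl_coeff=0.0` the penalty term vanishes, which corresponds to leaving the optional KL loss switched off.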

Related issue number

Linter

  • I've run scripts/format.sh to lint the changes in this PR.

@michaelzhiluo michaelzhiluo changed the title from "Is appo" to "Significant APPO Improvements" Jun 27, 2019
@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14941/

@ericl ericl self-assigned this Jun 27, 2019
@ericl
Contributor

ericl commented Jun 27, 2019

Any benchmark results?

@michaelzhiluo
Contributor Author

Hi Eric, if you run pong-impala.yaml and pong-appo.yaml (similar configurations), you'll see that APPO does a lot better. There is also a new YAML file, halfcheetah-appo.yaml, which reaches about 9k reward in roughly 3 hours. That result could most likely be improved once the bug affecting IMPALA's performance is found.

@michaelzhiluo michaelzhiluo changed the title from "Significant APPO Improvements" to "[rllib] Significant APPO Improvements" Jun 27, 2019
@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15218/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15220/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/1526/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/1528/

@alex-petrenko
Contributor

I tested this patch and it worked great for me!
Unlike the previous version, it matched PPO in sample efficiency while being like 2x faster. Great work!
Can we get this in master?
https://youtu.be/UL5ixDDfv-I

@alex-petrenko
Contributor

alex-petrenko commented Jul 11, 2019

For some reason, I cannot run APPO in local mode (for debugging).
This is the error I get:

2019-07-11 15:01:53,346	ERROR ray_trial_executor.py:211 -- Error starting runner for Trial CUSTOM_APPO_doom_dwango5_bots_0
Traceback (most recent call last):
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 209, in start_trial
    self._start_trial(trial, checkpoint)
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 151, in _start_trial
    self._train(trial)
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 119, in _train
    remote = trial.runner.train.remote()
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/actor.py", line 148, in remote
    return self._remote(args, kwargs)
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/actor.py", line 169, in _remote
    return invocation(args, kwargs)
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/actor.py", line 163, in invocation
    num_return_vals=num_return_vals)
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/actor.py", line 533, in _actor_method_call
    method_name)(*copy.deepcopy(args))
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 368, in train
    raise e
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 354, in train
    result = Trainable.train(self)
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/tune/trainable.py", line 154, in train
    result = self._train()
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 126, in _train
    fetches = self.optimizer.step()
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/rllib/optimizers/async_samples_optimizer.py", line 139, in step
    sample_timesteps, train_timesteps = self._step()
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/rllib/optimizers/async_samples_optimizer.py", line 181, in _step
    for train_batch in self.aggregator.iter_train_batches():
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/rllib/optimizers/aso_aggregator.py", line 103, in iter_train_batches
    blocking_wait=True, max_yield=max_yield)):
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/rllib/optimizers/aso_aggregator.py", line 150, in _augment_with_replay
    for ev, sample_batch in sample_futures:
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/rllib/utils/actors.py", line 43, in completed_prefetch
    plasma_id = ray.pyarrow.plasma.ObjectID(obj_id.binary())
AttributeError: 'SampleBatch' object has no attribute 'binary'

Could this be caused by an incompatibility between the Ray and RLlib versions?
I could not build the wheel from this particular version because of a Bazel error, so I followed these instructions instead: https://ray.readthedocs.io/en/latest/rllib-dev.html#development-install
My "base" wheel version is 0.8.0.dev1.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15623/

@richardliaw
Contributor

jenkins retest this please

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15625/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15633/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15637/

@michaelzhiluo
Contributor Author

jenkins retest this please

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/1790/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/1797/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/1806/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/1811/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/1813/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15639/

Contributor

@ericl ericl left a comment

Can you add more documentation on why a target network is being used here?

@michaelzhiluo
Contributor Author

@ericl Documentation for the target network has been pushed.

Contributor

@ericl ericl left a comment

lgtm!

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15662/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15688/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15748/

@richardliaw richardliaw merged commit 1337c98 into ray-project:master Jul 29, 2019
edoakes pushed a commit to edoakes/ray that referenced this pull request Aug 9, 2019
5 participants