[rllib] Importance Sampling and KL Loss for APPO #5051

michaelzhiluo · 2019-06-27T20:54:24Z

What do these changes do?

There are two improvements to APPO:

Importance sampling with respect to a target network, called old_worker in the code. Applying importance sampling to the surrogate loss helps in continuous environments and makes learning a lot more stable.
An optional KL loss, as in PPO, which also helps in continuous environments.
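
For readers unfamiliar with the idea, the two changes above can be sketched roughly as follows. This is a minimal NumPy illustration, not the code in this PR: the function name, argument names, and default values (clip_param, kl_coeff, is_clip) are all hypothetical, and the exact form of the importance ratio is an assumption.

```python
import numpy as np


def appo_style_loss(logp_current, logp_behaviour, logp_target, advantages,
                    kl_to_target, clip_param=0.3, kl_coeff=0.2, is_clip=2.0):
    """Illustrative policy loss; every name and default here is hypothetical.

    logp_current:   log-prob of the taken actions under the policy being trained
    logp_behaviour: log-prob under the (possibly stale) policy that sampled them
    logp_target:    log-prob under the slowly updated target network
    kl_to_target:   per-sample KL between the target and the current policy
    """
    logp_current = np.asarray(logp_current, dtype=float)
    logp_behaviour = np.asarray(logp_behaviour, dtype=float)
    logp_target = np.asarray(logp_target, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    kl_to_target = np.asarray(kl_to_target, dtype=float)

    # Importance weight relating the sampling policy to the target network,
    # clipped so that very stale samples cannot dominate the update.
    is_ratio = np.clip(np.exp(logp_behaviour - logp_target), 0.0, is_clip)

    # Combined ratio; when is_ratio is unclipped this equals
    # pi_current / pi_target for the sampled actions.
    ratio = is_ratio * np.exp(logp_current - logp_behaviour)

    # PPO-style clipped surrogate objective, negated to obtain a loss.
    surrogate = np.minimum(
        advantages * ratio,
        advantages * np.clip(ratio, 1.0 - clip_param, 1.0 + clip_param))
    policy_loss = -np.mean(surrogate)

    # Optional KL penalty, as in PPO; set kl_coeff=0.0 to disable it.
    kl_penalty = kl_coeff * np.mean(kl_to_target)
    return policy_loss + kl_penalty
```

In this sketch, anchoring the ratio to a target network rather than directly to whichever stale policy produced each sample is what is meant by importance sampling "with respect to a target network"; value-function and entropy terms are left out for brevity.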

Related issue number

Linter

I've run scripts/format.sh to lint the changes in this PR.

AmplabJenkins · 2019-06-27T21:46:02Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/14941/
Test FAILed.

ericl · 2019-06-27T22:02:34Z

Any benchmark results?

michaelzhiluo · 2019-06-27T23:06:59Z

Hi Eric, if you run pong-impala.yaml and pong-appo.yaml (similar configurations), you'll see that APPO does a lot better. There is also a new YAML file, halfcheetah-appo.yaml, which reaches about 9k reward in roughly 3 hours. This could most likely be improved once the bug affecting IMPALA's performance is found.

python/ray/rllib/agents/ppo/appo.py

python/ray/rllib/agents/ppo/appo_policy.py

python/ray/rllib/policy/tf_policy.py

python/ray/rllib/agents/ppo/appo_policy.py

AmplabJenkins · 2019-07-09T02:30:46Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15218/
Test FAILed.

AmplabJenkins · 2019-07-09T03:36:22Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15220/
Test FAILed.

AmplabJenkins · 2019-07-09T05:58:59Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/1526/
Test FAILed.

AmplabJenkins · 2019-07-09T06:51:13Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/1528/
Test FAILed.

alex-petrenko · 2019-07-11T21:43:53Z

I tested this patch and it worked great for me!
Unlike the previous version, it matched PPO in sample efficiency while being like 2x faster. Great work!
Can we get this in master?
https://youtu.be/UL5ixDDfv-I

alex-petrenko · 2019-07-11T22:08:38Z

For some reason, I cannot run APPO in local mode (for debugging).
This is the error I get:

2019-07-11 15:01:53,346	ERROR ray_trial_executor.py:211 -- Error starting runner for Trial CUSTOM_APPO_doom_dwango5_bots_0
Traceback (most recent call last):
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 209, in start_trial
    self._start_trial(trial, checkpoint)
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 151, in _start_trial
    self._train(trial)
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/tune/ray_trial_executor.py", line 119, in _train
    remote = trial.runner.train.remote()
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/actor.py", line 148, in remote
    return self._remote(args, kwargs)
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/actor.py", line 169, in _remote
    return invocation(args, kwargs)
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/actor.py", line 163, in invocation
    num_return_vals=num_return_vals)
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/actor.py", line 533, in _actor_method_call
    method_name)(*copy.deepcopy(args))
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 368, in train
    raise e
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 354, in train
    result = Trainable.train(self)
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/tune/trainable.py", line 154, in train
    result = self._train()
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 126, in _train
    fetches = self.optimizer.step()
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/rllib/optimizers/async_samples_optimizer.py", line 139, in step
    sample_timesteps, train_timesteps = self._step()
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/rllib/optimizers/async_samples_optimizer.py", line 181, in _step
    for train_batch in self.aggregator.iter_train_batches():
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/rllib/optimizers/aso_aggregator.py", line 103, in iter_train_batches
    blocking_wait=True, max_yield=max_yield)):
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/rllib/optimizers/aso_aggregator.py", line 150, in _augment_with_replay
    for ev, sample_batch in sample_futures:
  File "/home/apetrenk/miniconda3/envs/doom-rl/lib/python3.7/site-packages/ray/rllib/utils/actors.py", line 43, in completed_prefetch
    plasma_id = ray.pyarrow.plasma.ObjectID(obj_id.binary())
AttributeError: 'SampleBatch' object has no attribute 'binary'

Could that be because of an incompatibility between the versions of Ray and RLlib?
I could not build the wheel from this particular version because of a Bazel error, so I followed these instructions instead: https://ray.readthedocs.io/en/latest/rllib-dev.html#development-install
My "base" wheel version is 0.8.0.dev1

python/ray/rllib/agents/ppo/appo.py

python/ray/rllib/tuned_examples/pong-appo.yaml

python/ray/rllib/tuned_examples/halfcheetah-appo.yaml

python/ray/rllib/policy/dynamic_tf_policy.py

python/ray/rllib/optimizers/aso_minibatch_buffer.py

python/ray/rllib/agents/ppo/appo_policy.py

AmplabJenkins · 2019-07-24T23:33:15Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15623/
Test FAILed.

richardliaw · 2019-07-24T23:58:06Z

jenkins retest this please

AmplabJenkins · 2019-07-25T01:45:45Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15625/
Test FAILed.

AmplabJenkins · 2019-07-25T05:07:06Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15633/
Test FAILed.

AmplabJenkins · 2019-07-25T07:02:59Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15637/
Test FAILed.

michaelzhiluo · 2019-07-25T07:27:54Z

jenkins retest this please

AmplabJenkins · 2019-07-25T08:37:27Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/1790/
Test FAILed.

AmplabJenkins · 2019-07-25T08:39:44Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/1797/
Test FAILed.

AmplabJenkins · 2019-07-25T08:42:54Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/1806/
Test FAILed.

AmplabJenkins · 2019-07-25T08:44:38Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/1811/
Test FAILed.

AmplabJenkins · 2019-07-25T08:45:01Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-Perf-Integration-PRB/1813/
Test FAILed.

AmplabJenkins · 2019-07-25T09:54:40Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15639/
Test FAILed.

python/ray/rllib/agents/ppo/appo.py

python/ray/rllib/agents/ppo/appo_policy.py

ericl

Can you add more documentation on why a target network is being used here?

michaelzhiluo · 2019-07-25T20:36:30Z

@ericl Target Documentation pushed

ericl

lgtm!

AmplabJenkins · 2019-07-25T22:38:28Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15662/
Test FAILed.

AmplabJenkins · 2019-07-26T21:21:47Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15688/
Test FAILed.

AmplabJenkins · 2019-07-29T21:51:25Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/15748/
Test PASSed.

Michael Luo added 4 commits June 26, 2019 20:43

IS APPO updated to newest rllib

1b50eb4

KL Loss and Stats implemented

dc50e45

Cleaned up code

35b7eb8

Pushed small changes

7e22482

michaelzhiluo changed the title ~~Is appo~~ Significant APPO Improvements Jun 27, 2019

ericl self-assigned this Jun 27, 2019

michaelzhiluo changed the title ~~Significant APPO Improvements~~ [rllib] Significant APPO Improvements Jun 27, 2019

ericl assigned richardliaw Jun 28, 2019

richardliaw reviewed Jul 3, 2019

python/ray/rllib/agents/ppo/appo.py (outdated)

richardliaw reviewed Jul 3, 2019

python/ray/rllib/agents/ppo/appo_policy.py (outdated)

richardliaw reviewed Jul 3, 2019

python/ray/rllib/policy/tf_policy.py (outdated)

python/ray/rllib/agents/ppo/appo_policy.py (outdated)

Merged with upstream master

b6924a5

Outsourced KLCoeffMixin to ppo_policy.py

d8dc1b3