TRPO agent #204

Merged: 32 commits, merged on Mar 15, 2018

Commits
c9fbaa4
Add chainerrl.misc.conjugate_gradient
muupan Dec 14, 2017
c99d554
Add tests for conjugate_gradient
muupan Dec 14, 2017
fea10ec
Add TRPO agent
muupan Dec 16, 2017
89ccaac
Check return type of conjugate_gradient
muupan Dec 16, 2017
b215013
Improve docstring of envs.ABC
muupan Dec 16, 2017
630ff06
Add a TRPO example for gym
muupan Dec 16, 2017
1b4a68c
Use policies.FCGaussianPolicyWithStateIndependentCovariance for tests
muupan Dec 17, 2017
3bbb159
Simplify code
muupan Dec 17, 2017
f8d4f74
Check if the computation graph contains old-style functions
muupan Dec 17, 2017
3b1513a
Set entropy_coef=0
muupan Dec 17, 2017
114fefc
It doesn't work with 3.0.0 because of insufficient support of
muupan Dec 17, 2017
fd4ce71
Parameterize variance as log std
muupan Dec 17, 2017
96647cd
Allow saved attributes to be None
muupan Dec 18, 2017
6bfe868
Add obs_normalizer and conjugate_gradient_max_iter
muupan Dec 18, 2017
2795155
Use settings of http://arxiv.org/abs/1709.06560
muupan Dec 18, 2017
77a2871
Update on stop_episode_and_train as well as act_and_train
muupan Dec 18, 2017
7e8f3ba
Add --trpo-update-interval
muupan Dec 18, 2017
1fa5a5e
Add train_trpo_gym.py to test_examples.sh
muupan Dec 18, 2017
90f39d2
Merge branch 'master' into trpo
muupan Dec 28, 2017
fde05d8
Use different seeds for train and test envs
muupan Dec 28, 2017
396b947
Merge branch 'master' into trpo
muupan Feb 13, 2018
65f3524
Use exp(2*x) instead of exp(x)**2
muupan Feb 13, 2018
514c0f3
Test with different dtypes
muupan Feb 13, 2018
4cdc3a7
Remove unnecessary transpose
muupan Feb 13, 2018
0534d73
Remove unnecessary dataset_iter.reset()
muupan Feb 13, 2018
77b3198
Compute CG answer without inv_mat
muupan Feb 13, 2018
41b6f64
Use chainer.grad and raise an error for None grads
muupan Mar 14, 2018
c564ca3
Fix style of long string literals
muupan Mar 14, 2018
c65278a
Check xp consistency
muupan Mar 14, 2018
06ad59e
Use pkg_resources.parse_version to handle rc and b
muupan Mar 14, 2018
ed600de
Fix a flake8 error
muupan Mar 15, 2018
4d4c1cc
Merge branch 'master' into trpo
muupan Mar 15, 2018

Changes from 1 commit:
Parameterize variance as log std
muupan committed Dec 17, 2017
commit fd4ce710ad9ab80f4e8cf64225649209cb27eb44
16 changes: 11 additions & 5 deletions chainerrl/policies/gaussian_policy.py
@@ -145,13 +145,20 @@ class FCGaussianPolicyWithStateIndependentCovariance(
             'spherical' or 'diagonal'.
         nonlinearity (callable): Nonlinearity placed between layers.
         mean_wscale (float): Scale of weight initialization of the mean layer.
+        var_func (callable): Callable that computes the variance from the var
+            parameter. It should always return positive values.
+        var_param_init (float): Initial value of the var parameter.
     """

     def __init__(self, n_input_channels, action_size,
                  n_hidden_layers=0, n_hidden_channels=None,
                  min_action=None, max_action=None, bound_mean=False,
                  var_type='spherical',
-                 nonlinearity=F.relu, mean_wscale=1):
+                 nonlinearity=F.relu,
+                 mean_wscale=1,
+                 var_func=F.softplus,
+                 var_param_init=0,

Member:
Do you think the name var_param_init suggests the correspondence to the names var_wscale and var_bias of FCGaussianPolicy?

Member Author:
Maybe var would be more consistent with var_bias, though less informative. Do you think var is better?

Member Author:
Actually, this parameter does not represent the variance itself; it holds values that var_func converts into the variance, so I added _param. I admit it is still confusing, but I couldn't come up with a better name. Any suggestions?
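To illustrate the distinction drawn in this thread, here is a minimal NumPy sketch (not the chainerrl code; softplus, variance_from_param and exp2 are hypothetical helpers) in which the stored parameter is unconstrained and var_func is what turns it into a positive variance, as the docstring requires.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)). Positive in exact arithmetic;
    # may underflow to 0.0 for very negative x in floating point.
    return np.logaddexp(0.0, x)


def variance_from_param(var_param, var_func=softplus):
    # var_param may take any real value; var_func is responsible for
    # mapping it to a positive variance.
    return var_func(np.asarray(var_param, dtype=np.float64))


def exp2(x):
    # Log-std parameterization: var = exp(2 * log_std) = std ** 2.
    return np.exp(2 * x)


params = np.array([-2.0, 0.0, 3.0])
print(variance_from_param(params))        # softplus: all entries positive
print(variance_from_param(params, exp2))  # log std: all entries positive
```

The same parameter value 0.0 yields different variances under the two functions (log 2 under softplus, 1 under exp2), which is why the value is a "var param" rather than a variance.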

+                 ):

         self.n_input_channels = n_input_channels
         self.action_size = action_size
@@ -161,6 +168,7 @@ def __init__(self, n_input_channels, action_size,
         self.max_action = max_action
         self.bound_mean = bound_mean
         self.nonlinearity = nonlinearity
+        self.var_func = var_func
         var_size = {'spherical': 1, 'diagonal': action_size}[var_type]

         layers = []
@@ -182,13 +190,11 @@ def __init__(self, n_input_channels, action_size,
         with self.init_scope():
             self.hidden_layers = links.Sequence(*layers)
             self.var_param = chainer.Parameter(
-                initializer=0.0, shape=(var_size,))
+                initializer=var_param_init, shape=(var_size,))

     def __call__(self, x):
         mean = self.hidden_layers(x)
-        var = F.broadcast_to(
-            F.softplus(self.var_param),
-            mean.shape)
+        var = F.broadcast_to(self.var_func(self.var_param), mean.shape)
         return distribution.GaussianDistribution(mean, var)
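The hunks above store var_param with shape (var_size,), where var_size is 1 for 'spherical' and action_size for 'diagonal', and broadcast var_func(var_param) to the shape of the mean. A NumPy sketch of that shape logic, assuming the mean is a batch of shape (batch_size, action_size); np.broadcast_to stands in for F.broadcast_to, and make_var_param, broadcast_variance and exp2 are hypothetical helpers.

```python
import numpy as np


def make_var_param(var_type, action_size, init=0.0):
    # Same lookup as the diff: one shared entry for 'spherical',
    # one entry per action dimension for 'diagonal'.
    var_size = {'spherical': 1, 'diagonal': action_size}[var_type]
    return np.full((var_size,), init, dtype=np.float32)


def broadcast_variance(var_param, mean, var_func):
    # Map the parameter to a variance, then expand it to the mean's shape,
    # mirroring F.broadcast_to(self.var_func(self.var_param), mean.shape).
    return np.broadcast_to(var_func(var_param), mean.shape)


def exp2(x):
    # Log-std parameterization, equivalent to the exp(x) ** 2 form
    # used in the TRPO example.
    return np.exp(2 * x)


batch_size, action_size = 4, 3
mean = np.zeros((batch_size, action_size), dtype=np.float32)

spherical = make_var_param('spherical', action_size)
diagonal = make_var_param('diagonal', action_size)
print(spherical.shape, diagonal.shape)  # (1,) (3,)
print(broadcast_variance(spherical, mean, exp2).shape)  # (4, 3)
print(broadcast_variance(diagonal, mean, exp2).shape)  # (4, 3)
```

Because the variance does not depend on the input, a single parameter vector serves every row of the batch; broadcasting only repeats it.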


2 changes: 2 additions & 0 deletions examples/gym/train_trpo_gym.py
@@ -108,6 +108,8 @@ def make_env(test):
             mean_wscale=0.01,
             nonlinearity=F.tanh,
             var_type='diagonal',
+            var_func=lambda x: F.exp(x) ** 2, # Parameterize log std

Member:
F.exp(2 * x) could be faster.
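The suggestion rests on the identity exp(x)^2 = exp(2x): with x = log std, both forms give the variance std^2. A NumPy sketch (hypothetical function names, not chainerrl code) that only checks the two forms agree numerically; it makes no measurement of speed.

```python
import numpy as np


def var_from_log_std_pow(log_std):
    # Form used in the diff: exponentiate, then square.
    return np.exp(log_std) ** 2


def var_from_log_std_exp2(log_std):
    # Reviewer's suggestion: a single exp of the doubled argument.
    return np.exp(2 * log_std)


log_std = np.linspace(-5.0, 5.0, 11, dtype=np.float32)
assert np.allclose(var_from_log_std_pow(log_std),
                   var_from_log_std_exp2(log_std))
print(var_from_log_std_exp2(np.float32(0.0)))  # 1.0 (log std 0 => var 1)
```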

+            var_param_init=0, # log std = 0 => std = 1
         )
     elif isinstance(action_space, gym.spaces.Discrete):
         # Use a Softmax policy for discrete action spaces
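The comment "log std = 0 => std = 1" generalizes: with the log-std parameterization, var_param_init = log(s) starts the policy at std s, while with the default softplus var_func the parameter for a target variance v is the inverse softplus log(exp(v) - 1). The helpers below are a hypothetical NumPy sketch of both choices; neither is part of chainerrl.

```python
import numpy as np


def softplus(x):
    # log(1 + exp(x)), computed stably.
    return np.logaddexp(0.0, x)


def init_for_log_std(target_std):
    # var_func(x) = exp(2 * x)  =>  x = log(std)
    return float(np.log(target_std))


def init_for_softplus(target_std):
    # var_func(x) = softplus(x)  =>  x = log(exp(var) - 1)
    target_var = target_std ** 2
    return float(np.log(np.expm1(target_var)))


for std in (0.5, 1.0, 2.0):
    a = init_for_log_std(std)
    b = init_for_softplus(std)
    # Both initial parameters reproduce the requested standard deviation.
    print(std, np.sqrt(np.exp(2 * a)), np.sqrt(softplus(b)))
```

With target std 1, init_for_log_std returns 0.0, matching the value used in the example above.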