TRPO agent #204
Merged
Commits (32):
- c9fbaa4 Add chainerrl.misc.conjugate_gradient (muupan)
- c99d554 Add tests for conjugate_gradient (muupan)
- fea10ec Add TRPO agent (muupan)
- 89ccaac Check return type of conjugate_gradient (muupan)
- b215013 Improve docstring of envs.ABC (muupan)
- 630ff06 Add a TRPO example for gym (muupan)
- 1b4a68c Use policies.FCGaussianPolicyWithStateIndependentCovariance for tests (muupan)
- 3bbb159 Simplify code (muupan)
- f8d4f74 Check if the computation graph contains old-style functions (muupan)
- 3b1513a Set entropy_coef=0 (muupan)
- 114fefc It doesn't work with 3.0.0 because of insufficient support of (muupan)
- fd4ce71 Parameterize variance as log std (muupan)
- 96647cd Allow saved attributes to be None (muupan)
- 6bfe868 Add obs_normalizer and conjugate_gradient_max_iter (muupan)
- 2795155 Use settings of http://arxiv.org/abs/1709.06560 (muupan)
- 77a2871 Update on stop_episode_and_train as well as act_and_train (muupan)
- 7e8f3ba Add --trpo-update-interval (muupan)
- 1fa5a5e Add train_trpo_gym.py to test_examples.sh (muupan)
- 90f39d2 Merge branch 'master' into trpo (muupan)
- fde05d8 Use different seeds for train and test envs (muupan)
- 396b947 Merge branch 'master' into trpo (muupan)
- 65f3524 Use `exp(2*x)` instead of `exp(x)**2` (muupan)
- 514c0f3 Test with different dtypes (muupan)
- 4cdc3a7 Remove unnecessary transpose (muupan)
- 0534d73 Remove unnecessary dataset_iter.reset() (muupan)
- 77b3198 Compute CG answer without inv_mat (muupan)
- 41b6f64 Use chainer.grad and raise an error for None grads (muupan)
- c564ca3 Fix style of long string literals (muupan)
- c65278a Check xp consistency (muupan)
- 06ad59e Use pkg_resources.parse_version to handle rc and b (muupan)
- ed600de Fix a flake8 error (muupan)
- 4d4c1cc Merge branch 'master' into trpo (muupan)
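Several commits above add and refine `chainerrl.misc.conjugate_gradient`. In TRPO, conjugate gradient is used to approximately solve a linear system involving the Fisher information matrix using only matrix-vector products, so the matrix never has to be formed or inverted. The sketch below shows the standard conjugate gradient method in NumPy; its name, arguments (`A_product_func`, `max_iter`, `tol`) and stopping rule are assumptions for illustration, not the actual ChainerRL signature.

```python
import numpy as np


def conjugate_gradient(A_product_func, b, max_iter=10, tol=1e-10):
    """Approximately solve A x = b for a symmetric positive-definite A.

    A is never formed explicitly; A_product_func(v) must return A @ v.
    Hypothetical signature for illustration only.
    """
    b = np.asarray(b, dtype=np.float64)
    x = np.zeros_like(b)
    r = b.copy()          # residual b - A x (x starts at 0)
    p = r.copy()          # current search direction
    rr = r.dot(r)
    for _ in range(max_iter):
        if rr < tol:      # squared residual norm is small enough: stop early
            break
        Ap = A_product_func(p)
        alpha = rr / p.dot(Ap)        # step length along p
        x = x + alpha * p
        r = r - alpha * Ap
        rr_new = r.dot(r)
        p = r + (rr_new / rr) * p     # next A-conjugate search direction
        rr = rr_new
    return x


# Usage: the matrix is only accessed through a matrix-vector product,
# analogous to Fisher-vector products in TRPO.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(lambda v: A @ v, b)
```

For an n-dimensional system, conjugate gradient converges in at most n iterations in exact arithmetic, which is why a small iteration cap (compare `conjugate_gradient_max_iter` in commit 6bfe868) is usually enough in practice.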
Diff for commit fd4ce710ad9ab80f4e8cf64225649209cb27eb44: Parameterize variance as log std
```diff
@@ -108,6 +108,8 @@ def make_env(test):
             mean_wscale=0.01,
             nonlinearity=F.tanh,
             var_type='diagonal',
+            var_func=lambda x: F.exp(x) ** 2,  # Parameterize log std
+            var_param_init=0,  # log std = 0 => std = 1
         )
     elif isinstance(action_space, gym.spaces.Discrete):
         # Use a Softmax policy for discrete action spaces
```
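The two added arguments treat the raw parameter `x` as the log standard deviation, so the variance is `exp(x) ** 2`, which equals `exp(2 * x)`; a later commit in this PR (65f3524) switches to the single-exponential form. The NumPy sketch below checks that the two forms agree and that an initial value of 0 gives a standard deviation of 1. The function names are hypothetical, and NumPy stands in for `chainer.functions.exp`.

```python
import numpy as np


def var_from_log_std_squared(x):
    # Form used by var_func in this commit: x is log std, var = exp(x) ** 2
    return np.exp(x) ** 2


def var_from_log_std(x):
    # Same value with a single exponential: exp(x) ** 2 == exp(2 * x)
    return np.exp(2 * x)


log_std = np.zeros(3)            # var_param_init=0 for a 3-dim action space
var = var_from_log_std(log_std)  # all ones
std = np.sqrt(var)               # log std = 0 => std = 1
```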
Do you think the name `var_param_init` suggests its correspondence to the names `var_wscale` and `var_bias` of `FCGaussianPolicy`?

Maybe `var` would be more consistent with `var_bias`, though less informative. Do you think `var` is better?

Actually, this parameter does not represent the variance; it represents values that are converted to the variance via `var_func`, so I added `_param`. I admit it is still confusing, but I couldn't come up with a better name. Any suggestions?
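To make the naming question concrete: the stored, learnable value is an unconstrained parameter, and the variance is only obtained after passing it through `var_func`. The class below is a hypothetical NumPy sketch of that separation, not ChainerRL's `FCGaussianPolicy` or `FCGaussianPolicyWithStateIndependentCovariance`; the class name and the `variance()` method are assumptions.

```python
import numpy as np


class StateIndependentGaussianHead:
    """Hypothetical sketch: the stored parameter is *not* the variance,
    but is mapped to the variance through var_func."""

    def __init__(self, action_size, var_func, var_param_init=0.0):
        self.var_func = var_func
        # Unconstrained parameter; with var_func = exp(2 * x) it acts as log std.
        self.var_param = np.full(action_size, var_param_init, dtype=np.float64)

    def variance(self):
        # The variance is derived from var_param, never stored directly.
        return self.var_func(self.var_param)


head = StateIndependentGaussianHead(
    2, var_func=lambda x: np.exp(2 * x), var_param_init=0.0)
head.var_param = np.array([-1.0, 0.5])  # any real value yields a positive variance
```

Because `var_param` can take any real value while `variance()` is always positive, an optimizer can update it freely, which is one reason to keep the parameter distinct from the variance it produces.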