Results and code for cartpole #15

kvfrans · 2016-07-14T03:14:23Z

Writeup and code implementing random search, hill climbing, and policy gradient on the cartpole environment

ilyasu123 · 2016-07-14T18:50:43Z

Nice work. A few comments:

The first reference to eligibility "eligibility = tf.log(good_probabilities)" -- it's not really the eligibility until you multiply it by the advantage. I recommend calling it something else.
The second reference to eligibility:
```
def policy_gradient():
    # [not shown: policy gradient code from before]
    advantages = tf.placeholder("float",[None,1])
    # insert the elementwise multiplication by advantages
    eligibility = tf.log(good_probabilities) * advantages
```

is hard to connect to the previous code. In particular, it's not obvious that you differentiate eligibility, at least not from a shallow reading. The second point is more important, and fixing it makes it possible to change the name of the first eligibility. Basically, don't shy away from a bit of code duplication in the explanation of your solution. Otherwise, it's good.

Let me know when it's done and I'll merge it.
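
To illustrate the naming point above, here is a minimal, framework-free sketch. It is not the code from this PR: it assumes a linear softmax policy, and the names `log_prob`, `surrogate_loss`, and `surrogate_gradient` are hypothetical. It keeps the plain log-probability separate from the advantage-weighted quantity, and makes the differentiation step explicit rather than leaving it to the optimizer.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def action_probs(weights, state):
    # Assumed policy: linear map from state features to action logits.
    # weights[i][j] connects state feature i to action j.
    n_actions = len(weights[0])
    logits = [
        sum(state[i] * weights[i][j] for i in range(len(state)))
        for j in range(n_actions)
    ]
    return softmax(logits)


def log_prob(weights, state, action):
    # Log-probability of the action actually taken.
    # On its own this is NOT the advantage-weighted term.
    return math.log(action_probs(weights, state)[action])


def surrogate_loss(weights, states, actions, advantages):
    # The quantity that gets differentiated:
    # minus the sum of log pi(a|s) * advantage over the batch.
    return -sum(
        log_prob(weights, s, a) * adv
        for s, a, adv in zip(states, actions, advantages)
    )


def surrogate_gradient(weights, states, actions, advantages):
    # Analytic gradient of surrogate_loss with respect to the weights.
    # For a softmax policy, d log pi(a|s) / d logit_j = 1[j == a] - pi(j|s).
    n_features, n_actions = len(weights), len(weights[0])
    grad = [[0.0] * n_actions for _ in range(n_features)]
    for s, a, adv in zip(states, actions, advantages):
        probs = action_probs(weights, s)
        for i in range(n_features):
            for j in range(n_actions):
                indicator = 1.0 if j == a else 0.0
                grad[i][j] -= adv * s[i] * (indicator - probs[j])
    return grad
```

In the TensorFlow version the gradient is taken automatically when the loss is handed to an optimizer, which is why the differentiation is easy to miss on a shallow reading; the explicit `surrogate_gradient` here only spells that step out.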

kvfrans · 2016-07-14T23:00:51Z

Thanks for the feedback, I updated the post.

results and code for cartpole

0a7490c

ilyasu123 merged commit 619bb2e into openai:master Jul 16, 2016
