Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Results and code for cartpole #15

Merged
merged 1 commit into from
Jul 16, 2016
Merged

Results and code for cartpole #15

merged 1 commit into from
Jul 16, 2016

Conversation

kvfrans
Copy link
Contributor

@kvfrans kvfrans commented Jul 14, 2016

Writeup and code implementing random search, hill climbing, and policy gradient on the cartpole environment

@ilyasu123
Copy link
Contributor

Nice work. Few comments:

  • The first reference to eligibility "eligibility = tf.log(good_probabilities)" -- it's not really the eligibility until you multiply it by the advantage. I recommend calling it something else.
  • The second reference to eligibility:
    "```
    def policy_gradient():

    [not shown: policy gradient code from before]

    advantages = tf.placeholder("float",[None,1])

    insert the elementwise multiplication by advantages

    eligibility = tf.log(good_probabilities) * advantages
is hard to connect to the previous code.  In particular, it's not obvious that you differentiate eligibility, at least not form a shallow reading.   The second point is more important, and makes it possible to change the name of the first eligibility.   Basically, don't shy away form a bit of code duplication in the explanation to your solution.   Otherwise, it's good.

Let me know when it's done and I'll merge it.

@kvfrans
Copy link
Contributor Author

kvfrans commented Jul 14, 2016

Thanks for the feedback, I updated the post.

@ilyasu123 ilyasu123 merged commit 619bb2e into openai:master Jul 16, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants