[rllib] Move a3c implementation from examples/ to python/ray/rllib/ #698

Merged: 16 commits, Jun 29, 2017

ericl
Contributor

@ericl ericl commented Jun 26, 2017

This moves examples/a3c to the rllib dir, and also refactors it slightly to conform to the ray.rllib.common.Algorithm interface.

Notably, each train() call will return intermediate training results after 100 rollouts (configurable) have been processed. I don't think this affects the semantics of the algorithm, but it would be good to have a more careful review of the changes there.
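
As a rough illustration of the reporting behavior described above, here is a minimal sketch of a `train()` method that returns an intermediate result after a configurable number of rollouts. Everything in it is an assumption made for illustration: `ToyAlgorithm`, `rollouts_per_iteration`, `_next_rollout`, and the fields of this `TrainingResult` are hypothetical stand-ins, not the actual `ray.rllib.common` definitions.

```python
import random
from collections import namedtuple

# Hypothetical stand-in; the real TrainingResult in ray.rllib.common
# may carry different fields.
TrainingResult = namedtuple(
    "TrainingResult",
    ["iteration", "episode_reward_mean", "episode_len_mean"])


class ToyAlgorithm:
    """Hypothetical algorithm whose train() reports after N rollouts."""

    def __init__(self, rollouts_per_iteration=100, seed=0):
        self.rollouts_per_iteration = rollouts_per_iteration
        self.iteration = 0
        self._rng = random.Random(seed)

    def _next_rollout(self):
        # Placeholder for fetching one processed rollout from a worker.
        # Returns (total_reward, length) for a fake episode.
        length = self._rng.randint(1, 20)
        return float(length), length

    def train(self):
        rewards, lengths = [], []
        # Process a fixed number of rollouts, then hand control back
        # to the caller with a summary of this slice of training.
        for _ in range(self.rollouts_per_iteration):
            reward, length = self._next_rollout()
            rewards.append(reward)
            lengths.append(length)
        self.iteration += 1
        return TrainingResult(
            self.iteration,
            sum(rewards) / len(rewards),
            sum(lengths) / len(lengths))
```

Under these assumptions, a driver would call `train()` in a loop and inspect each returned result, while the underlying update rule stays the same between calls.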

cc @richardliaw @pcmoritz

@@ -30,10 +30,11 @@ def process_rollout(rollout, gamma, lambda_=1.0):

     features = rollout.features[0]
     return Batch(batch_si, batch_a, batch_adv, batch_r, rollout.terminal,
-                 features)
+                 features, np.sum(rewards))
Contributor Author

Had to add this in order to report mean reward as part of TrainingResult.
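
A small sketch of how an extra per-rollout reward total could feed a mean-reward statistic. The field names of this `Batch`, the `total_reward` name, and the `mean_reward` helper are assumptions for illustration; only the idea of appending `np.sum(rewards)` to the batch comes from the diff. Counting only terminal batches is also an assumption.

```python
from collections import namedtuple

import numpy as np

# Hypothetical field names; the last field mirrors the np.sum(rewards)
# value appended in the diff.
Batch = namedtuple(
    "Batch",
    ["si", "a", "adv", "r", "terminal", "features", "total_reward"])


def mean_reward(batches):
    """Average the reward totals of batches that ended an episode."""
    totals = [b.total_reward for b in batches if b.terminal]
    if not totals:
        return float("nan")
    return float(np.mean(totals))


def make_batch(rewards, terminal):
    # Only the fields needed for the statistic are filled in here.
    return Batch(None, None, None, None, terminal, None, np.sum(rewards))
```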

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1131/

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1132/


        with self.sess.as_default():
            self._run()
    except BaseException as e:
        self.queue.put(e)
Contributor Author

Propagates the exception to the driver.
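
The pattern in this hunk, where a background thread puts its exception on the same queue it uses for results, can be sketched in isolation as follows. This is a minimal sketch using only the standard library; `RunnerThread` as written here, its `work` argument, and the `next_result` helper are hypothetical and are not the actual a3c runner code.

```python
import queue
import threading


class RunnerThread(threading.Thread):
    """Hypothetical worker that streams results through a queue."""

    def __init__(self, work):
        super().__init__(daemon=True)
        self.queue = queue.Queue()
        self._work = work  # callable returning an iterable of results

    def run(self):
        try:
            for item in self._work():
                self.queue.put(item)
        except BaseException as e:
            # Hand the failure to the consumer instead of letting the
            # thread die silently.
            self.queue.put(e)


def next_result(runner, timeout=5.0):
    """Fetch the next item, re-raising any exception from the worker."""
    item = runner.queue.get(timeout=timeout)
    if isinstance(item, BaseException):
        raise item
    return item
```

With this shape, the consuming side sees a worker failure as an ordinary exception at the point where it asks for the next result, rather than blocking forever on an empty queue.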

length += 1
rewards += reward
if length >= timestep_limit:
    terminal = True
Contributor Author

Previously these weren't marked as terminal, which I think could cause reward values to be incorrectly totaled later on.
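
To make the effect of the timestep limit concrete, here is a self-contained sketch of an episode loop that forces `terminal = True` once the limit is reached. `CountingEnv` and `run_episode` are hypothetical names introduced for illustration; only the four lines that update `length`, `rewards`, and `terminal` mirror the hunk.

```python
class CountingEnv:
    """Hypothetical environment that never terminates on its own."""

    def reset(self):
        return 0

    def step(self, action):
        # Returns (state, reward, terminal); terminal is always False,
        # so only the timestep limit can end the episode.
        return 0, 1.0, False


def run_episode(env, timestep_limit):
    env.reset()
    length, rewards, terminal = 0, 0.0, False
    while not terminal:
        _, reward, terminal = env.step(0)
        length += 1
        rewards += reward
        if length >= timestep_limit:
            # Treat hitting the limit as the end of the episode so the
            # reward total is closed out at this boundary.
            terminal = True
    return length, rewards, terminal
```

Without the forced flag, an episode cut off by the limit would look unfinished to downstream code, and its rewards could be carried into the next episode's total.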

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1137/

@ericl
Contributor Author

ericl commented Jun 28, 2017

Ping

@robertnishihara
Collaborator

Looks good to me. Looks like linting is failing on Travis with one minor error.

@pcmoritz
Contributor

Yeah looks good! Let's fix the linting and merge it.

@ericl
Contributor Author

ericl commented Jun 29, 2017 via email

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/1140/

@pcmoritz pcmoritz merged commit 2d81edf into ray-project:master Jun 29, 2017
@robertnishihara robertnishihara deleted the rllib-a3c branch July 7, 2017 15:12

4 participants