-
Notifications
You must be signed in to change notification settings - Fork 7.1k
[rllib] Improve performance for small rollouts #812
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[rllib] Improve performance for small rollouts #812
Conversation
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
032fc5f to
71930dc
Compare
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
…into pg-multinode-fixes Conflicts: python/ray/rllib/policy_gradient/agent.py
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
| def compute_steps(self, gamma, lam, horizon, min_steps_per_task=-1): | ||
| """Compute multiple rollouts and concatenate the results. | ||
| Parameters: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be Args:
| Parameters: | ||
| num_steps: Lower bound on the number of states to be collected. | ||
| Returns: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a newline before Returns:
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
| "lambda": 1.0, | ||
| # Initial coefficient for KL divergence | ||
| "kl_coeff": 0.2, | ||
| # Number of SGD iteration in each outer loop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iterations
| """Compute multiple rollouts and concatenate the results. | ||
| Parameters: | ||
| num_steps: Lower bound on the number of states to be collected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These args are not the actual names of the args
| add_advantage_values(trajectory, gamma, lam, self.reward_filter) | ||
| return trajectory | ||
|
|
||
| def compute_steps(self, gamma, lam, horizon, min_steps_per_task=-1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it make sense to define this method in terms of compute_trajectory?
| trajectories = [] | ||
| total_rewards = [] | ||
| traj_len_means = [] | ||
| traj_lengths = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
slight preference for trajectory_lengths
| num_steps_so_far = 0 | ||
| trajectories = [] | ||
| total_rewards = [] | ||
| traj_lengths = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
slight preference for trajectory_lengths
| traj_lengths.append(np.logical_not(trajectory["dones"]).sum(axis=0).mean()) | ||
| trajectory = flatten(trajectory) | ||
| not_done = np.logical_not(trajectory["dones"]) | ||
| trajectory = {key: val[not_done] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment saying that we're filtering out the environments that are finished
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
1240642 to
3994256
Compare
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
|
Merged build finished. Test PASSed. |
|
Test PASSed. |
No description provided.