[rllib] Improve performance for small rollouts #812

robertnishihara merged 12 commits into ray-project:master from
Conversation
Merged build finished. Test PASSed.
Force-pushed 032fc5f to 71930dc.
Merge …into pg-multinode-fixes. Conflicts: python/ray/rllib/policy_gradient/agent.py
    def compute_steps(self, gamma, lam, horizon, min_steps_per_task=-1):
        """Compute multiple rollouts and concatenate the results.

        Parameters:

Review comment: This should be `Args:`.
        Parameters:
            num_steps: Lower bound on the number of states to be collected.
        Returns:

Review comment: Add a newline before `Returns:`.
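Both review comments concern Google-style docstring conventions. A hedged sketch of what the corrected docstring could look like — the class name and the parameter descriptions are illustrative guesses, not taken from the PR:

```python
class RolloutCollector:
    """Illustrative stand-in for the class under review (name assumed)."""

    def compute_steps(self, gamma, lam, horizon, min_steps_per_task=-1):
        """Compute multiple rollouts and concatenate the results.

        Args:
            gamma: Discount factor for rewards (description assumed).
            lam: GAE lambda parameter (description assumed).
            horizon: Maximum length of a single rollout (description assumed).
            min_steps_per_task: Lower bound on the number of states to be
                collected.

        Returns:
            The concatenated rollout data (description assumed).
        """
```

Note the `Args:` section heading and the blank line separating it from `Returns:`, which is what the two comments ask for.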
        "lambda": 1.0,
        # Initial coefficient for KL divergence
        "kl_coeff": 0.2,
        # Number of SGD iterations in each outer loop
        """Compute multiple rollouts and concatenate the results.

        Parameters:
            num_steps: Lower bound on the number of states to be collected.

Review comment: These documented names do not match the actual argument names.
        add_advantage_values(trajectory, gamma, lam, self.reward_filter)
        return trajectory

    def compute_steps(self, gamma, lam, horizon, min_steps_per_task=-1):

Review comment: Would it make sense to define this method in terms of compute_trajectory?
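The suggestion is to express the batch-collection method as a loop over a single-rollout helper. A minimal sketch of that structure, assuming a trajectory is a dict of arrays keyed by "rewards" — the class name and rollout_fn hook are hypothetical, only compute_trajectory/compute_steps and their signatures come from the diff:

```python
import numpy as np


class RolloutSketch:
    """Illustrative stand-in; the real class lives in rllib's policy_gradient code."""

    def __init__(self, rollout_fn):
        # rollout_fn() returns one trajectory dict; it stands in for the
        # environment/model interaction inside the real compute_trajectory.
        self.rollout_fn = rollout_fn

    def compute_trajectory(self, gamma, lam, horizon):
        # The real method would run the policy for up to `horizon` steps and
        # add advantage estimates; here we just fetch a canned rollout.
        return self.rollout_fn()

    def compute_steps(self, gamma, lam, horizon, min_steps_per_task=-1):
        """Collect rollouts until at least min_steps_per_task steps are gathered."""
        trajectories = []
        num_steps_so_far = 0
        while True:
            trajectory = self.compute_trajectory(gamma, lam, horizon)
            trajectories.append(trajectory)
            num_steps_so_far += len(trajectory["rewards"])
            # With the default of -1 this always stops after one rollout.
            if num_steps_so_far >= min_steps_per_task:
                break
        return trajectories
```

For example, with 5-step rollouts and min_steps_per_task=12, the loop would collect three trajectories (15 steps) before stopping.

```python
collector = RolloutSketch(lambda: {"rewards": np.zeros(5)})
trajs = collector.compute_steps(0.99, 1.0, 100, min_steps_per_task=12)
```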
        trajectories = []
        total_rewards = []
        traj_len_means = []
        traj_lengths = []

Review comment: Slight preference for trajectory_lengths.
        num_steps_so_far = 0
        trajectories = []
        total_rewards = []
        traj_lengths = []

Review comment: Slight preference for trajectory_lengths.
        traj_lengths.append(np.logical_not(trajectory["dones"]).sum(axis=0).mean())
        trajectory = flatten(trajectory)
        not_done = np.logical_not(trajectory["dones"])
        trajectory = {key: val[not_done]

Review comment: Add a comment saying that we're filtering out the environments that are finished.
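The filtering step the reviewer wants documented is a boolean-mask selection over the flattened batch. An isolated sketch with the comment added — the array contents and shapes are invented for illustration, only the "dones"/not_done mask pattern comes from the diff:

```python
import numpy as np

# A flattened batch gathered from several vectorized environments
# (contents and shapes assumed for illustration).
trajectory = {
    "observations": np.arange(10).reshape(5, 2),
    "dones": np.array([False, True, False, False, True]),
}

# Filter out timesteps from environments that are already finished, so the
# concatenated batch only contains valid (not-yet-done) transitions.
not_done = np.logical_not(trajectory["dones"])
trajectory = {key: val[not_done] for key, val in trajectory.items()}
```

Because the mask indexes along the first axis, the same `val[not_done]` works for every array in the dict regardless of its trailing dimensions.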
Force-pushed 1240642 to 3994256.
No description provided.