-
Notifications
You must be signed in to change notification settings - Fork 6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[rllib] Improve performance for small rollouts #812
[rllib] Improve performance for small rollouts #812
Conversation
Merged build finished. Test PASSed. |
Test PASSed. |
Merged build finished. Test PASSed. |
Test PASSed. |
Merged build finished. Test PASSed. |
Test PASSed. |
032fc5f
to
71930dc
Compare
Merged build finished. Test PASSed. |
Test PASSed. |
Merged build finished. Test PASSed. |
Test PASSed. |
Merged build finished. Test PASSed. |
Test PASSed. |
…into pg-multinode-fixes Conflicts: python/ray/rllib/policy_gradient/agent.py
Merged build finished. Test PASSed. |
Test PASSed. |
Merged build finished. Test PASSed. |
Test PASSed. |
def compute_steps(self, gamma, lam, horizon, min_steps_per_task=-1): | ||
"""Compute multiple rollouts and concatenate the results. | ||
|
||
Parameters: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be Args:
|
||
Parameters: | ||
num_steps: Lower bound on the number of states to be collected. | ||
Returns: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a newline before Returns:
Merged build finished. Test PASSed. |
Test PASSed. |
"kl_coeff": 0.2, | ||
# Number of SGD iteration in each outer loop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iterations
"""Compute multiple rollouts and concatenate the results. | ||
|
||
Parameters: | ||
num_steps: Lower bound on the number of states to be collected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These args are not the actual names of the args
trajectory = rollouts( | ||
self.common_policy, | ||
self.env, horizon, self.observation_filter, self.reward_filter) | ||
add_advantage_values(trajectory, gamma, lam, self.reward_filter) | ||
return trajectory | ||
|
||
def compute_steps(self, gamma, lam, horizon, min_steps_per_task=-1): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it make sense to define this method in terms of compute_trajectory
?
num_timesteps_so_far = 0 | ||
trajectories = [] | ||
total_rewards = [] | ||
traj_len_means = [] | ||
traj_lengths = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
slight preference for trajectory_lengths
num_steps_so_far = 0 | ||
trajectories = [] | ||
total_rewards = [] | ||
traj_lengths = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
slight preference for trajectory_lengths
traj_lengths.append(np.logical_not(trajectory["dones"]).sum(axis=0).mean()) | ||
trajectory = flatten(trajectory) | ||
not_done = np.logical_not(trajectory["dones"]) | ||
trajectory = {key: val[not_done] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment saying that we're filtering out the environments that are finished
Merged build finished. Test PASSed. |
Test PASSed. |
1240642
to
3994256
Compare
Merged build finished. Test PASSed. |
Test PASSed. |
Merged build finished. Test PASSed. |
Test PASSed. |
Merged build finished. Test PASSed. |
Test PASSed. |
No description provided.