-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Finalize the khr-3hv environment. #49
Conversation
Add example with cartpole_discrete using stable-baseline PPO
Add a first working version of the khr-3hv robot
Finilize the Stable baseline with the cartpole env
Finilize the Stable baseline with the cartpole env
Add a first working version of the khr-3hv robot Finilize the Stable baseline with the cartpole env Add a first working version of the khr-3hv robot Finilize the Stable baseline with the cartpole env
Add example with cartpole_discrete using stable-baseline PPO
…ctions. Using stable baseline and Ray.
Better version of the khr-3hv enviroment. Reducing observations and actions
Finalize the khr-3hv environment for the GSoC 2021. - Create the environment that works with Ray and Stable-Baseline PPO. - Logging on Wandb website. - Step-to-step documentation.
Finalize the Nao environment for thr GSoC 2021
Fix path error on the khr-3hv robot readme and add reward plots
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this khr-3hv example and the nice tutorial you provided!
I think users will enjoy this example.
I've left a few comments for khr-3hv.
I didn't review cartpole & nao since the description of this PR is about khr-3h.
@@ -0,0 +1,154 @@ | |||
import numpy as np | |||
from scipy.ndimage.interpolation import shift |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove this unused shift
.
import ray | ||
from ray import tune | ||
from ray.tune.registry import register_env | ||
from ray.tune import grid_search |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove this unused import.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When using Ray. It is needed the tune, register_env. Regarding the ray and grid_serach, I could remove those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NickKok Hi, sorry for the confusion.
I mean that we can remove grid_search
& ray
and keep tune
. Thanks!
import ray.rllib.agents.ppo as ppo | ||
from ray.tune.logger import DEFAULT_LOGGERS | ||
from ray.tune.integration.wandb import WandbLoggerCallback | ||
from ray.tune.integration.wandb import wandb_mixin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove this unused import.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need the ppo and the WandbLoggerCallback to log the results on wandb.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the confusion.
I think we can remove wandb_mixin
only. Thanks!
from ray.tune.registry import register_env | ||
from ray.tune import grid_search | ||
import ray.rllib.agents.ppo as ppo | ||
from ray.tune.logger import DEFAULT_LOGGERS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove this unused DEFAULT_LOGGERS
.
Viewpoint { | ||
fieldOfView 0.660595 | ||
orientation -0.9627454208252096 0.19381572750322326 0.1885648918873455 0.25970754728137774 | ||
position 0.22818627812370207 1.6236991612459881 10.33705776276167 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Viewpoint { | |
fieldOfView 0.660595 | |
orientation -0.9627454208252096 0.19381572750322326 0.1885648918873455 0.25970754728137774 | |
position 0.22818627812370207 1.6236991612459881 10.33705776276167 | |
} | |
Viewpoint { | |
fieldOfView 0.660595 | |
position 0.05 1.4 9 | |
} |
The viewpoint is a little bit too inclined.
Please update the gif in the readme file as well, thanks!
# Scheduler for the learning rate when using the stable baselines | ||
def linear_schedule(initial_value: float) -> Callable[[float], float]: | ||
""" | ||
Linear learning rate schedule. | ||
:param initial_value: | ||
:return: current learning rate depending on remaining progress | ||
""" | ||
if isinstance(initial_value, str): | ||
initial_value = float(initial_value) | ||
|
||
# TODO decrease the lr by .10 | ||
def func(progress_remaining: float) -> float: | ||
""" | ||
Progress will decrease from 1 (beginning) to 0 | ||
:param progress_remaining: (float) | ||
:return: (float) | ||
""" | ||
return progress_remaining * initial_value | ||
|
||
return func |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove these lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is something that we could use in the future. It's for the learning rate scheduler when using Stable-Baseline PPO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. I think we can keep these lines then.
model.learn(total_timesteps=100000000, callback=checkpoint_callback) | ||
else: | ||
# Load the weights of a trained Agent | ||
model = PPO.load("./results/Deepbots-khr3hv-SB-reward-weight,prev_pos,obs-x,y,z,vel-done-0.5-RNN-5,clip_range-0.2,lr-0.0001_6778500_steps") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can pass a file path in supervisorsManager.py
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it could be possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please address this on a following PR or this PR. Thank you!
) | ||
ray.shutdown() | ||
else: | ||
# Crate an agent having the predifined model hyperparameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add ray.init()?
When TRAIN = False
and USE_Ray = True
, I get System error: Ray has not been started yet. You can start Ray with 'ray.init()'.
Adding ray.init() do fix this error.
# Crate an agent having the predifined model hyperparameters | ||
agent = ppo.PPOTrainer(config=model_config) | ||
# Restore the train weights | ||
agent.restore("./results/Deepbots-khr3hv-Ray-reward-weight,prev_pos,obs-x,y,z,vel-done-0.5-RNN-5,clip_range-0.2,lr-0.0001_6778500_steps") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can pass a file path in supervisorsManager.py
.
Then it will be visible to for end-user.
Please address this on a following PR or this PR. Thank you!
|
||
from RobotUtils import RobotFunc | ||
from deepbots.supervisor.controllers.robot_supervisor import RobotSupervisor | ||
from gym.spaces import Box, Discrete |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove this unused Discrete
.
from gym.spaces import Box, Discrete | |
from gym.spaces import Box |
@NickKok Thank you for contributing many examples to deepworlds! This PR includes I only do the code review on khr-3hv as the description of this PR. |
Hello @KelvinYang0320. No I didn't get any error on : AttributeError: module 'aioredis' has no attribute 'create_redis_pool' when USE_Ray = True? When you downgraded aioredis to v1.3.1, does it work with Ray? |
@NickKok Yes, everything goes fine when downgrading aioredis to v1.3.1, as suggested on this website. |
Hey @NickKok! We would like to take over this PR and finalize it if you don't mind, unless you have some free time to do so yourself. You could resolve whatever comments exist right now, merge it and we can take over perfecting it later. This needs to be closed in combination with other fixes we've been doing for #58, so as to bring all existing examples up to speed with the latest Webots release. |
Hello @tsampazk. It won't be possible to work on it till the end of August. After that It might be possibl during weekends. If you would like to continue the work feel free. I could help you from September. |
Alright @NickKok, thanks for the quick response, we will probably continue working on this and finalize it soon then. There's always more work to be done so if you feel like it you can contribute in the future 😃 |
Hey @KelvinYang0320, could we close this PR as i think all the goals were met by #76 ? |
@tsampazk There are three examples, cartpole, khr-3hv, and nao, in this PR. I think we can close this PR and open an issue for it. What do you think? |
Oh yes i didn't notice the other two, sorry. The cartpole stable-baselines stuff are already present in the dev branch 😕, so we need a new PR that adds the nao example? Bottom line i think it's better to close it and open an issue as you suggested! |
@tsampazk I have a |
I don't mind, whatever you feel is easier for you!
It's ok, by taking a quick look at the sb cartpole we need to refactor and update that example anyway! |
Finalize the khr-3hv environment for the GSoC 2021.