Adds new curriculum mdp that allows modification on any environment parameters #2777
@jtigue-bdai Feel free to view and provide some feedback
jtigue-bdai left a comment
Thanks for this @ooctipus. We don't currently have tests for MDP terms, but do you think you could put together a unit test for this? Because it has the potential to touch so many things, I think it would be good to have some unit tests for it.
e28803f to 8957e93
8957e93 to 60a0b87
jtigue-bdai left a comment
Looks good Octi, just a rogue newline.
…m_cfg and environment
7342759 to fec96ca
@kellyguo11 Documentation ready for viz
```
params={
    "term_name": "sparse_reward",
    "weight": 0.5,
    "num_steps": 100_000,
```
Could we explain what the `_000` means?
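For reference: since Python 3.6 (PEP 515), underscores may appear between the digits of a numeric literal purely as visual separators, so `100_000` is just the integer 100000 (here, a step count). A small check of that behavior, with illustrative variable names:

```python
# PEP 515 (Python 3.6+): underscores between digits are ignored by the parser,
# so they act purely as visual thousands separators.
num_steps = 100_000
assert num_steps == 100000

# int() accepts the same grouping when parsing strings ...
parsed = int("120_000")
# ... and the "_" format specifier produces it when printing.
formatted = f"{num_steps:_}"

print(num_steps, parsed, formatted)  # 100000 120000 100_000
```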
Co-authored-by: Kelly Guo <kellyg@nvidia.com>
Signed-off-by: ooctipus <zhengyuz@nvidia.com>
Signed-off-by: ooctipus <zhengyuz@nvidia.com>
Signed-off-by: Kelly Guo <kellyg@nvidia.com>
…arameters (isaac-sim#2777)

# Description

This PR creates two curriculum MDP terms that can change any parameter of the env instance, namely `modify_term_cfg` and `modify_env_param`.

`modify_env_param` is the more general version: it can override any value that belongs to the env, but it requires the user to know the full path to the value. `modify_term_cfg` only works with manager terms, but it is more user-friendly because it simplifies path specification. For example, instead of writing `"observation_manager.cfg.policy.joint_pos.noise"`, you write `"observations.policy.joint_pos.noise"`, consistent with the Hydra overriding style.

Besides the path to the value, `modify_fn` and `modify_params` are also needed to tell the term how to modify it.

Demo 1: difficulty-adaptive modification for all Python native data types

```
import torch

from isaaclab.envs import ManagerBasedRLEnv


# iv -> initial value, fv -> final value
def initial_final_interpolate_fn(env: ManagerBasedRLEnv, env_id, data, iv, fv, get_fraction):
    iv_, fv_ = torch.tensor(iv, device=env.device), torch.tensor(fv, device=env.device)
    # get_fraction is a Python expression string, evaluated with `env` in scope
    fraction = eval(get_fraction)
    new_val = fraction * (fv_ - iv_) + iv_
    if isinstance(data, float):
        return new_val.item()
    elif isinstance(data, int):
        return int(new_val.item())
    elif isinstance(data, (tuple, list)):
        raw = new_val.tolist()
        # assume iv is a sequence of all ints or all floats
        is_int = isinstance(iv[0], int)
        casted = [int(x) if is_int else float(x) for x in raw]
        return tuple(casted) if isinstance(data, tuple) else casted
    else:
        raise TypeError(f"Does not support the type {type(data)}")
```

(float)

```
import isaaclab.envs.mdp as mdp
from isaaclab.managers import CurriculumTermCfg as CurrTerm

joint_pos_unoise_min_adr = CurrTerm(
    func=mdp.modify_term_cfg,
    params={
        "address": "observations.policy.joint_pos.noise.n_min",
        "modify_fn": initial_final_interpolate_fn,
        # inner quotes must differ from the outer ones to be valid Python
        "modify_params": {"iv": 0.0, "fv": -0.1, "get_fraction": "env.command_manager.get_command('difficulty')"},
    },
)
```

(tuple or list)

```
command_object_pose_xrange_adr = CurrTerm(
    func=mdp.modify_term_cfg,
    params={
        "address": "commands.object_pose.ranges.pos_x",
        "modify_fn": initial_final_interpolate_fn,
        "modify_params": {
            "iv": (-0.5, -0.5),
            "fv": (-0.75, -0.25),
            "get_fraction": "env.command_manager.get_command('difficulty')",
        },
    },
)
```

Demo 3: overriding an entire term based on the env step counter rather than adaptively

```
def value_override(env: ManagerBasedRLEnv, env_id, data, new_val, num_steps):
    if env.common_step_counter > num_steps:
        return new_val
    return mdp.modify_term_cfg.NO_CHANGE


object_pos_curriculum = CurrTerm(
    func=mdp.modify_term_cfg,
    params={
        "address": "commands.object_pose",
        "modify_fn": value_override,
        # <new_observation_term> is a placeholder for the replacement term config;
        # the key must be "num_steps" to match value_override's parameter name
        "modify_params": {"new_val": <new_observation_term>, "num_steps": 120000},
    },
)
```

Demo 4: overriding a Tensor field within an arbitrary class not visible from term_cfg (note that the `address` is not as concise as with `mdp.modify_term_cfg`)

```
import isaaclab.utils.math as math_utils


def resample_bucket_range(
    env: ManagerBasedRLEnv, env_id, data, static_friction_range, dynamic_friction_range, restitution_range, num_steps
):
    if env.common_step_counter > num_steps:
        range_list = [static_friction_range, dynamic_friction_range, restitution_range]
        ranges = torch.tensor(range_list, device="cpu")
        # one (static, dynamic, restitution) row per existing bucket
        new_buckets = math_utils.sample_uniform(ranges[:, 0], ranges[:, 1], (len(data), 3), device="cpu")
        return new_buckets
    return mdp.modify_env_param.NO_CHANGE


object_physics_material_curriculum = CurrTerm(
    func=mdp.modify_env_param,
    params={
        "address": "event_manager.cfg.object_physics_material.func.material_buckets",
        "modify_fn": resample_bucket_range,
        "modify_params": {
            "static_friction_range": [0.5, 1.0],
            "dynamic_friction_range": [0.3, 1.0],
            "restitution_range": [0.0, 0.5],
            "num_steps": 120000,
        },
    },
)
```

## Type of change

- New feature (non-breaking change which adds functionality)

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [x] I have added tests that prove my fix is effective or that my feature works
- [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there

---------

Signed-off-by: ooctipus <zhengyuz@nvidia.com>
Signed-off-by: Kelly Guo <kellyg@nvidia.com>
Co-authored-by: Kelly Guo <kellyg@nvidia.com>
Description
This PR creates two curriculum MDP terms that can change any parameter of the env instance, namely `modify_term_cfg` and `modify_env_param`.
`modify_env_param` is the more general version: it can override any value that belongs to the env, but it requires the user to know the full path to the value. `modify_term_cfg` only works with manager terms, but it is more user-friendly because it simplifies path specification. For example, instead of writing "observation_manager.cfg.policy.joint_pos.noise", you write "observations.policy.joint_pos.noise", consistent with the Hydra overriding style. Besides the path to the value, `modify_fn` and `modify_params` are also needed to tell the term how to modify it.
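To make the two addressing styles concrete, here is a minimal sketch of how a short `modify_term_cfg`-style address could be expanded into a full env path and then read or written with `getattr`/`setattr`. This is not the PR's implementation: `MANAGER_PREFIX`, `to_env_path`, `get_by_path`, and `set_by_path` are hypothetical helpers, and the prefix table only covers the groups that appear in the examples here.

```python
from types import SimpleNamespace

# Hypothetical prefix table: maps the Hydra-style group name used by
# modify_term_cfg to the manager attribute that holds that config.
MANAGER_PREFIX = {
    "observations": "observation_manager.cfg",
    "commands": "command_manager.cfg",
    "events": "event_manager.cfg",
}


def to_env_path(term_address: str) -> str:
    """Expand a short term address into the full path used by modify_env_param."""
    group, _, rest = term_address.partition(".")
    prefix = MANAGER_PREFIX[group]
    return f"{prefix}.{rest}" if rest else prefix


def get_by_path(root, path: str):
    """Walk a dotted path, using item access for dicts and getattr otherwise."""
    obj = root
    for part in path.split("."):
        obj = obj[part] if isinstance(obj, dict) else getattr(obj, part)
    return obj


def set_by_path(root, path: str, value) -> None:
    """Resolve the parent of the last path component, then assign to it."""
    parent_path, _, leaf = path.rpartition(".")
    parent = get_by_path(root, parent_path) if parent_path else root
    if isinstance(parent, dict):
        parent[leaf] = value
    else:
        setattr(parent, leaf, value)


# Usage with a stand-in env object (not a real ManagerBasedRLEnv).
noise = SimpleNamespace(n_min=0.0, n_max=0.1)
policy = SimpleNamespace(joint_pos=SimpleNamespace(noise=noise))
env = SimpleNamespace(observation_manager=SimpleNamespace(cfg=SimpleNamespace(policy=policy)))

path = to_env_path("observations.policy.joint_pos.noise.n_min")
set_by_path(env, path, -0.1)
print(path)         # observation_manager.cfg.policy.joint_pos.noise.n_min
print(noise.n_min)  # -0.1
```

The short form only saves typing the manager prefix; once expanded, both styles reduce to the same attribute walk.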
Demo 1: difficulty-adaptive modification for all Python native data types
(float)
(tuple or list)
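The interpolation rule in Demo 1 can be checked without torch or a running env. The sketch below is a hypothetical `interpolate` helper that mirrors the type handling of `initial_final_interpolate_fn`, but it takes the fraction as a plain float instead of evaluating an expression string; that swap is an assumption made so the example is self-contained.

```python
def interpolate(data, iv, fv, fraction: float):
    """Blend from iv (fraction=0.0) to fv (fraction=1.0), returning data's type."""
    if isinstance(data, (tuple, list)):
        blended = [a + fraction * (b - a) for a, b in zip(iv, fv)]
        # Same assumption as the demo: iv holds all ints or all floats.
        cast = int if isinstance(iv[0], int) else float
        values = [cast(v) for v in blended]
        return tuple(values) if isinstance(data, tuple) else values
    if isinstance(data, (int, float)):
        new_val = iv + fraction * (fv - iv)
        # int() truncates toward zero, like int(tensor.item()) in the demo.
        return type(data)(new_val)
    raise TypeError(f"Does not support the type {type(data)}")


print(interpolate(0.0, 0.0, -0.1, 0.5))                              # -0.05
print(interpolate((-0.5, -0.5), (-0.5, -0.5), (-0.75, -0.25), 0.5))  # (-0.625, -0.375)
print(interpolate(10, 10, 20, 0.25))                                 # 12
```

Passing a number (or a callable) rather than a string for the fraction is one way to avoid `eval` on config values; whether that fits the term's parameter conventions is a separate design question.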
Demo 3: overriding an entire term based on the env step counter rather than adaptively
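The step-gated override in Demo 3 relies on a `NO_CHANGE` sentinel to signal "leave the value alone". The sketch below reproduces that control flow with a stand-in env; `NO_CHANGE` here is a local `object()` rather than `mdp.modify_term_cfg.NO_CHANGE`, and `apply_modification` is a hypothetical driver, not the term's actual code.

```python
from types import SimpleNamespace

# Stand-in for mdp.modify_term_cfg.NO_CHANGE: a unique object() sentinel can
# never be confused with a legitimate value such as None, 0, or False.
NO_CHANGE = object()


def value_override(env, env_id, data, new_val, num_steps):
    """Return new_val once the global step counter exceeds num_steps."""
    if env.common_step_counter > num_steps:
        return new_val
    return NO_CHANGE


def apply_modification(env, env_id, data, modify_fn, modify_params):
    """Hypothetical driver: call modify_fn and keep the old value on NO_CHANGE."""
    result = modify_fn(env, env_id, data, **modify_params)
    return data if result is NO_CHANGE else result


env = SimpleNamespace(common_step_counter=100_000)
params = {"new_val": "new_term_cfg", "num_steps": 120_000}

before = apply_modification(env, None, "old_term_cfg", value_override, params)
env.common_step_counter = 120_001
after = apply_modification(env, None, "old_term_cfg", value_override, params)
print(before, after)  # old_term_cfg new_term_cfg
```

Because `modify_params` is unpacked as keyword arguments in this sketch, its keys must match the function's parameter names exactly; a key such as `"num_step"` makes the call raise `TypeError`.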
Demo 4: overriding a Tensor field within an arbitrary class not visible from term_cfg
(note that the 'address' is not as concise as with mdp.modify_term_cfg)
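The resampling step in Demo 4 draws one `(static friction, dynamic friction, restitution)` row per bucket, each column uniform within its own range. As a dependency-free sketch, the hypothetical `resample_buckets` below uses `random.uniform` in place of `math_utils.sample_uniform` and returns nested lists instead of a `(len(data), 3)` tensor; the ranges are the ones from the demo.

```python
import random


def resample_buckets(num_buckets, static_friction_range, dynamic_friction_range,
                     restitution_range, rng=random):
    """Return num_buckets rows of [static, dynamic, restitution] drawn uniformly."""
    ranges = [static_friction_range, dynamic_friction_range, restitution_range]
    # One independent uniform draw per column, like sampling a (num_buckets, 3)
    # tensor with per-column lower/upper bounds.
    return [[rng.uniform(lo, hi) for lo, hi in ranges] for _ in range(num_buckets)]


rng = random.Random(0)  # seeded so repeated runs give the same buckets
buckets = resample_buckets(4, [0.5, 1.0], [0.3, 1.0], [0.0, 0.5], rng=rng)
for row in buckets:
    print([round(v, 3) for v in row])
```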
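Type of change
- New feature (non-breaking change which adds functionality)
Checklist
- [x] I have run the `pre-commit` checks with `./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [x] I have added tests that prove my fix is effective or that my feature works
- [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there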