
Conversation


@ooctipus ooctipus commented Jun 26, 2025

Description

This PR adds two curriculum MDP terms that can change any parameter in the env instance: `modify_term_cfg` and `modify_env_param`.

`modify_env_param` is the more general version: it can override any value belonging to the env, but requires the user to know the full path to the value.

`modify_term_cfg` only works with manager terms, but is a more user-friendly version that simplifies the path specification. For example, instead of writing "observation_manager.cfg.policy.joint_pos.noise", you write "observations.policy.joint_pos.noise", consistent with the Hydra override style.

Besides the path to the value, a modify_fn and modify_params are also needed to tell the term how to perform the modification.
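Both terms boil down to resolving a dotted address against the env instance and writing a new value back. A minimal sketch of the resolution step, for illustration only (`resolve_address` and the toy env object here are hypothetical; the real lookup logic lives inside `modify_env_param`):

```python
from functools import reduce
from types import SimpleNamespace

def resolve_address(root, address):
    """Walk a dotted path like 'commands.object_pose.ranges.pos_x' from `root`,
    supporting both attribute access and dict-style lookup at each hop."""
    def step(obj, name):
        return obj[name] if isinstance(obj, dict) else getattr(obj, name)
    return reduce(step, address.split("."), root)

# toy stand-in for an env instance
env = SimpleNamespace(
    commands=SimpleNamespace(
        object_pose=SimpleNamespace(ranges={"pos_x": (-0.5, -0.5)})
    )
)
pos_x = resolve_address(env, "commands.object_pose.ranges.pos_x")
```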

Demo 1: difficulty-adaptive modification for all Python-native data types

```python
# iv -> initial value, fv -> final value
def initial_final_interpolate_fn(env: ManagerBasedRLEnv, env_id, data, iv, fv, get_fraction):
    iv_, fv_ = torch.tensor(iv, device=env.device), torch.tensor(fv, device=env.device)
    fraction = eval(get_fraction)  # evaluates the expression string against the env
    new_val = fraction * (fv_ - iv_) + iv_
    if isinstance(data, float):
        return new_val.item()
    elif isinstance(data, int):
        return int(new_val.item())
    elif isinstance(data, (tuple, list)):
        raw = new_val.tolist()
        # assume iv is a sequence of all ints or all floats
        is_int = isinstance(iv[0], int)
        casted = [int(x) if is_int else float(x) for x in raw]
        return tuple(casted) if isinstance(data, tuple) else casted
    else:
        raise TypeError(f"Does not support the type {type(data)}")
```

(float)

```python
joint_pos_unoise_min_adr = CurrTerm(
    func=mdp.modify_term_cfg,
    params={
        "address": "observations.policy.joint_pos.noise.n_min",
        "modify_fn": initial_final_interpolate_fn,
        "modify_params": {"iv": 0.0, "fv": -0.1, "get_fraction": "env.command_manager.get_command('difficulty')"},
    },
)
```

(tuple or list)

```python
command_object_pose_xrange_adr = CurrTerm(
    func=mdp.modify_term_cfg,
    params={
        "address": "commands.object_pose.ranges.pos_x",
        "modify_fn": initial_final_interpolate_fn,
        "modify_params": {"iv": (-0.5, -0.5), "fv": (-0.75, -0.25), "get_fraction": "env.command_manager.get_command('difficulty')"},
    },
)
```
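Stripped of torch and the env machinery, the schedule in `initial_final_interpolate_fn` is plain linear interpolation between the initial and final values. A self-contained sketch (names are illustrative, not part of the PR's API):

```python
def interpolate(iv, fv, fraction):
    """Linear schedule: returns iv at fraction 0.0 and fv at fraction 1.0,
    preserving the container type (scalar, tuple, or list) of the inputs."""
    if isinstance(iv, (tuple, list)):
        out = [a + fraction * (b - a) for a, b in zip(iv, fv)]
        return tuple(out) if isinstance(iv, tuple) else out
    return iv + fraction * (fv - iv)
```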

Demo 3: overriding an entire term based on the env step counter, rather than adaptively

```python
def value_override(env: ManagerBasedRLEnv, env_id, data, new_val, num_steps):
    if env.common_step_counter > num_steps:
        return new_val
    return mdp.modify_term_cfg.NO_CHANGE

object_pos_curriculum = CurrTerm(
    func=mdp.modify_term_cfg,
    params={
        "address": "commands.object_pose",
        "modify_fn": value_override,
        "modify_params": {"new_val": <new_observation_term>, "num_steps": 120000},
    },
)
```
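The `NO_CHANGE` sentinel is what keeps the term a no-op until the threshold is crossed. The gating pattern from Demo 3 can be sketched standalone (the sentinel below is a stand-in for `mdp.modify_term_cfg.NO_CHANGE`, not the real object):

```python
NO_CHANGE = object()  # stand-in sentinel for illustration

def step_gated_override(step_counter, num_steps, new_val):
    """Return the sentinel (meaning 'leave the value alone') until the
    global step counter exceeds the threshold, then hand back the override."""
    if step_counter > num_steps:
        return new_val
    return NO_CHANGE
```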

Demo 4: overriding a Tensor field within some arbitrary class not visible from the term cfg
(you can see that the 'address' here is not as nice as with mdp.modify_term_cfg)

```python
def resample_bucket_range(env: ManagerBasedRLEnv, env_id, data, static_friction_range, dynamic_friction_range, restitution_range, num_steps):
    if env.common_step_counter > num_steps:
        range_list = [static_friction_range, dynamic_friction_range, restitution_range]
        ranges = torch.tensor(range_list, device="cpu")
        new_buckets = math_utils.sample_uniform(ranges[:, 0], ranges[:, 1], (len(data), 3), device="cpu")
        return new_buckets
    return mdp.modify_env_param.NO_CHANGE

object_physics_material_curriculum = CurrTerm(
    func=mdp.modify_env_param,
    params={
        "address": "event_manager.cfg.object_physics_material.func.material_buckets",
        "modify_fn": resample_bucket_range,
        "modify_params": {"static_friction_range": [0.5, 1.0], "dynamic_friction_range": [0.3, 1.0], "restitution_range": [0.0, 0.5], "num_steps": 120000},
    },
)
```
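A torch-free sketch of what the bucket resampling in Demo 4 computes, using only the standard-library `random` module (function name and bucket count are illustrative): each bucket gets one (static friction, dynamic friction, restitution) triple drawn uniformly within its range.

```python
import random

def resample_buckets(num_buckets, static_friction_range, dynamic_friction_range, restitution_range):
    """Draw one (static, dynamic, restitution) triple per bucket, each value
    sampled uniformly within its (low, high) range."""
    ranges = (static_friction_range, dynamic_friction_range, restitution_range)
    return [tuple(random.uniform(lo, hi) for lo, hi in ranges) for _ in range(num_buckets)]

buckets = resample_buckets(8, (0.5, 1.0), (0.3, 1.0), (0.0, 0.5))
```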

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist

  • [x] I have run the pre-commit checks with `./isaaclab.sh --format`
  • [ ] I have made corresponding changes to the documentation
  • [x] My changes generate no new warnings
  • [x] I have added tests that prove my fix is effective or that my feature works
  • [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
  • [x] I have added my name to the `CONTRIBUTORS.md` or my name already exists there

@ooctipus
Collaborator Author

@jtigue-bdai Feel free to view and provide some feedback


@jtigue-bdai jtigue-bdai left a comment


Thanks for this @ooctipus. We don't currently have tests for mdp terms, but do you think you could put together a unit test for this? Because it has the potential to touch so many things, I think it would be good to get some unit tests for it.

@ooctipus ooctipus requested a review from pascal-roth as a code owner June 27, 2025 19:34
@ooctipus ooctipus force-pushed the feat/modify_env_param_curriculum branch from e28803f to 8957e93 Compare June 27, 2025 19:35
@ooctipus ooctipus force-pushed the feat/modify_env_param_curriculum branch from 8957e93 to 60a0b87 Compare June 27, 2025 21:27

@jtigue-bdai jtigue-bdai left a comment


Looks good Octi, just a rogue newline.

@ooctipus ooctipus force-pushed the feat/modify_env_param_curriculum branch from 7342759 to fec96ca Compare July 2, 2025 06:47
@ooctipus
Collaborator Author

ooctipus commented Jul 2, 2025

@kellyguo11 Documentation ready for viz

params={
"term_name": "sparse_reward",
"weight": 0.5,
"num_steps": 100_000,


could we explain what the _000 means?
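For reference on the question above: the `_000` is Python's digit-separator syntax (PEP 515, Python 3.6+). Underscores inside numeric literals are ignored by the parser and exist purely for readability:

```python
# 100_000 and 100000 are the same integer literal; the underscore is
# a visual digit separator with no effect on the value (PEP 515).
num_steps = 100_000
```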

ooctipus and others added 4 commits July 8, 2025 23:54
Co-authored-by: Kelly Guo <kellyg@nvidia.com>
Signed-off-by: ooctipus <zhengyuz@nvidia.com>
Signed-off-by: ooctipus <zhengyuz@nvidia.com>
Signed-off-by: Kelly Guo <kellyg@nvidia.com>
@kellyguo11 kellyguo11 changed the title Added new curriculum mdp that allows modification on any environment parameters Adds new curriculum mdp that allows modification on any environment parameters Jul 10, 2025
Signed-off-by: Kelly Guo <kellyg@nvidia.com>
@kellyguo11 kellyguo11 merged commit cee5027 into main Jul 12, 2025
3 of 4 checks passed
@kellyguo11 kellyguo11 deleted the feat/modify_env_param_curriculum branch July 12, 2025 01:14
harry-wf-cheung pushed a commit to harry-wf-cheung/IsaacLab-Harry that referenced this pull request Jul 30, 2025
…arameters (isaac-sim#2777)

StephenWelch pushed a commit to StephenWelch/IsaacLab that referenced this pull request Aug 17, 2025
…arameters (isaac-sim#2777)
