
Conversation

@matthewtrepte (Contributor) commented Nov 22, 2025

Description

Credit to Wayne Huang + RT team for enabling these features

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (existing functionality will not work without user modification)
  • Documentation update

Screenshots

Please attach before and after screenshots of the change if applicable.

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

github-actions bot added the isaac-sim (Related to Isaac Sim team) and isaac-lab (Related to Isaac Lab team) labels on Nov 22, 2025
greptile-apps bot (Contributor) commented Nov 22, 2025

Greptile Overview

Greptile Summary

This PR adds support for three new rendering annotators (DiffuseAlbedoSD, SimpleShadingSD, and depth) to enable vision-based experiments with different input modalities in the Shadow Hand manipulation task.

Key Changes:

  • Registers two new custom AOV annotators (DiffuseAlbedoSD and SimpleShadingSD) in TiledCamera that output 4-channel data, processed down to 3-channel RGB (see the sketch after this list)
  • Adds 5 new gym environment variants for Shadow Hand vision task (segmentation, RGB, diffuse albedo, simple shading, depth)
  • Refactors FeatureExtractor to accept flexible input types (RGB, depth, segmentation, albedo, or simple shading) with configurable channel count
  • Updates rendering mode kit files to control color rendering via rtx.sdg.force.disableColorRender setting
  • Increases camera resolution from 120x120 to 240x240
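
A minimal sketch of that registration flow, assembled from the diff excerpts quoted later in this review; the standalone placement and the example render-product path are assumptions for illustration, only the registry calls themselves come from the PR:

    import numpy as np
    import omni.replicator.core as rep

    # Both AOVs arrive from the renderer as 4-channel uint8 buffers;
    # TiledCamera keeps only the first 3 channels (RGB) downstream.
    rep.AnnotatorRegistry.register_annotator_from_aov(
        aov="DiffuseAlbedoSD", output_data_type=np.uint8, output_channels=4
    )
    rep.AnnotatorRegistry.register_annotator_from_aov(
        aov="SimpleShadingSD", output_data_type=np.uint8, output_channels=4
    )

    # Attach an annotator to the camera's render products, as in the
    # sequence diagram below; the path here is hypothetical.
    render_product_paths = ["/Render/RenderProduct_0"]
    annotator = rep.AnnotatorRegistry.get_annotator("DiffuseAlbedoSD")
    annotator.attach(render_product_paths)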

Issues:

  • Large sections of CNN training and feature extraction code are commented out in shadow_hand_vision_env.py (lines 244-294), making the environments non-functional for actual vision-based learning
  • Observations now use raw object_pose instead of learned embeddings, which may not align with the vision-based learning intent
  • Buffer initialization pattern creates 4-channel tensors then immediately slices to 3 channels (inefficient)

Confidence Score: 2/5

  • This PR has significant concerns due to commented-out functionality that makes the new environments non-functional
  • Score reflects that while the annotator registration code is correctly implemented, the shadow_hand environment integration has extensive commented-out code that disables the CNN feature extraction, making the new vision-based environments incomplete. The observations now use raw object_pose instead of learned embeddings, which defeats the purpose of vision-based learning. These issues need resolution before merge.
  • Pay close attention to source/isaaclab_tasks/isaaclab_tasks/direct/shadow_hand/shadow_hand_vision_env.py - the commented-out code needs to be either removed or properly integrated

Important Files Changed

File Analysis

| Filename | Score | Overview |
| --- | --- | --- |
| source/isaaclab/isaaclab/sensors/camera/tiled_camera.py | 3/5 | Adds registration and handling for the DiffuseAlbedoSD and SimpleShadingSD annotators, with proper channel processing |
| source/isaaclab_tasks/isaaclab_tasks/direct/shadow_hand/feature_extractor.py | 2/5 | Refactors the feature extractor to support multiple input types with conditional logic, but the CNN training code in the environment is commented out |
| source/isaaclab_tasks/isaaclab_tasks/direct/shadow_hand/shadow_hand_vision_env.py | 1/5 | Adds new environment variants for the different annotators, but has extensive commented-out feature-extractor integration code |

Sequence Diagram

sequenceDiagram
    participant Env as ShadowHandVisionEnv
    participant Camera as TiledCamera
    participant Registry as AnnotatorRegistry
    participant Annotator as Custom Annotator
    participant Extractor as FeatureExtractor
    
    Note over Camera: Initialization
    Camera->>Registry: register_annotator_from_aov("DiffuseAlbedoSD")
    Camera->>Registry: register_annotator_from_aov("SimpleShadingSD")
    Camera->>Registry: get_annotator("DiffuseAlbedoSD"/"SimpleShadingSD")
    Registry-->>Camera: annotator instance
    Camera->>Annotator: attach(render_product_paths)
    
    Note over Env: Step Execution
    Env->>Camera: update(dt)
    Camera->>Annotator: get_data()
    Annotator-->>Camera: raw 4-channel RGBA data
    Note over Camera: Process channels
    Camera->>Camera: Convert to 4-channel uint8
    Camera->>Camera: Keep only first 3 channels (RGB)
    Camera->>Camera: Reshape to tiled format
    Camera-->>Env: data.output["diffuse_albedo"/"simple_shading"]
    
    Env->>Extractor: _preprocess_images(albedo_img/simple_shading_img)
    Extractor->>Extractor: Normalize to [0, 1]
    Extractor-->>Env: preprocessed images
    
    alt write_image_to_file enabled
        Env->>Extractor: _save_images()
        Extractor->>Extractor: Save to disk
    end
    
    Note over Env: Feature extraction disabled<br/>(commented out in current PR)
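For concreteness, a hedged sketch of the Step Execution path in the diagram; get_vision_obs is a hypothetical helper, and the output key and [0, 1] normalization are taken from this review's summary:

    import torch

    # Hypothetical helper mirroring the Step Execution section above.
    def get_vision_obs(tiled_camera, dt: float) -> torch.Tensor:
        tiled_camera.update(dt)
        # uint8 tensor of shape (num_envs, height, width, 3): the camera has
        # already dropped the alpha channel from the raw 4-channel AOV data
        albedo = tiled_camera.data.output["diffuse_albedo"]
        # normalize to [0, 1], as the FeatureExtractor preprocessing does
        return albedo.float() / 255.0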

greptile-apps bot left a comment


7 files reviewed, 4 comments


Comment on lines 244 to 294
# train CNN to regress on keypoint positions
# if (
#     "rgb" in self.cfg.tiled_camera.data_types
#     and "depth" in self.cfg.tiled_camera.data_types
#     and "semantic_segmentation" in self.cfg.tiled_camera.data_types
# ):
#     pose_loss, embeddings = self.feature_extractor.step(
#         self._tiled_camera.data.output["rgb"],
#         self._tiled_camera.data.output["depth"],
#         self._tiled_camera.data.output["semantic_segmentation"][..., :3],
#         object_pose,
#     )
# elif "rgb" in self.cfg.tiled_camera.data_types:
#     pose_loss, embeddings = self.feature_extractor.step(
#         rgb_img=self._tiled_camera.data.output["rgb"], gt_pose=object_pose
#     )
# elif "depth" in self.cfg.tiled_camera.data_types:
#     pose_loss, embeddings = self.feature_extractor.step(
#         depth_img=self._tiled_camera.data.output["depth"], gt_pose=object_pose
#     )
# elif "semantic_segmentation" in self.cfg.tiled_camera.data_types:
#     pose_loss, embeddings = self.feature_extractor.step(
#         segmentation_img=self._tiled_camera.data.output["semantic_segmentation"][..., :3], gt_pose=object_pose
#     )
# elif "diffuse_albedo" in self.cfg.tiled_camera.data_types:
#     pose_loss, embeddings = self.feature_extractor.step(
#         albedo_img=self._tiled_camera.data.output["diffuse_albedo"], gt_pose=object_pose
#     )
# elif "simple_shading" in self.cfg.tiled_camera.data_types:
#     pose_loss, embeddings = self.feature_extractor.step(
#         simple_shading_img=self._tiled_camera.data.output["simple_shading"], gt_pose=object_pose
#     )

# self.embeddings = embeddings.clone().detach()
# # compute keypoints for goal cube
# compute_keypoints(
#     pose=torch.cat((torch.zeros_like(self.goal_pos), self.goal_rot), dim=-1), out=self.goal_keypoints
# )

obs = torch.cat(
    (
        self.embeddings,
        # self.embeddings,
        object_pose,
        self.goal_keypoints.view(-1, 24),
    ),
    dim=-1,
)

# log pose loss from CNN training
if "log" not in self.extras:
    self.extras["log"] = dict()
self.extras["log"]["pose_loss"] = pose_loss
# if "log" not in self.extras:
#     self.extras["log"] = dict()
# self.extras["log"]["pose_loss"] = pose_loss
style: Large block of commented-out code that handles CNN training and feature extraction. Either remove if no longer needed, or re-enable if this functionality is required
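
One way to re-enable the block without the long elif chain is a small dispatch table — a sketch only, assuming the refactored FeatureExtractor.step accepts any subset of these keyword arguments (the keyword names are taken from the commented-out code above):

    # inside the environment's observation computation;
    # maps camera data type -> step() keyword argument
    _KWARG_BY_TYPE = {
        "rgb": "rgb_img",
        "depth": "depth_img",
        "semantic_segmentation": "segmentation_img",
        "diffuse_albedo": "albedo_img",
        "simple_shading": "simple_shading_img",
    }
    kwargs = {}
    for data_type, kwarg in _KWARG_BY_TYPE.items():
        if data_type in self.cfg.tiled_camera.data_types:
            img = self._tiled_camera.data.output[data_type]
            if data_type == "semantic_segmentation":
                img = img[..., :3]  # drop the alpha channel
            kwargs[kwarg] = img
    pose_loss, embeddings = self.feature_extractor.step(gt_pose=object_pose, **kwargs)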

Comment on lines +368 to +372
if "diffuse_albedo" in self.cfg.data_types:
data_dict["diffuse_albedo"] = torch.zeros(
(self._view.count, self.cfg.height, self.cfg.width, 4), device=self.device, dtype=torch.uint8
).contiguous()
data_dict["diffuse_albedo"] = data_dict["diffuse_albedo"][..., :3]
style: Buffer is created with 4 channels, then immediately sliced to 3 channels. Consider initializing with 3 channels directly for clarity

Suggested change
-if "diffuse_albedo" in self.cfg.data_types:
-    data_dict["diffuse_albedo"] = torch.zeros(
-        (self._view.count, self.cfg.height, self.cfg.width, 4), device=self.device, dtype=torch.uint8
-    ).contiguous()
-    data_dict["diffuse_albedo"] = data_dict["diffuse_albedo"][..., :3]
+if "diffuse_albedo" in self.cfg.data_types:
+    data_dict["diffuse_albedo"] = torch.zeros(
+        (self._view.count, self.cfg.height, self.cfg.width, 3), device=self.device, dtype=torch.uint8
+    ).contiguous()

Comment on lines +373 to +377
if "simple_shading" in self.cfg.data_types:
data_dict["simple_shading"] = torch.zeros(
(self._view.count, self.cfg.height, self.cfg.width, 4), device=self.device, dtype=torch.uint8
).contiguous()
data_dict["simple_shading"] = data_dict["simple_shading"][..., :3]
style: Buffer is created with 4 channels, then immediately sliced to 3 channels. Consider initializing with 3 channels directly for clarity

Suggested change
-if "simple_shading" in self.cfg.data_types:
-    data_dict["simple_shading"] = torch.zeros(
-        (self._view.count, self.cfg.height, self.cfg.width, 4), device=self.device, dtype=torch.uint8
-    ).contiguous()
-    data_dict["simple_shading"] = data_dict["simple_shading"][..., :3]
+if "simple_shading" in self.cfg.data_types:
+    data_dict["simple_shading"] = torch.zeros(
+        (self._view.count, self.cfg.height, self.cfg.width, 3), device=self.device, dtype=torch.uint8
+    ).contiguous()

Comment on lines 283 to 290
obs = torch.cat(
    (
        self.embeddings,
        # self.embeddings,
        object_pose,
        self.goal_keypoints.view(-1, 24),
    ),
    dim=-1,
)
style: Observation now directly uses object_pose instead of learned embeddings. Verify that this matches the intended observation space and algorithm expectations

matthewtrepte changed the title from "New Annotators for SimpleShading, DiffuseAlbedo, and Fast Depth" to "New Annotators for SimpleShading, DiffuseAlbedo, and Fast Depth for ShadowHandEnv" on Nov 22, 2025
@@ -1,3 +1,5 @@
rtx.sdg.force.disableColorRender=false
Should this be set to False only if rendering is not "rgb" in the tiledcamera? Wondering if the logic belongs here or to the camera class.
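
If the logic moved to the camera class, it could look roughly like this — a sketch assuming the kit-file key maps to the carb settings path /rtx/sdg/force/disableColorRender in the usual dotted-to-slashed way:

    import carb.settings

    # Keep the color render pass enabled only when an RGB output is requested;
    # treating "rgb" as the only color-dependent data type is an assumption.
    needs_color = "rgb" in self.cfg.data_types
    carb.settings.get_settings().set_bool(
        "/rtx/sdg/force/disableColorRender", not needs_color
    )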

)
self._render_product_paths = [rp.path]

rep.AnnotatorRegistry.register_annotator_from_aov(
Please add comments on what these are and why are they needed.

aov="DiffuseAlbedoSD", output_data_type=np.uint8, output_channels=4
)
rep.AnnotatorRegistry.register_annotator_from_aov(
aov="SimpleShadingSD", output_data_type=np.uint8, output_channels=4
Should output channels be 3 directly? Later on slicing is done to that anyway?
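
If these AOVs can be registered with 3 channels directly — an assumption; the renderer may only emit RGBA here, which would explain the 4-then-slice pattern — the later slicing could be dropped, and the comments requested above could be added at the same time:

    # DiffuseAlbedoSD: base-color (albedo) AOV without lighting;
    # SimpleShadingSD: cheaply shaded render. These descriptions are
    # assumptions for illustration, not taken from the PR.
    rep.AnnotatorRegistry.register_annotator_from_aov(
        aov="DiffuseAlbedoSD", output_data_type=np.uint8, output_channels=3
    )
    rep.AnnotatorRegistry.register_annotator_from_aov(
        aov="SimpleShadingSD", output_data_type=np.uint8, output_channels=3
    )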

Is this only for Isaac Sim 5.1?

Returns:
    tuple[torch.Tensor, torch.Tensor, torch.Tensor]: Preprocessed RGB, depth, and segmentation
    tuple[torch.Tensor | None, torch.Tensor | None, torch.Tensor | None, torch.Tensor | None, torch.Tensor | None]:
We don't do type annotations on docstrings since they are harder to maintain. Please remove instances of that everywhere.
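
For instance, the Returns section could read as follows — a sketch of the requested style, with types left to the function signature:

    Returns:
        Preprocessed RGB, depth, segmentation, albedo, and simple-shading
        images; entries are None for data types that are not enabled.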
