New Annotators for SimpleShading, DiffuseAlbedo, and Fast Depth for ShadowHandEnv #4066
base: main
Conversation
Greptile Overview

Greptile Summary: This PR adds support for three new rendering annotators (SimpleShading, DiffuseAlbedo, and fast depth) for the ShadowHand vision environment.

Key Changes:
Issues:
Confidence Score: 2/5
Important Files Changed / File Analysis
Sequence Diagram

```mermaid
sequenceDiagram
    participant Env as ShadowHandVisionEnv
    participant Camera as TiledCamera
    participant Registry as AnnotatorRegistry
    participant Annotator as Custom Annotator
    participant Extractor as FeatureExtractor
    Note over Camera: Initialization
    Camera->>Registry: register_annotator_from_aov("DiffuseAlbedoSD")
    Camera->>Registry: register_annotator_from_aov("SimpleShadingSD")
    Camera->>Registry: get_annotator("DiffuseAlbedoSD"/"SimpleShadingSD")
    Registry-->>Camera: annotator instance
    Camera->>Annotator: attach(render_product_paths)
    Note over Env: Step Execution
    Env->>Camera: update(dt)
    Camera->>Annotator: get_data()
    Annotator-->>Camera: raw 4-channel RGBA data
    Note over Camera: Process channels
    Camera->>Camera: Convert to 4-channel uint8
    Camera->>Camera: Keep only first 3 channels (RGB)
    Camera->>Camera: Reshape to tiled format
    Camera-->>Env: data.output["diffuse_albedo"/"simple_shading"]
    Env->>Extractor: _preprocess_images(albedo_img/simple_shading_img)
    Extractor->>Extractor: Normalize to [0, 1]
    Extractor-->>Env: preprocessed images
    alt write_image_to_file enabled
        Env->>Extractor: _save_images()
        Extractor->>Extractor: Save to disk
    end
    Note over Env: Feature extraction disabled<br/>(commented out in current PR)
```
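Read as code, the flow above might look roughly like the following; a minimal sketch assuming the omni.replicator.core annotator API, with a hypothetical render-product path:

```python
import numpy as np
import omni.replicator.core as rep

# Register annotators backed by the new AOVs (names taken from this PR).
# output_channels=4 matches the raw RGBA buffers the renderer produces.
rep.AnnotatorRegistry.register_annotator_from_aov(
    aov="DiffuseAlbedoSD", output_data_type=np.uint8, output_channels=4
)
rep.AnnotatorRegistry.register_annotator_from_aov(
    aov="SimpleShadingSD", output_data_type=np.uint8, output_channels=4
)

# Attach to an existing render product (path here is hypothetical).
annotator = rep.AnnotatorRegistry.get_annotator("DiffuseAlbedoSD")
annotator.attach(["/Render/RenderProduct_tiled"])

# After the renderer steps, read the raw buffer and drop the alpha channel,
# mirroring the "keep only first 3 channels" step in the diagram.
data = annotator.get_data()  # (H, W, 4) uint8 RGBA
rgb = data[..., :3]
```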
7 files reviewed, 4 comments
```python
# train CNN to regress on keypoint positions
# if (
#     "rgb" in self.cfg.tiled_camera.data_types
#     and "depth" in self.cfg.tiled_camera.data_types
#     and "semantic_segmentation" in self.cfg.tiled_camera.data_types
# ):
#     pose_loss, embeddings = self.feature_extractor.step(
#         self._tiled_camera.data.output["rgb"],
#         self._tiled_camera.data.output["depth"],
#         self._tiled_camera.data.output["semantic_segmentation"][..., :3],
#         object_pose,
#     )
# elif "rgb" in self.cfg.tiled_camera.data_types:
#     pose_loss, embeddings = self.feature_extractor.step(
#         rgb_img=self._tiled_camera.data.output["rgb"], gt_pose=object_pose
#     )
# elif "depth" in self.cfg.tiled_camera.data_types:
#     pose_loss, embeddings = self.feature_extractor.step(
#         depth_img=self._tiled_camera.data.output["depth"], gt_pose=object_pose
#     )
# elif "semantic_segmentation" in self.cfg.tiled_camera.data_types:
#     pose_loss, embeddings = self.feature_extractor.step(
#         segmentation_img=self._tiled_camera.data.output["semantic_segmentation"][..., :3], gt_pose=object_pose
#     )
# elif "diffuse_albedo" in self.cfg.tiled_camera.data_types:
#     pose_loss, embeddings = self.feature_extractor.step(
#         albedo_img=self._tiled_camera.data.output["diffuse_albedo"], gt_pose=object_pose
#     )
# elif "simple_shading" in self.cfg.tiled_camera.data_types:
#     pose_loss, embeddings = self.feature_extractor.step(
#         simple_shading_img=self._tiled_camera.data.output["simple_shading"], gt_pose=object_pose
#     )

# self.embeddings = embeddings.clone().detach()
# # compute keypoints for goal cube
# compute_keypoints(
#     pose=torch.cat((torch.zeros_like(self.goal_pos), self.goal_rot), dim=-1), out=self.goal_keypoints
# )

obs = torch.cat(
    (
        self.embeddings,
        # self.embeddings,
        object_pose,
        self.goal_keypoints.view(-1, 24),
    ),
    dim=-1,
)

# log pose loss from CNN training
if "log" not in self.extras:
    self.extras["log"] = dict()
self.extras["log"]["pose_loss"] = pose_loss
# if "log" not in self.extras:
#     self.extras["log"] = dict()
# self.extras["log"]["pose_loss"] = pose_loss
```
style: Large block of commented-out code that handles CNN training and feature extraction. Either remove if no longer needed, or re-enable if this functionality is required
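If the block does get re-enabled, one way to avoid the long elif chain would be a small dispatch table; a rough sketch (not this PR's code), assuming the FeatureExtractor.step keyword arguments shown in the block above:

```python
# Sketch only: map each configured data type to the keyword argument that
# FeatureExtractor.step() expects for it (names taken from the block above).
KWARG_FOR_TYPE = {
    "rgb": "rgb_img",
    "depth": "depth_img",
    "semantic_segmentation": "segmentation_img",
    "diffuse_albedo": "albedo_img",
    "simple_shading": "simple_shading_img",
}

step_kwargs = {}
for data_type, kwarg in KWARG_FOR_TYPE.items():
    if data_type in self.cfg.tiled_camera.data_types:
        img = self._tiled_camera.data.output[data_type]
        if data_type == "semantic_segmentation":
            img = img[..., :3]  # drop alpha, as the original code does
        step_kwargs[kwarg] = img

if step_kwargs:
    pose_loss, embeddings = self.feature_extractor.step(gt_pose=object_pose, **step_kwargs)
```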
| if "diffuse_albedo" in self.cfg.data_types: | ||
| data_dict["diffuse_albedo"] = torch.zeros( | ||
| (self._view.count, self.cfg.height, self.cfg.width, 4), device=self.device, dtype=torch.uint8 | ||
| ).contiguous() | ||
| data_dict["diffuse_albedo"] = data_dict["diffuse_albedo"][..., :3] |
style: Buffer is created with 4 channels, then immediately sliced to 3 channels. Consider initializing with 3 channels directly for clarity
| if "diffuse_albedo" in self.cfg.data_types: | |
| data_dict["diffuse_albedo"] = torch.zeros( | |
| (self._view.count, self.cfg.height, self.cfg.width, 4), device=self.device, dtype=torch.uint8 | |
| ).contiguous() | |
| data_dict["diffuse_albedo"] = data_dict["diffuse_albedo"][..., :3] | |
| if "diffuse_albedo" in self.cfg.data_types: | |
| data_dict["diffuse_albedo"] = torch.zeros( | |
| (self._view.count, self.cfg.height, self.cfg.width, 3), device=self.device, dtype=torch.uint8 | |
| ).contiguous() |
| if "simple_shading" in self.cfg.data_types: | ||
| data_dict["simple_shading"] = torch.zeros( | ||
| (self._view.count, self.cfg.height, self.cfg.width, 4), device=self.device, dtype=torch.uint8 | ||
| ).contiguous() | ||
| data_dict["simple_shading"] = data_dict["simple_shading"][..., :3] |
style: Buffer is created with 4 channels, then immediately sliced to 3 channels. Consider initializing with 3 channels directly for clarity
| if "simple_shading" in self.cfg.data_types: | |
| data_dict["simple_shading"] = torch.zeros( | |
| (self._view.count, self.cfg.height, self.cfg.width, 4), device=self.device, dtype=torch.uint8 | |
| ).contiguous() | |
| data_dict["simple_shading"] = data_dict["simple_shading"][..., :3] | |
| if "simple_shading" in self.cfg.data_types: | |
| data_dict["simple_shading"] = torch.zeros( | |
| (self._view.count, self.cfg.height, self.cfg.width, 3), device=self.device, dtype=torch.uint8 | |
| ).contiguous() |
```python
obs = torch.cat(
    (
        self.embeddings,
        # self.embeddings,
        object_pose,
        self.goal_keypoints.view(-1, 24),
    ),
    dim=-1,
)
```
style: Observation now directly uses object_pose instead of learned embeddings. Verify that this matches the intended observation space and algorithm expectations
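If the embeddings really are meant to drop out of the observation, the configured observation width has to change with them; a hypothetical sanity check (sizes assumed for illustration):

```python
# Hypothetical check: after swapping embeddings for object_pose, the
# concatenated width must still match the configured observation space.
pose_dim = object_pose.shape[-1]  # e.g. 7: position + quaternion
keypoint_dim = 24                 # 8 keypoints x 3 coordinates
assert obs.shape[-1] == pose_dim + keypoint_dim, (
    "observation width changed; update the env cfg observation space to match"
)
```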
```
@@ -1,3 +1,5 @@
rtx.sdg.force.disableColorRender=false
```
Should this be set to false only if "rgb" is not in the TiledCamera's data types? Wondering whether the logic belongs here or in the camera class.
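If the logic were moved into the camera class, a rough sketch might look like this (assuming the carb settings API; the setting key comes from the diff above, and the set of color-dependent data types is an assumption):

```python
import carb.settings

# Sketch only: keep the color render pass enabled when a configured data
# type actually requires it, instead of forcing it globally via the .kit file.
COLOR_DEPENDENT_TYPES = {"rgb", "diffuse_albedo", "simple_shading"}  # assumption

settings = carb.settings.get_settings()
needs_color = bool(COLOR_DEPENDENT_TYPES & set(self.cfg.data_types))
settings.set_bool("/rtx/sdg/force/disableColorRender", not needs_color)
```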
```python
)
self._render_product_paths = [rp.path]

rep.AnnotatorRegistry.register_annotator_from_aov(
```
Please add comments on what these are and why they are needed.
| aov="DiffuseAlbedoSD", output_data_type=np.uint8, output_channels=4 | ||
| ) | ||
| rep.AnnotatorRegistry.register_annotator_from_aov( | ||
| aov="SimpleShadingSD", output_data_type=np.uint8, output_channels=4 |
Should output_channels be 3 directly? The data is sliced down to 3 channels later anyway.
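If the underlying AOVs can emit 3-channel buffers, the registration could drop the alpha channel at the source; an untested sketch:

```python
import numpy as np
import omni.replicator.core as rep

# Untested: register the AOVs as 3-channel so the later [..., :3] slicing
# becomes unnecessary. Whether the AOVs support 3-channel output is the
# open question here.
rep.AnnotatorRegistry.register_annotator_from_aov(
    aov="DiffuseAlbedoSD", output_data_type=np.uint8, output_channels=3
)
rep.AnnotatorRegistry.register_annotator_from_aov(
    aov="SimpleShadingSD", output_data_type=np.uint8, output_channels=3
)
```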
Is this only for Isaac Sim 5.1?
```python
Returns:
    tuple[torch.Tensor, torch.Tensor, torch.Tensor]: Preprocessed RGB, depth, and segmentation
    tuple[torch.Tensor | None, torch.Tensor | None, torch.Tensor | None, torch.Tensor | None, torch.Tensor | None]:
```
We don't do type annotations in docstrings since they are harder to maintain. Please remove all instances of this.
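For instance, the Returns section could describe the values while the types stay in the signature; a hypothetical rewrite (parameter names assumed):

```python
def _preprocess_images(
    self,
    rgb_img: torch.Tensor | None = None,
    depth_img: torch.Tensor | None = None,
    segmentation_img: torch.Tensor | None = None,
    albedo_img: torch.Tensor | None = None,
    simple_shading_img: torch.Tensor | None = None,
):
    """Preprocess camera images before feature extraction.

    Returns:
        Preprocessed RGB, depth, segmentation, albedo, and simple-shading
        images, in that order; entries are None for inputs that were not
        provided.
    """
```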
Description
Credit to Wayne Huang and the RT team for enabling these features.
Type of change
Screenshots
Please attach before and after screenshots of the change if applicable.
Checklist
- I have run the pre-commit checks with ./isaaclab.sh --format
- I have updated the changelog and the corresponding version in the extension's config/extension.toml file
- I have added my name to the CONTRIBUTORS.md or my name already exists there