@pglagol pglagol commented Aug 18, 2025

Summary

  • Fix tensor indexing in Unity inference to use the correct CHW layout
  • Fix incorrect agent behavior during Unity-side inference when a visual observation has more than one channel

Problem

The indexing formula in TensorExtensions.Index() computed offsets for an HWC layout, while Unity's visual sensors write data in CHW format.

Solution

Corrected TensorExtensions.Index() to calculate CHW (channels-height-width) tensor indices, matching the format used by both Unity's
observation writers and ONNX models during inference.
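For readers unfamiliar with the two layouts, the difference can be sketched as a pair of flat-offset formulas. This is only an illustration: `LayoutIndex`, `Chw`, and `Hwc` are hypothetical names, the batch dimension is ignored, and the code is not the actual body of TensorExtensions.Index().

```csharp
using System;

// Hypothetical helper for illustration only; not the ML-Agents implementation.
public static class LayoutIndex
{
    // Flat offset of element (c, h, w) when channels are the slowest-varying axis (CHW).
    public static int Chw(int c, int h, int w, int height, int width)
        => (c * height + h) * width + w;

    // Flat offset of the same element when channels are the fastest-varying axis (HWC).
    public static int Hwc(int c, int h, int w, int width, int channels)
        => (h * width + w) * channels + c;
}
```

For a 3-channel 2x2 observation, element (c=1, h=0, w=1) lands at offset 5 under CHW and offset 4 under HWC. With a single channel (c is always 0) both formulas reduce to h * width + w, which is consistent with the problem only appearing when there is more than one visual channel.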

@robinDLFM robinDLFM left a comment

Solved it for me

@maryamziaa maryamziaa self-requested a review August 22, 2025 17:11
Contributor

@maryamziaa maryamziaa left a comment

Hi, thank you for submitting the PR. I'll take a look after Release 4.

@maryamziaa
Contributor

Hi, I took a look at the issue you reported. It's not a bug in ML-Agents core code, but rather a missing implementation detail in the custom visual sensor that needs to match the preprocessing expectations established during training.

During training, ML-Agents converts visual observations to PNG format before sending them to the Python trainer, which automatically flips the image vertically. To compensate for this during Unity ONNX inference, ML-Agents' built-in visual sensors (like CameraSensor) use the WriteTexture() method in ObservationWriter.cs, which includes a compensatory flip to match the training data format. Your custom visual sensor likely bypasses this by directly writing pixel values without the flip compensation.

The fix is to either use writer.WriteTexture(texture, grayscale) if you're working with Texture2D data, or manually implement the vertical flip in your custom Write() method by iterating from height-1 down to 0 and using (height - h - 1) for your actual data indexing.

This explains why switching to Vector observations works: it bypasses the entire visual preprocessing pipeline that includes the necessary flip compensation.

You can check out the visual examples such as Visual3DBall and VisualFoodCollector, which use Unity's built-in CameraSensor or RenderTextureSensor components. These handle the image flipping correctly through the WriteTexture() method.
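The manual vertical flip suggested above could be sketched roughly as follows. To keep the example runnable outside Unity it writes into a plain float array rather than an ObservationWriter; `FlipSketch` and `WriteFlipped` are hypothetical names, and the CHW ordering of both buffers is an assumption made for the illustration.

```csharp
using System;

// Hypothetical stand-in for a custom sensor's Write() loop; not ML-Agents code.
public static class FlipSketch
{
    // Copies a CHW-ordered source into a CHW-ordered destination with the rows
    // of every channel reversed (top row becomes bottom row).
    public static float[] WriteFlipped(float[] source, int channels, int height, int width)
    {
        var dest = new float[source.Length];
        for (int c = 0; c < channels; c++)
        {
            for (int h = 0; h < height; h++)
            {
                int srcRow = height - h - 1; // the flip: read from the mirrored row
                for (int w = 0; w < width; w++)
                {
                    dest[(c * height + h) * width + w] =
                        source[(c * height + srcRow) * width + w];
                }
            }
        }
        return dest;
    }
}
```

Note that the flip only changes which row is read; the channel and column positions of every value stay the same.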

@pglagol
Author

pglagol commented Sep 3, 2025

Thank you for the detailed response! However, I believe there might be some confusion about the nature of the problem I encountered.

My custom visual sensor doesn't use PNG compression:

public ObservationSpec GetObservationSpec() => ObservationSpec.Visual(channels, height, width, ObservationType.Default);
public CompressionSpec GetCompressionSpec() => CompressionSpec.Default();
public byte[] GetCompressedObservation() => null;

I'm using the ObservationWriter correctly according to its signature: writer[channelIndex, y, x] = value;

Your suggestion about vertical image flipping doesn't explain why my fix worked, because I didn't flip the image - I fundamentally changed the tensor indexing order from HWC to CHW.

If it were just a vertical flip issue, the fix would involve changing the y coordinate calculation (like height - y - 1), not completely reorganizing how channels, height, and width are indexed in the tensor.

I think this is a data layout problem, not an image orientation problem.

Since my custom sensor bypasses PNG compression and uses the standard ObservationWriter interface correctly, shouldn't the tensor indexing in Unity inference match the same CHW format that Unity's visual sensors use during training?

The fact that changing HWC→CHW indexing solved the problem suggests this was indeed a tensor layout mismatch issue, not a vertical flip compensation issue.
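One way to tell the two hypotheses apart is a small standalone check: write a known pattern in CHW order, read it back with an HWC offset, and see whether reading rows bottom-up recovers the data. The sketch below assumes the mismatch is as described in this PR (CHW writer, HWC reader); `LayoutVsFlip`, `CountMismatches`, and `Encode` are hypothetical names and none of this is ML-Agents code.

```csharp
using System;

// Hypothetical diagnostic for illustration only.
public static class LayoutVsFlip
{
    // Counts elements that come back wrong when a buffer written in CHW order
    // is read with an HWC offset, optionally reading rows bottom-up.
    public static int CountMismatches(int channels, int height, int width, bool flipRows)
    {
        var buffer = new int[channels * height * width];
        for (int c = 0; c < channels; c++)
            for (int h = 0; h < height; h++)
                for (int w = 0; w < width; w++)
                    buffer[(c * height + h) * width + w] = Encode(c, h, w);

        int mismatches = 0;
        for (int c = 0; c < channels; c++)
            for (int h = 0; h < height; h++)
                for (int w = 0; w < width; w++)
                {
                    int row = flipRows ? height - h - 1 : h;
                    int hwcOffset = (row * width + w) * channels + c;
                    if (buffer[hwcOffset] != Encode(c, h, w)) mismatches++;
                }
        return mismatches;
    }

    // Unique tag per coordinate, valid while height and width stay below 100.
    static int Encode(int c, int h, int w) => c * 10000 + h * 100 + w;
}
```

With a single channel and no flip, `CountMismatches(1, 2, 2, false)` reports zero mismatches. With three channels on a 2x2 image the count is nonzero both with and without the row flip, so in this toy setup a vertical flip alone does not undo a layout mismatch.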

3 participants