Fix visual observation tensor indexing for Unity inference #6239
base: develop
Conversation
This change corrects the tensor indexing calculation in TensorExtensions.Index() to properly support CHW (channels-height-width) format used by both Unity's observation writers and ONNX models during inference.
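To make the two layouts concrete, here is a small Python sketch of the flat-index arithmetic (illustrative only; the actual fix is in the C# TensorExtensions.Index(), and the function names and shapes here are made up for the example):

```python
# Illustrative sketch: flat-index formulas for CHW vs HWC layouts.

def index_chw(c, h, w, H, W):
    # Channels-first: each channel is a contiguous H*W plane.
    return c * (H * W) + h * W + w

def index_hwc(c, h, w, W, C):
    # Channels-last: each pixel's C channel values are contiguous.
    return h * (W * C) + w * C + c

# The same (c, h, w) coordinate maps to different flat offsets
# under each layout, so mixing them up scrambles the observation.
H, W, C = 4, 5, 3
print(index_chw(1, 2, 3, H, W))   # 1*20 + 2*5 + 3 = 33
print(index_hwc(1, 2, 3, W, C))   # 2*15 + 3*3 + 1 = 40
```

If the writer stores values at CHW offsets but the reader computes HWC offsets, every element except a few coincidental overlaps lands in the wrong place.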
Solved it for me
Hi, thank you for submitting the PR. I'll take a look after Release 4.
Hi, I took a look at the issue you reported. It's not a bug in ML-Agents core code, but rather a missing implementation detail in the custom visual sensor: it needs to match the preprocessing expectations established during training.

During training, ML-Agents converts visual observations to PNG format before sending them to the Python trainer, which flips the image vertically. To compensate for this during Unity ONNX inference, ML-Agents' built-in visual sensors (like CameraSensor) use the WriteTexture() method in ObservationWriter.cs, which includes a compensatory flip to match the training data format. Your custom visual sensor likely bypasses this by writing pixel values directly without the flip compensation.

The fix is to either use writer.WriteTexture(texture, grayscale) if you're working with Texture2D data, or manually implement the vertical flip in your custom Write() method by iterating from height-1 down to 0 and using (height - h - 1) for your actual data indexing.

This also explains why switching to Vector observations works: it bypasses the entire visual preprocessing pipeline, including the flip compensation. For reference, the visual examples like Visual3DBall and VisualFoodCollector use Unity's built-in CameraSensor or RenderTextureSensor components, which handle the image flipping correctly through the WriteTexture() method.
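As a hedged illustration of the flip compensation described above, here is a Python sketch of a custom Write() that reads source rows bottom-up. A plain dict stands in for ML-Agents' ObservationWriter, and the (c, h, w) coordinate order is an assumption for this example:

```python
# Hypothetical sketch of the (height - h - 1) flip compensation.
# 'writer' is a dict standing in for the real ObservationWriter.

def write_flipped(writer, pixels, height, width, channels):
    for h in range(height):
        src_h = height - h - 1          # read source rows bottom-up
        for w in range(width):
            for c in range(channels):
                writer[(c, h, w)] = pixels[src_h][w][c]

# 2x2 grayscale image: top row [1, 2], bottom row [3, 4].
pixels = [[[1], [2]], [[3], [4]]]
out = {}
write_flipped(out, pixels, height=2, width=2, channels=1)
print(out[(0, 0, 0)], out[(0, 0, 1)])  # 3 4  (bottom row written first)
```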
Thank you for the detailed response! However, I believe there might be some confusion about the nature of the problem I encountered. My custom visual sensor doesn't use PNG compression:

```csharp
public ObservationSpec GetObservationSpec() => ObservationSpec.Visual(channels, height, width, ObservationType.Default);
public CompressionSpec GetCompressionSpec() => CompressionSpec.Default();
public byte[] GetCompressedObservation() => null;
```

I'm also using the ObservationWriter correctly according to its signature. Your suggestion about vertical image flipping doesn't explain why my fix worked, because I didn't flip the image; I fundamentally changed the tensor indexing order from HWC to CHW. If it were just a vertical flip issue, the fix would involve changing only the height index (e.g. using height - h - 1), not reordering the dimensions.

I think this is a data layout problem, not an image orientation problem. Since my custom sensor bypasses PNG compression and uses the standard ObservationWriter interface correctly, shouldn't the tensor indexing in Unity inference match the same CHW format that Unity's visual sensors use during training? The fact that changing HWC→CHW indexing solved the problem suggests this was indeed a tensor layout mismatch, not a vertical flip compensation issue.
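The distinction drawn here can be checked numerically: a vertical flip permutes whole rows within the same layout, whereas HWC→CHW relocates every element to a different flat offset. A small NumPy sketch (shapes chosen arbitrarily for illustration):

```python
import numpy as np

H, W, C = 4, 5, 3
hwc = np.arange(H * W * C).reshape(H, W, C)

flipped = hwc[::-1, :, :]      # vertical flip: still HWC layout
chw = hwc.transpose(2, 0, 1)   # relayout: channels-first

# The flip only reorders rows: the first row of the flipped image is
# the last row of the original, byte-for-byte.
print(np.array_equal(flipped.reshape(-1)[:W * C], hwc[H - 1].reshape(-1)))  # True
# The transpose changes the flat memory order entirely.
print(np.array_equal(chw.reshape(-1), hwc.reshape(-1)))                     # False
```

So a fix that only reorders dimensions, rather than reversing the height index, points at a layout mismatch rather than an orientation one.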
Summary
Problem
The indexing formula computed flat offsets assuming HWC layout, while Unity's visual sensors write data in CHW format.
Solution
Corrected TensorExtensions.Index() to properly calculate CHW tensor indices that match Unity's observation writer format.