Skip to content

VideoStreamMetadata.sample_aspect_ratio: new metadata field (#733) #737

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jun 25, 2025

Conversation

carandraug
Copy link
Contributor

New field to VideoStreamMetadata to at least get information about the stream sample/pixel aspect ratio. Getting this information is the minimum required to support non-square pixels.


This addresses issue #733 (which was deemed "reasonable" which is why I'm implementing it, thank you for considering this).

There is one thing that annoys me about this PR. The names for properties in C++ StreamMetadata and json returned by get_stream_json_metadata match. I could have separate num/den properties in StreamMetadata but I don't think that's quite right because they're actually two parts of the same value. On the json side of things I made it two separate properties because it's std::map<std::string, std::string> (although it would be trivial to make it a colon separated string and parse it on the python side of things).

)

New field to VideoStreamMetadata to at least get information about the
stream sample/pixel aspect ratio.  Getting this information is the
minimum required to support non-square pixels.
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jun 22, 2025
@scotts
Copy link
Contributor

scotts commented Jun 23, 2025

@carandraug, thanks for tackling this! This PR actually raises an interesting design question: so far, we've only exposed metadata as strings, floats or ints. The value that's closest to the SAR that we currently have is the FPS, which we return as a float. In FFmpeg, it's an AVRational. We have not yet returned a rational as-is, without turning it into a float.

For your use-case, is it easier to have a rational, or a float? If we keep a rational, I think we should actually make it a numbers.Rational rather than just a tuple. @NicolasHug, thoughts?

@NicolasHug
Copy link
Member

I think it would have to be a fractions.Fraction because Rational is an ABC, but agreed this is better than a plain tuple as it supports natural operations like +, float(.), etc.

Now that I think about it, we could have exposed fps as a fraction as well, but it's probably too late to do so without breaking BC?

@carandraug
Copy link
Contributor Author

@scotts writes:

For your use-case, is it easier to have a rational, or a float? If we keep a rational, I think we should actually make it a numbers.Rational rather than just a tuple. @NicolasHug, thoughts?

From afar, I guess using a rational would be slightly simpler but I see no reason why I couldn't work a float though. However, it is trivial to get a float from a rational but not the inverse. If someone ever really needs the fraction, it will be too late to change so I'd suggest to return a rational now.

Shall I add a commit to this PR to use a fractions.Fraction instead of a tuple? Is the rest of the PR ok?

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall I add a commit to this PR to use a fractions.Fraction instead of a tuple? Is the rest of the PR ok?

Yes and yes, I left some minor comments below but overall this looks good. Thank you @carandraug !

@@ -45,6 +46,7 @@ struct StreamMetadata {
// Video-only fields derived from the AVCodecContext.
std::optional<int64_t> width;
std::optional<int64_t> height;
std::optional<std::pair<int, int>> sampleAspectRatio;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're using FFmpeg types here already, so I think we should define this as AVRational instead of a tuple

@@ -80,6 +80,12 @@ class VideoStreamMetadata(StreamMetadata):
average_fps_from_header: Optional[float]
"""Averate fps of the stream, obtained from the header (float or None).
We recommend using the ``average_fps`` attribute instead."""
sample_aspect_ratio: Optional[tuple[int, int]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this is a public name, I think we should name it pixel_aspect_ratio, since it is only relevant for pixels. We use the term sample to denote audio data in the rest of our public APIs. It's fine to keep the name sampleAspectRatio to be close to FFmpeg in the private parts of the code (like in C++), but here, "pixel" will be more descriptive.

And as mentioned in previous discussion, let's make this a Fraction just to be on the safe side.

std::to_string((*streamMetadata.sampleAspectRatio).first);
map["sampleAspectRatioDen"] =
std::to_string((*streamMetadata.sampleAspectRatio).second);
}
Copy link
Member

@NicolasHug NicolasHug Jun 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you're doing here is OK. I don't see a much cleaner way around the json intermediate layer. @scotts any concern?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. We're turning what is fundamentally two values into one object. This is a reasonable way to do that.

@scotts
Copy link
Contributor

scotts commented Jun 24, 2025

@NicolasHug, you bring up an interesting point about our existing rates: should we have also made them numbers.Fractions? Maybe, but something that's implied in our API is that the time unit we both report and accept is always seconds. Our rationale was that's more ergonomic for users.

If we find that folks start needing the underlying rational, we can support that in a backwards-compatible way. We can always change our implementation in C++ and the Python layer to have the actual fraction. The existing APIs would become properties that return the float, and we could add *_rational accessors. But this is a layer of complexity I'd only want us to take on if users see a big need for it.

@carandraug
Copy link
Contributor Author

I have applied all the requested changes:

  • std::pair -> AVRational
  • sample_aspect_ratio -> pixel_aspect_ratio
  • tuple[int, int] -> Fraction

There is one thing that is annoying me with the use of Fraction. For the common case of Fraction(1, 1), this prints as 1`. I think it would be nice if the fraction format was always used:

VideoStreamMetadata:
[...]
  pixel_aspect_ratio: 1  # would be nice if this was 1/1
[...]

I could force it on StreamMetadata.__repr__ maybe, not sure how you feel about it.

@carandraug
Copy link
Contributor Author

I could force it on StreamMetadata.__repr__ maybe, not sure how you feel about it.

To be clear, I mean something along the lines of:

diff --git a/src/torchcodec/_core/_metadata.py b/src/torchcodec/_core/_metadata.py
index 054cfc3..39830e9 100644
--- a/src/torchcodec/_core/_metadata.py
+++ b/src/torchcodec/_core/_metadata.py
@@ -41,7 +41,12 @@ class StreamMetadata:
     def __repr__(self):
         s = self.__class__.__name__ + ":\n"
         for field in dataclasses.fields(self):
-            s += f"{SPACES}{field.name}: {getattr(self, field.name)}\n"
+            attr = getattr(self, field.name)
+            if isinstance(attr, Optional[Fraction]):
+                # Once we require Python 3.13 we can use format(.., "#")
+                s += f"{SPACES}{field.name}: {attr.numerator}/{attr.denominator}\n"
+            else:
+                s += f"{SPACES}{field.name}: {attr}\n"
         return s

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @carandraug , LGTM with a comment below, let's see if tests are happy.

On the __repr__: I personally I don't think it's worth the hassle, but let's consider that in a separate PR so we can get this one merged first.

Comment on lines 85 to 86
"""Pixel Aspect Ratio (PAR), also known as Sample Aspect Ratio
(SAR --- not to be confused with Sample Aspect Ratio, also SAR),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

known as Sample Aspect Ratio (SAR --- not to be confused with Sample Aspect Ratio

Emphasis mine above, is there a typo? Both are the same.

But maybe we don't even need to mention Sample Aspect Ratio at all 🤔 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is a typo. It should be Storage Aspect Ratio. It is fixed now on a separate commit.

I think it's worth to, at least, mention sample aspect ratio since it's very a common term for this (e.g., that's what ffmpeg uses). If a user is familiar with sample aspect ratio only, then this hint helps them in understanding what this is about (otherwise they will have to got read on PAR before knowing it's exactly the same thing). I could drop the note about Storage Aspect Ration being also SAR but because there's so much confusion with this terms, I think it's better to be explicit that we're talking about SAR).

To be really explicit, I could add to the docstring "if this value is not 1/1, then the pixels are not square".

@NicolasHug NicolasHug merged commit d6ce570 into pytorch:main Jun 25, 2025
42 of 44 checks passed
@NicolasHug
Copy link
Member

Merging, thank you for the PR @carandraug !

@carandraug carandraug deleted the 733-sar-in-metadata branch June 25, 2025 11:11
carandraug added a commit to carandraug/torchcodec that referenced this pull request Jun 25, 2025
The new tests introduced in ffac96c (pytorch#732) are failing since d6ce570
(pytorch#737) because d6ce570 was merged by rebase.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Meta Open Source bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants