-
Notifications
You must be signed in to change notification settings - Fork 46
VideoStreamMetadata.sample_aspect_ratio: new metadata field (#733) #737
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@carandraug, thanks for tackling this! This PR actually raises an interesting design question: so far, we've only exposed metadata as strings, floats or ints. The value that's closest to the SAR that we currently have is the FPS, which we return as a float. In FFmpeg, it's an For your use-case, is it easier to have a rational, or a float? If we keep a rational, I think we should actually make it a |
I think it would have to be a Now that I think about it, we could have exposed fps as a fraction as well, but it's probably too late to do so without breaking BC? |
@scotts writes:
From afar, I guess using a rational would be slightly simpler but I see no reason why I couldn't work a float though. However, it is trivial to get a float from a rational but not the inverse. If someone ever really needs the fraction, it will be too late to change so I'd suggest to return a rational now. Shall I add a commit to this PR to use a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall I add a commit to this PR to use a fractions.Fraction instead of a tuple? Is the rest of the PR ok?
Yes and yes, I left some minor comments below but overall this looks good. Thank you @carandraug !
src/torchcodec/_core/Metadata.h
Outdated
@@ -45,6 +46,7 @@ struct StreamMetadata { | |||
// Video-only fields derived from the AVCodecContext. | |||
std::optional<int64_t> width; | |||
std::optional<int64_t> height; | |||
std::optional<std::pair<int, int>> sampleAspectRatio; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're using FFmpeg types here already, so I think we should define this as AVRational instead of a tuple
src/torchcodec/_core/_metadata.py
Outdated
@@ -80,6 +80,12 @@ class VideoStreamMetadata(StreamMetadata): | |||
average_fps_from_header: Optional[float] | |||
"""Averate fps of the stream, obtained from the header (float or None). | |||
We recommend using the ``average_fps`` attribute instead.""" | |||
sample_aspect_ratio: Optional[tuple[int, int]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because this is a public name, I think we should name it pixel_aspect_ratio
, since it is only relevant for pixels. We use the term sample
to denote audio data in the rest of our public APIs. It's fine to keep the name sampleAspectRatio
to be close to FFmpeg in the private parts of the code (like in C++), but here, "pixel" will be more descriptive.
And as mentioned in previous discussion, let's make this a Fraction just to be on the safe side.
std::to_string((*streamMetadata.sampleAspectRatio).first); | ||
map["sampleAspectRatioDen"] = | ||
std::to_string((*streamMetadata.sampleAspectRatio).second); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what you're doing here is OK. I don't see a much cleaner way around the json intermediate layer. @scotts any concern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. We're turning what is fundamentally two values into one object. This is a reasonable way to do that.
@NicolasHug, you bring up an interesting point about our existing rates: should we have also made them If we find that folks start needing the underlying rational, we can support that in a backwards-compatible way. We can always change our implementation in C++ and the Python layer to have the actual fraction. The existing APIs would become properties that return the float, and we could add |
I have applied all the requested changes:
There is one thing that is annoying me with the use of
I could force it on |
To be clear, I mean something along the lines of:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @carandraug , LGTM with a comment below, let's see if tests are happy.
On the __repr__
: I personally I don't think it's worth the hassle, but let's consider that in a separate PR so we can get this one merged first.
src/torchcodec/_core/_metadata.py
Outdated
"""Pixel Aspect Ratio (PAR), also known as Sample Aspect Ratio | ||
(SAR --- not to be confused with Sample Aspect Ratio, also SAR), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
known as Sample Aspect Ratio (SAR --- not to be confused with Sample Aspect Ratio
Emphasis mine above, is there a typo? Both are the same.
But maybe we don't even need to mention Sample Aspect Ratio at all 🤔 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is a typo. It should be Storage Aspect Ratio. It is fixed now on a separate commit.
I think it's worth to, at least, mention sample aspect ratio since it's very a common term for this (e.g., that's what ffmpeg uses). If a user is familiar with sample aspect ratio only, then this hint helps them in understanding what this is about (otherwise they will have to got read on PAR before knowing it's exactly the same thing). I could drop the note about Storage Aspect Ration being also SAR but because there's so much confusion with this terms, I think it's better to be explicit that we're talking about SAR).
To be really explicit, I could add to the docstring "if this value is not 1/1, then the pixels are not square".
Merging, thank you for the PR @carandraug ! |
The new tests introduced in ffac96c (pytorch#732) are failing since d6ce570 (pytorch#737) because d6ce570 was merged by rebase.
New field to
VideoStreamMetadata
to at least get information about the stream sample/pixel aspect ratio. Getting this information is the minimum required to support non-square pixels.This addresses issue #733 (which was deemed "reasonable" which is why I'm implementing it, thank you for considering this).
There is one thing that annoys me about this PR. The names for properties in C++
StreamMetadata
and json returned byget_stream_json_metadata
match. I could have separate num/den properties inStreamMetadata
but I don't think that's quite right because they're actually two parts of the same value. On the json side of things I made it two separate properties because it'sstd::map<std::string, std::string>
(although it would be trivial to make it a colon separated string and parse it on the python side of things).