Skip to content

VideoStreamMetadata.sample_aspect_ratio: new metadata field (#733) #737

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jun 25, 2025
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions src/torchcodec/_core/Metadata.h
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@

#include <optional>
#include <string>
#include <utility>
#include <vector>

extern "C" {
Expand Down Expand Up @@ -45,6 +46,7 @@ struct StreamMetadata {
// Video-only fields derived from the AVCodecContext.
std::optional<int64_t> width;
std::optional<int64_t> height;
std::optional<std::pair<int, int>> sampleAspectRatio;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're using FFmpeg types here already, so I think we should define this as AVRational instead of a tuple


// Audio-only fields
std::optional<int64_t> sampleRate;
Expand Down
3 changes: 3 additions & 0 deletions src/torchcodec/_core/SingleStreamDecoder.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -459,6 +459,9 @@ void SingleStreamDecoder::addVideoStream(

streamMetadata.width = streamInfo.codecContext->width;
streamMetadata.height = streamInfo.codecContext->height;
streamMetadata.sampleAspectRatio = {
streamInfo.codecContext->sample_aspect_ratio.num,
streamInfo.codecContext->sample_aspect_ratio.den};
}

void SingleStreamDecoder::addAudioStream(
Expand Down
19 changes: 19 additions & 0 deletions src/torchcodec/_core/_metadata.py
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,12 @@ class VideoStreamMetadata(StreamMetadata):
average_fps_from_header: Optional[float]
"""Averate fps of the stream, obtained from the header (float or None).
We recommend using the ``average_fps`` attribute instead."""
sample_aspect_ratio: Optional[tuple[int, int]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because this is a public name, I think we should name it pixel_aspect_ratio, since it is only relevant for pixels. We use the term sample to denote audio data in the rest of our public APIs. It's fine to keep the name sampleAspectRatio to be close to FFmpeg in the private parts of the code (like in C++), but here, "pixel" will be more descriptive.

And as mentioned in previous discussion, let's make this a Fraction just to be on the safe side.

"""Sample Aspect Ratio (SAR), also known as Pixel Aspect Ratio
(PAR), is the ratio between the width of a pixel and the height of
each pixel. This is a tuple of two ints: the first element is the
numerator, and the second element is the denominator. Not to be
confused with Storage Aspect Ratio (also SAR)."""

@property
def duration_seconds(self) -> Optional[float]:
Expand Down Expand Up @@ -211,6 +217,16 @@ def best_audio_stream(self) -> AudioStreamMetadata:
return metadata


def _get_optional_sar_tuple(stream_dict):
try:
return (
stream_dict["sampleAspectRatioNum"],
stream_dict["sampleAspectRatioDen"],
)
except KeyError:
return None


# TODO-AUDIO: This is user-facing. Should this just be `get_metadata`, without
# the "container" name in it? Same below.
def get_container_metadata(decoder: torch.Tensor) -> ContainerMetadata:
Expand Down Expand Up @@ -247,6 +263,9 @@ def get_container_metadata(decoder: torch.Tensor) -> ContainerMetadata:
num_frames_from_header=stream_dict.get("numFramesFromHeader"),
num_frames_from_content=stream_dict.get("numFramesFromContent"),
average_fps_from_header=stream_dict.get("averageFpsFromHeader"),
# sample_aspect_ratio is a tuple. Return None,
# and not (None, None), if missing.
sample_aspect_ratio=_get_optional_sar_tuple(stream_dict),
**common_meta,
)
)
Expand Down
6 changes: 6 additions & 0 deletions src/torchcodec/_core/custom_ops.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -601,6 +601,12 @@ std::string get_stream_json_metadata(
if (streamMetadata.height.has_value()) {
map["height"] = std::to_string(*streamMetadata.height);
}
if (streamMetadata.sampleAspectRatio.has_value()) {
map["sampleAspectRatioNum"] =
std::to_string((*streamMetadata.sampleAspectRatio).first);
map["sampleAspectRatioDen"] =
std::to_string((*streamMetadata.sampleAspectRatio).second);
}
Copy link
Member

@NicolasHug NicolasHug Jun 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you're doing here is OK. I don't see a much cleaner way around the json intermediate layer. @scotts any concern?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. We're turning what is fundamentally two values into one object. This is a reasonable way to do that.

if (streamMetadata.averageFpsFromHeader.has_value()) {
map["averageFpsFromHeader"] =
std::to_string(*streamMetadata.averageFpsFromHeader);
Expand Down
3 changes: 3 additions & 0 deletions test/test_metadata.py
Original file line number Diff line number Diff line change
Expand Up @@ -81,6 +81,7 @@ def test_get_metadata(metadata_getter):
assert best_video_stream_metadata.begin_stream_seconds_from_header == 0
assert best_video_stream_metadata.bit_rate == 128783
assert best_video_stream_metadata.average_fps == pytest.approx(29.97, abs=0.001)
assert best_video_stream_metadata.sample_aspect_ratio is None
assert best_video_stream_metadata.codec == "h264"
assert best_video_stream_metadata.num_frames_from_content == (
390 if with_scan else None
Expand Down Expand Up @@ -137,6 +138,7 @@ def test_num_frames_fallback(
width=123,
height=321,
average_fps_from_header=30,
sample_aspect_ratio=(1, 1),
stream_index=0,
)

Expand All @@ -161,6 +163,7 @@ def test_repr():
num_frames_from_header: 390
num_frames_from_content: 390
average_fps_from_header: 29.97003
sample_aspect_ratio: (1, 1)
duration_seconds: 13.013
begin_stream_seconds: 0.0
end_stream_seconds: 13.013
Expand Down
Loading