Refactor audio sample conversion in encoder #704

NicolasHug · 2025-05-29T09:05:58Z

At a high level, this PR moves our AVFrame conversion logic out of encodeInnerLoop() and into a new maybeConvertAVFrame() method, which is now called from encode().

It makes more sense to run this conversion before entering encodeInnerLoop(), and that has become more evident while working on sample rate conversion in another branch.

This is mostly copy/paste, but the diff isn't trivial and includes a few subtle changes that I describe in the comments below.

NicolasHug · 2025-05-29T09:07:21Z

src/torchcodec/_core/Encoder.cpp

    numEncodedSamples += numSamplesToEncode;
+    // TODO-ENCODING set frame pts correctly, and test against it.
+    // avFrame->pts += static_cast<int64_t>(numSamplesToEncode);


This is a drive-by. I realized that our tests still pass with this line commented out, so it isn't load-bearing. We weren't setting the pts on the convertedAVFrame anyway, so our logic was already wrong. I'm adding this TODO to investigate later.

scotts · 2025-05-29

Let's also create an issue.
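For context on what the TODO is about, here is a minimal sketch of one common convention for audio pts: with a time base of 1/sample_rate, each chunk's pts is the number of samples encoded before it. This is an assumption for illustration, not the fix chosen in this PR; `chunkPtsValues` and its parameters are hypothetical names that only mirror the `numEncodedSamples` / `numSamplesToEncode` bookkeeping in the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of torchcodec.
// Assumes the time base is 1/sample_rate, so pts is counted in samples.
// Returns the pts that each encoded chunk would carry.
std::vector<int64_t> chunkPtsValues(int64_t numSamples, int64_t frameSize) {
  std::vector<int64_t> ptsValues;
  if (frameSize <= 0) {
    return ptsValues; // nothing sensible to do with a non-positive chunk size
  }
  int64_t numEncodedSamples = 0;
  while (numEncodedSamples < numSamples) {
    // The last chunk may be shorter than frameSize.
    int64_t numSamplesToEncode =
        std::min(frameSize, numSamples - numEncodedSamples);
    // The chunk starts where the previously encoded samples end.
    ptsValues.push_back(numEncodedSamples);
    numEncodedSamples += numSamplesToEncode;
  }
  return ptsValues;
}
```

A test that checks pts values against an expected sequence like this is the kind of coverage that would have caught the line not being load-bearing.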

NicolasHug · 2025-05-29T09:11:22Z

src/torchcodec/_core/Encoder.cpp

+      getNumChannels(avFrame) == outNumChannels_) {
+    // Note: the clone references the same underlying data, it's a cheap copy.
+    return UniqueAVFrame(av_frame_clone(avFrame.get()));
+  }


The main logic change is the one above: the call to av_frame_clone when no conversion is needed. We want to return the original avFrame, but we can't use std::move, because that would destroy the original avFrame (the one created in encode()) prematurely.
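To illustrate why the clone is cheap: av_frame_clone creates a new AVFrame that references the same data buffers as the source rather than copying the samples. The sketch below imitates that with a stand-in type whose buffer is held in a std::shared_ptr. `RefCountedFrame` and `cloneFrame` are hypothetical names, and this is not how FFmpeg implements its buffers; it only shows the sharing behaviour.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a reference-counted frame: the sample buffer is
// shared between clones, while the small struct around it is copied.
struct RefCountedFrame {
  std::shared_ptr<std::vector<float>> samples;
  int sampleRate;
};

using UniqueRefCountedFrame = std::unique_ptr<RefCountedFrame>;

// Mimics the pass-through branch: a new frame struct that references the
// same underlying buffer as the source.
UniqueRefCountedFrame cloneFrame(const UniqueRefCountedFrame& src) {
  return UniqueRefCountedFrame(new RefCountedFrame(*src));
}

// Returns true if the clone shares the buffer and outlives the original.
bool cloneSharesBuffer() {
  UniqueRefCountedFrame original(new RefCountedFrame{
      std::make_shared<std::vector<float>>(1024, 0.0f), 16000});
  UniqueRefCountedFrame clone = cloneFrame(original);

  bool sameBuffer = clone->samples.get() == original->samples.get();
  bool twoOwners = original->samples.use_count() == 2;

  // Destroying the original frame leaves the shared buffer alive.
  original.reset();
  bool stillValid =
      clone->samples->size() == 1024 && clone->samples.use_count() == 1;

  return sameBuffer && twoOwners && stillValid;
}
```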

scotts · 2025-05-29

Huh, I would have thought that using std::move() would keep the AVFrame contained in the UniqueAVFrame alive. Yes, the UniqueAVFrame created in the calling scope would go away, but the underlying object it wrapped would live on. Does that not happen?

For that to work, though, we'd have to change how we pass the frame into maybeConvertAVFrame(). I think the option with the clearest intent is for it to accept a plain UniqueAVFrame (not a reference), and for the caller to std::move() the one it created into it. (We could also pass by non-const reference, but I don't think that makes the ownership semantics clear.)
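The by-value alternative described here can be sketched as follows. This is a minimal illustration with stand-in types, not the code that was merged: `FakeFrame`, `UniqueFakeFrame`, `maybeConvert` and the `needsConversion` flag are hypothetical, and the custom deleter exists only to count how often a frame is freed.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for AVFrame.
struct FakeFrame {
  int numSamples;
};

// Counts how many frames have been freed so far.
inline int& freeCount() {
  static int count = 0;
  return count;
}

struct FakeFrameDeleter {
  void operator()(FakeFrame* frame) const {
    ++freeCount();
    delete frame;
  }
};

using UniqueFakeFrame = std::unique_ptr<FakeFrame, FakeFrameDeleter>;

// Sink parameter: taking the smart pointer by value signals that the callee
// takes ownership. When no conversion is needed, the same frame is returned.
UniqueFakeFrame maybeConvert(UniqueFakeFrame frame, bool needsConversion) {
  if (!needsConversion) {
    return frame; // implicit move; the underlying FakeFrame is untouched
  }
  UniqueFakeFrame converted(new FakeFrame{frame->numSamples});
  return converted; // `frame` is freed once the call completes
}

// Returns true if the same underlying object survives the round trip
// without being freed.
bool passThroughKeepsFrameAlive() {
  UniqueFakeFrame original(new FakeFrame{1024});
  FakeFrame* rawBefore = original.get();
  int freesBefore = freeCount();

  UniqueFakeFrame result = maybeConvert(std::move(original), false);

  return original == nullptr && result.get() == rawBefore &&
      freeCount() == freesBefore && result->numSamples == 1024;
}
```

After the move, the caller's pointer is null, so any later code in the calling scope has to use the returned frame rather than the original variable.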

NicolasHug · 2025-05-29

Thanks for the review! As discussed offline, I'll merge now, but I want to dig a little more into what we could do differently.

…sion-out

Refactor audio sample conversion in encoder

6c91450

facebook-github-bot added the CLA Signed label May 29, 2025

NicolasHug commented May 29, 2025

scotts approved these changes May 29, 2025

Merge branch 'main' of github.com:pytorch/torchcodec into move-conver…

8042e43

…sion-out

NicolasHug merged commit 3056f40 into pytorch:main May 29, 2025
21 checks passed
