Skip to content

Refactor audio sample conversion in encoder #704

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 29, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
74 changes: 39 additions & 35 deletions src/torchcodec/_core/Encoder.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -288,10 +288,13 @@ void AudioEncoder::encode() {
// encoded frame would contain more samples than necessary and our results
// wouldn't match the ffmpeg CLI.
avFrame->nb_samples = numSamplesToEncode;
encodeInnerLoop(autoAVPacket, avFrame);

avFrame->pts += static_cast<int64_t>(numSamplesToEncode);
UniqueAVFrame convertedAVFrame = maybeConvertAVFrame(avFrame);
encodeInnerLoop(autoAVPacket, convertedAVFrame);

numEncodedSamples += numSamplesToEncode;
// TODO-ENCODING set frame pts correctly, and test against it.
// avFrame->pts += static_cast<int64_t>(numSamplesToEncode);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a drive-by. I realized that commenting this out was letting our tests pass, so this isn't load bearing. We weren't setting the pts on the convertedAVFrame anyway, so our logic was already wrong. I'm adding this TODO to investigate that later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also create an issue.

}
TORCH_CHECK(numEncodedSamples == numSamples, "Hmmmmmm something went wrong.");

Expand All @@ -304,42 +307,43 @@ void AudioEncoder::encode() {
getFFMPEGErrorStringFromErrorCode(status));
}

void AudioEncoder::encodeInnerLoop(
AutoAVPacket& autoAVPacket,
const UniqueAVFrame& srcAVFrame) {
bool mustConvert =
(srcAVFrame != nullptr &&
(avCodecContext_->sample_fmt != AV_SAMPLE_FMT_FLTP ||
getNumChannels(srcAVFrame) != outNumChannels_));

UniqueAVFrame convertedAVFrame;
if (mustConvert) {
if (!swrContext_) {
swrContext_.reset(createSwrContext(
AV_SAMPLE_FMT_FLTP,
avCodecContext_->sample_fmt,
srcAVFrame->sample_rate, // No sample rate conversion
srcAVFrame->sample_rate,
srcAVFrame,
outNumChannels_));
}
convertedAVFrame = convertAudioAVFrameSamples(
swrContext_,
srcAVFrame,
UniqueAVFrame AudioEncoder::maybeConvertAVFrame(const UniqueAVFrame& avFrame) {
if (static_cast<AVSampleFormat>(avFrame->format) ==
avCodecContext_->sample_fmt &&
getNumChannels(avFrame) == outNumChannels_) {
// Note: the clone references the same underlying data, it's a cheap copy.
return UniqueAVFrame(av_frame_clone(avFrame.get()));
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main logic change is this one above: the call to av_frame_clone when no conversion needs to be done. We want to return the original avFrame but we can't use std::move, because that would destruct the original avFrame prematurely (the one created in encode()).

Copy link
Contributor

@scotts scotts May 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, I would have thought that using std::move() would keep the actual AVFrame alive that is contained in UniqueAVFrame. Yes, the UniqueAVFrame created in the calling scope would go away, but the underlying object it wrapped would live on. Does that not happen?

In order for that to work, though, we'd have to change how we pass the frame into maybeConvertAVFrame(). I think the way with the clearest intent is that it accepts just a plain UniqueAVFrame (not a reference), and then we std::move() the one created in the calling scope into it. (We also could pass by non-const reference, but I think that doesn't make the ownership semantics clear.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review! As discussed offline I'll merge now but I want to dig a little bit more about what we could do differently


if (!swrContext_) {
swrContext_.reset(createSwrContext(
static_cast<AVSampleFormat>(avFrame->format),
avCodecContext_->sample_fmt,
srcAVFrame->sample_rate, // No sample rate conversion
outNumChannels_);
TORCH_CHECK(
convertedAVFrame->nb_samples == srcAVFrame->nb_samples,
"convertedAVFrame->nb_samples=",
convertedAVFrame->nb_samples,
" differs from ",
"srcAVFrame->nb_samples=",
srcAVFrame->nb_samples,
"This is unexpected, please report on the TorchCodec bug tracker.");
avFrame->sample_rate, // No sample rate conversion
avFrame->sample_rate,
avFrame,
outNumChannels_));
}
const UniqueAVFrame& avFrame = mustConvert ? convertedAVFrame : srcAVFrame;
UniqueAVFrame convertedAVFrame = convertAudioAVFrameSamples(
swrContext_,
avFrame,
avCodecContext_->sample_fmt,
avFrame->sample_rate, // No sample rate conversion
outNumChannels_);
TORCH_CHECK(
convertedAVFrame->nb_samples == avFrame->nb_samples,
"convertedAVFrame->nb_samples=",
convertedAVFrame->nb_samples,
" differs from ",
"avFrame->nb_samples=",
avFrame->nb_samples,
"This is unexpected, please report on the TorchCodec bug tracker.");
return convertedAVFrame;
}

void AudioEncoder::encodeInnerLoop(
AutoAVPacket& autoAVPacket,
const UniqueAVFrame& avFrame) {
auto status = avcodec_send_frame(avCodecContext_.get(), avFrame.get());
TORCH_CHECK(
status == AVSUCCESS,
Expand Down
1 change: 1 addition & 0 deletions src/torchcodec/_core/Encoder.h
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,7 @@ class AudioEncoder {
void initializeEncoder(
int sampleRate,
const AudioStreamOptions& audioStreamOptions);
UniqueAVFrame maybeConvertAVFrame(const UniqueAVFrame& avFrame);
void encodeInnerLoop(
AutoAVPacket& autoAVPacket,
const UniqueAVFrame& srcAVFrame);
Expand Down
Loading