-
Notifications
You must be signed in to change notification settings - Fork 55
Fix Windows build and add Windows CPU tests #806
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
avio_close(avFormatContext_->pb); | ||
if (avFormatContext_->pb->error == 0) { | ||
avio_close(avFormatContext_->pb); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Above is a drive-by as I was trying to fix the problem with FFmpeg 5. I'm not sure it fixed anything, but it is probably still a good change to have? I can extract it out in another PR if it's preferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this code imply that if the AVIOContext
stored in avFormatContext_
has an error, we do not need to flush / close it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on my digging, it's also that it's risky to call flush when there's an error state. Seems reasonable to keep it in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, it because when error isn't 0 then flushing or calling avio_close()
are potentially accessing invalid data.
with pytest.raises( | ||
RuntimeError, | ||
match=avcodec_open2_failed_msg if IS_WINDOWS else "invalid sample rate=10", | ||
): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
above and in other tests below: on Windows, we're hitting this early return:
torchcodec/src/torchcodec/_core/Encoder.cpp
Lines 35 to 38 in 05a6ff5
void validateSampleRate(const AVCodec& avCodec, int sampleRate) { | |
if (avCodec.supported_samplerates == nullptr) { | |
return; | |
} |
So we're unable to validate some parameters early, and we just fail in the call to avcodec_open2()
later:
torchcodec/src/torchcodec/_core/Encoder.cpp
Lines 213 to 217 in 05a6ff5
int status = avcodec_open2(avCodecContext_.get(), avCodec, nullptr); | |
TORCH_CHECK( | |
status == AVSUCCESS, | |
"avcodec_open2 failed: ", | |
getFFMPEGErrorStringFromErrorCode(status)); |
ARCHIVE_OUTPUT_DIRECTORY_RELEASE ${CMAKE_CURRENT_BINARY_DIR} | ||
) | ||
endif() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@traversaro correctly pointed out that this can now be removed, as a consequence of the other cmake_build_type
fix below #806 (comment)
subprocess.check_call( | ||
["cmake", "--install", ".", "--config", cmake_build_type], | ||
cwd=self.build_temp, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Above is the main fix that was needed for the extensions to load properly, something I had been blocked on for a few weeks. The fix is from @traversaro :
IIUC, on windows and with our current setup, cmake is generating multiple build files. And without this fix, the different build files would be inheriting different build configuration. Typically we'd be building some parts with the Release
build type, while other would be inheriting a different build type, causing problems at runtime when loading the libraries.
@traversaro thank you so much again for unblocking us!
.github/workflows/windows_wheel.yaml
Outdated
strategy: | ||
fail-fast: false | ||
matrix: | ||
python-version: ['3.9', '3.10', '3.11', '3.12'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will revert to just 3.9 before merging, like in the other jobs. This is just to show that the jobs are green across multiple versions
Also, you probably don't need to review the rest of this file too closely, because it's mostly copy/pasted from our existing test job (for e.g. Linux)
.github/workflows/windows_wheel.yaml
Outdated
@@ -36,6 +36,9 @@ jobs: | |||
with-rocm: disable | |||
with-cuda: disable | |||
build-python-only: "disable" | |||
# Explicitly avoid 3.13 because 3.13t builds don't work. | |||
# TODO remove eventually. | |||
python-versions: '["3.9", "3.10", "3.11", "3.12"]' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will remove this before merging. This is just to show that the jobs are green across versions.
Note that we may still try to address the 3.13t issue, but that's for a follow-up.
@@ -295,8 +315,15 @@ def test_against_cli( | |||
rtol, atol = 0, 1e-3 | |||
else: | |||
rtol, atol = None, None | |||
|
|||
if IS_WINDOWS and format == "mp3": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this condition be hit? It might be duplicate with the check and pytest.skip
on line 260
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check above only skips on FFmpeg <= 5, but FFmpeg 6 and 7 will still reach this line :)
torch.testing.assert_close( | ||
self.decode(encoded_file).data, self.decode(encoded_output).data | ||
) | ||
if not (IS_WINDOWS and format == "mp3"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question here, is this a duplicate check with line 319? On a similar note, it might be better to explicitly call pytest.skip
whenever we are not running a test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a duplicate because this will be reached by FFmpeg 6 and 7. You're right that we should generally prefer using pytest.skip
to just return
, or to doing what we're doing here: it's better to know that a test was skipped, rather than seeing it be green when in fact it never ran.
In this specific case I think it's best not to call pytest.skip
, because the test is still doing something meaningful just above. Specifically it calls the encoding methods, without error. We are only skipping the assert_close()
check.
@@ -414,7 +447,7 @@ def test_num_channels( | |||
|
|||
sample_rate = 16_000 | |||
source_samples = torch.rand(num_channels_input, 1_000) | |||
format = "mp3" | |||
format = "flac" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this change enable the test to pass on Windows? If so, we could parameterize the format and add an IS_WINDOWS
check to skip if IS_WINDOWS and format == "mp3"
, as in tests above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, I changed from mp3 to flac due to the issues we had on Windows.
I agree that we could parameterize, but I'm not sure it's necessary for the purpose of this test, as it wouldn't make it more robust. Mostly this test is just about checking the number of output channels is respected, which is agnostic to the format.
This PR fixes the Windows wheel build, and adds tests jobs that mimic the setup we have for other platforms: we build a wheel, and then install and run the tests from that wheel on the separate machine.
The vast majority of tests are passing as-is, without any modification needed. Worth nothing that for comparing frames, we are going through the same path as on OSX, with higher tolerances:
torchcodec/test/utils.py
Lines 82 to 83 in 05a6ff5
I had a few problems with the audio encoder though. in increasing order of importance:
avcodec_open2()
. This is described in this comment: Fix Windows build and add Windows CPU tests #806 (comment)avcodec_open2
when passing bad parameters #836 to follow-up. You'll see more details there but the segfault happens internally withinavcodec_open2()
, even after validating that the parameters are valid pointers. Because it only happens on Windows and on FFmpeg5, I want to declare that this is an ffmpeg5 bug.Note: we're not errorring on warnings on Windows, and there are a lot of those. I'll re-enable this as a follow-up:
torchcodec/src/torchcodec/_core/CMakeLists.txt
Lines 14 to 17 in 05a6ff5
Massive thanks to @traversaro for unblocking us with the #806 (comment) fix.