-
Notifications
You must be signed in to change notification settings - Fork 7.1k
Add video GPU decoder #5019
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add video GPU decoder #5019
Conversation
💊 CI failures summary and remediationsAs of commit de8bfbd (more details on the Dr. CI page):
🕵️ 3 new failures recognized by patternsThe following CI failures do not appear to be due to upstream breakages:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all your work Prabhat!
I've left some more comments, but I think all of them can be addressed in a follow-up comment.
As we discussed in our call earlier, I've also gave a try at implementing the NV12->RGB conversion on the GPU and it avoids the 2 memcopies that we are currently doing (so a single kernel does the reading + conversion).
I'll post it in a branch soon
check_for_cuda_errors(cuCtxPushCurrent(cu_context), __LINE__, __FILE__); | ||
check_for_cuda_errors( | ||
cuvidDecodePicture(decoder, pic_params), __LINE__, __FILE__); | ||
check_for_cuda_errors(cuCtxPopCurrent(NULL), __LINE__, __FILE__); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the future: I think it might be a good idea to guard the cuCtxPushCurrent
/ cuCtxPopCurrent
with a RAII-style guard. Something like what VPF does or (more complicated) like decord.
This guard could be better than what we currently have if what is in between the push / pop fails. In those cases, the pop
won't happen, which could be problematic (although in a synthetic case I tried out it didn't seem to have been a problem).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (!(decode_caps.nOutputFormatMask & (1 << video_output_format))) { | ||
if (decode_caps.nOutputFormatMask & (1 << cudaVideoSurfaceFormat_NV12)) { | ||
video_output_format = cudaVideoSurfaceFormat_NV12; | ||
} else if ( | ||
decode_caps.nOutputFormatMask & (1 << cudaVideoSurfaceFormat_P016)) { | ||
video_output_format = cudaVideoSurfaceFormat_P016; | ||
} else if ( | ||
decode_caps.nOutputFormatMask & (1 << cudaVideoSurfaceFormat_YUV444)) { | ||
video_output_format = cudaVideoSurfaceFormat_YUV444; | ||
} else if ( | ||
decode_caps.nOutputFormatMask & | ||
(1 << cudaVideoSurfaceFormat_YUV444_16Bit)) { | ||
video_output_format = cudaVideoSurfaceFormat_YUV444_16Bit; | ||
} else { | ||
TORCH_CHECK(false, "No supported output format found"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we can clean this up now that we will only support returning RGB. But this can be done in a follow-up PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
video_codec = video_format->codec; | ||
video_chroma_format = video_format->chroma_format; | ||
bit_depth_minus8 = video_format->bit_depth_luma_minus8; | ||
bytes_per_pixel = bit_depth_minus8 > 0 ? 2 : 1; | ||
// Set the output surface format same as chroma format | ||
switch (video_chroma_format) { | ||
case cudaVideoChromaFormat_Monochrome: | ||
case cudaVideoChromaFormat_420: | ||
video_output_format = video_format->bit_depth_luma_minus8 | ||
? cudaVideoSurfaceFormat_P016 | ||
: cudaVideoSurfaceFormat_NV12; | ||
break; | ||
case cudaVideoChromaFormat_444: | ||
video_output_format = video_format->bit_depth_luma_minus8 | ||
? cudaVideoSurfaceFormat_YUV444_16Bit | ||
: cudaVideoSurfaceFormat_YUV444; | ||
break; | ||
case cudaVideoChromaFormat_422: | ||
video_output_format = cudaVideoSurfaceFormat_NV12; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this can be cleaned up in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static auto check_for_cuda_errors = | ||
[](CUresult result, int line_num, std::string file_name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we don't usually define lambdas outside of the scope of functions in PyTorch codebase. But I believe this is just a stylistic nit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto options = torch::TensorOptions().dtype(torch::kU8).device(torch::kCUDA); | ||
torch::Tensor frame = torch::zeros({0}, options); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I believe you can just do
torch::Tensor frame;
now that decoder.fetch_frame()
returns a tensor of the right dtype and device
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to check if the errors are related or not (maybe not). Also, in a follow-up PR we will need to add CI tests for those functions. This might be a bit involved, but looks like the NVIDIA-docker has nvdec available in the container |
Looks like it's only on Python 3.6, which reached its EOL last week https://endoflife.date/python So we should probably just remove support for Python 3.6 in torchvision main branch (as long as PyTorch also drops support for it) |
Summary: * [WIP] Add video GPU decoder * Expose use_dev_frame to python class and handle it internally * Fixed invalid argument CUDA error * Fixed empty and missing frames * Free remaining frames in the queue * Added nv12 to yuv420 conversion support for host frames * Added unit test and cleaned up code * Use CUDA_HOME inside if * Undo commented out code * Add Readme * Remove output_format and use_device_frame optional arguments from the VideoReader API * Cleaned up init() * Fix warnings * Fix python linter errors * Fix linter issues in setup.py * clang-format * Make reformat private * Member function naming * Add comments * Variable renaming * Code cleanup * Make return type of decode() void * Replace printing errors with throwing runtime_error * Replaced runtime_error with TORCH_CHECK in demuxer.h * Use CUDAGuard instead of cudaSetDevice * Remove printf * Use Tensor instead of uint8* and remove cuMemAlloc/cuMemFree * Use TORCH_CHECK instead of runtime_error * Use TORCHVISION_INCLUDE and TORCHVISION_LIBRARY to pass video codec location * Include ffmpeg_include_dir * Remove space * Removed use of runtime_error * Update Readme * Check for bsf.h * Change struct initialisation style * Clean-up get_operating_point * Make variable naming convention uniform * Move checking for bsf.h around * Fix linter error Reviewed By: datumbox, prabhat00155 Differential Revision: D33405358 fbshipit-source-id: 0e6251389508309a23c7afd843f298208dcd67e8 Co-authored-by: Francisco Massa <fvsmassa@gmail.com>
Differential Revision: D33405358 Original commit changeset: 0e6251389508 Original Phabricator Diff: D33405358 fbshipit-source-id: b554aaa8003aca08826540883783644aa7eebea9
Summary: * [WIP] Add video GPU decoder * Expose use_dev_frame to python class and handle it internally * Fixed invalid argument CUDA error * Fixed empty and missing frames * Free remaining frames in the queue * Added nv12 to yuv420 conversion support for host frames * Added unit test and cleaned up code * Use CUDA_HOME inside if * Undo commented out code * Add Readme * Remove output_format and use_device_frame optional arguments from the VideoReader API * Cleaned up init() * Fix warnings * Fix python linter errors * Fix linter issues in setup.py * clang-format * Make reformat private * Member function naming * Add comments * Variable renaming * Code cleanup * Make return type of decode() void * Replace printing errors with throwing runtime_error * Replaced runtime_error with TORCH_CHECK in demuxer.h * Use CUDAGuard instead of cudaSetDevice * Remove printf * Use Tensor instead of uint8* and remove cuMemAlloc/cuMemFree * Use TORCH_CHECK instead of runtime_error * Use TORCHVISION_INCLUDE and TORCHVISION_LIBRARY to pass video codec location * Include ffmpeg_include_dir * Remove space * Removed use of runtime_error * Update Readme * Check for bsf.h * Change struct initialisation style * Clean-up get_operating_point * Make variable naming convention uniform * Move checking for bsf.h around * Fix linter error Reviewed By: NicolasHug Differential Revision: D33476941 fbshipit-source-id: e310435c966fe79ab77eaba305a03dd0af7a17a5 Co-authored-by: Francisco Massa <fvsmassa@gmail.com>
This PR adds support for GPU decoding in torchvision’s video reading API.
Resolves #2439 and partly addresses #4392.
This is the initial version of GPU video decoder. This can be called like this:
The result after performing GPU decoding can be returned in the form of a CUDA tensor(when using use_device_frame=True) or a CPU tensor(use_device_frame=False). When
use_device_frame=True
,nv12
is the only supported output format, when usinguse_device_frame=False
,nv12
andyuv420
are the supported output formats.Work items to extend this further can be found here.