Add video GPU decoder #5019

Merged on Dec 30, 2021 (42 commits)

@prabhat00155 prabhat00155 commented Dec 1, 2021

This PR adds support for GPU decoding in torchvision’s video reading API.
Resolves #2439 and partly addresses #4392.

This is the initial version of the GPU video decoder. It can be called like this:

import torchvision

# file_name is the path to the video file to decode
reader = torchvision.io.VideoReader(file_name, device="cuda:0")
for frame in reader:
    print(frame['data'])

The result of GPU decoding can be returned either as a CUDA tensor (use_device_frame=True) or as a CPU tensor (use_device_frame=False). With use_device_frame=True, nv12 is the only supported output format; with use_device_frame=False, both nv12 and yuv420 are supported.
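To make the difference between the two output formats concrete, here is a minimal CPU-side sketch of an nv12 to planar yuv420 conversion. It is not the code from this PR. It assumes 8-bit samples, even width and height, tightly packed rows (no pitch padding), and that "yuv420" means the planar layout with the U plane before the V plane. The function name nv12_to_yuv420 is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of torchvision.
// Input:  NV12   = full-resolution Y plane, then one interleaved UVUV... plane.
// Output: YUV420 = full-resolution Y plane, then a U plane, then a V plane.
// Assumes 8-bit samples, even width/height, and no row padding.
std::vector<std::uint8_t> nv12_to_yuv420(
    const std::vector<std::uint8_t>& nv12,
    std::size_t width,
    std::size_t height) {
  const std::size_t luma_size = width * height;
  const std::size_t chroma_size = luma_size / 4; // samples per chroma plane
  assert(nv12.size() == luma_size + 2 * chroma_size);

  std::vector<std::uint8_t> out(nv12.size());

  // The luma plane is identical in both layouts.
  std::copy(
      nv12.begin(),
      nv12.begin() + static_cast<std::ptrdiff_t>(luma_size),
      out.begin());

  // De-interleave the (U, V) pairs into two separate planes.
  for (std::size_t i = 0; i < chroma_size; ++i) {
    out[luma_size + i] = nv12[luma_size + 2 * i];                   // U
    out[luma_size + chroma_size + i] = nv12[luma_size + 2 * i + 1]; // V
  }
  return out;
}
```

Both layouts occupy width * height * 3 / 2 bytes for 8-bit content; only the ordering of the chroma samples changes.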

Work items to extend this further can be found here.

facebook-github-bot commented Dec 1, 2021

💊 CI failures summary and remediations

As of commit de8bfbd (more details on the Dr. CI page):


  • 3/3 failures introduced in this PR

🕵️ 3 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build unittest_windows_cpu_py3.6 (1/3)

Step: "Setup"

Running setup.py install for av: finished with status 'error'
  Building wheel for av (setup.py): started

  Building wheel for av (setup.py): finished with status 'error'

  Running setup.py clean for av

Failed to build av

Installing collected packages: av

  Attempting uninstall: av

    Found existing installation: av 8.0.3

    Uninstalling av-8.0.3:

      Successfully uninstalled av-8.0.3

    Running setup.py install for av: started

    Running setup.py install for av: finished with status 'error'

  Rolling back uninstall of av

  Moving to c:\users\circleci\project\env\lib\site-packages\av-8.0.3.dist-info\

   from c:\users\circleci\project\env\lib\site-packages\~v-8.0.3.dist-info

  Moving to c:\users\circleci\project\env\lib\site-packages\av\

   from c:\users\circleci\project\env\lib\site-packages\~v

  Moving to c:\users\circleci\project\env\scripts\pyav.exe

   from C:\Users\circleci\AppData\Local\Temp\pip-uninstall-ma_nb1lw\pyav.exe


failed

See CircleCI build unittest_macos_cpu_py3.6 (2/3)

Step: "Setup"

ERROR: Command errored out with exit status 1: ...hon3.6m/av Check the logs for full command output.
    creating build/lib.macosx-10.9-x86_64-3.6/av/filter
    copying av/filter/__init__.py -> build/lib.macosx-10.9-x86_64-3.6/av/filter
    creating build/lib.macosx-10.9-x86_64-3.6/av/sidedata
    copying av/sidedata/__init__.py -> build/lib.macosx-10.9-x86_64-3.6/av/sidedata
    creating build/lib.macosx-10.9-x86_64-3.6/av/data
    copying av/data/__init__.py -> build/lib.macosx-10.9-x86_64-3.6/av/data
    running build_ext
    running config
    pkg-config is required for building PyAV
    ----------------------------------------
ERROR: Command errored out with exit status 1: /Users/distiller/project/env/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/6y/gy9gggt14379c_k39vwb50lc0000gn/T/pip-install-o8_kha3a/av_03a6e808ab7c4a1d8664aed9875b94f1/setup.py'"'"'; __file__='"'"'/private/var/folders/6y/gy9gggt14379c_k39vwb50lc0000gn/T/pip-install-o8_kha3a/av_03a6e808ab7c4a1d8664aed9875b94f1/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/6y/gy9gggt14379c_k39vwb50lc0000gn/T/pip-record-hxmbmkoc/install-record.txt --single-version-externally-managed --compile --install-headers /Users/distiller/project/env/include/python3.6m/av Check the logs for full command output.

failed

CondaEnvException: Pip failed



Exited with code exit status 1

See CircleCI build unittest_linux_cpu_py3.6 (3/3)

Step: "Setup"

ERROR: Command errored out with exit status 1: ...hon3.6m/av Check the logs for full command output.
    	PYAV_VERSION=8.1.0
    	PYAV_VERSION_STR="8.1.0"
    Could not find libavformat with pkg-config.
    Could not find libavcodec with pkg-config.
    Could not find libavdevice with pkg-config.
    Could not find libavutil with pkg-config.
    Could not find libavfilter with pkg-config.
    Could not find libswscale with pkg-config.
    Could not find libswresample with pkg-config.
    ----------------------------------------
ERROR: Command errored out with exit status 1: /root/project/env/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-fwwie285/av_39e34a005f8d40268728b8b278f5770e/setup.py'"'"'; __file__='"'"'/tmp/pip-install-fwwie285/av_39e34a005f8d40268728b8b278f5770e/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-bk_3yw3b/install-record.txt --single-version-externally-managed --compile --install-headers /root/project/env/include/python3.6m/av Check the logs for full command output.

failed

CondaEnvException: Pip failed



Exited with code exit status 1


This comment was automatically generated by Dr. CI (expand for details).

@prabhat00155 prabhat00155 marked this pull request as ready for review December 15, 2021 18:52
@prabhat00155 prabhat00155 requested a review from bjuncek December 15, 2021 19:15
@prabhat00155 prabhat00155 changed the title [WIP] Add video GPU decoder Add video GPU decoder Dec 15, 2021
@prabhat00155 prabhat00155 requested a review from fmassa December 26, 2021 16:40
@fmassa fmassa left a comment

Thanks for all your work Prabhat!

I've left some more comments, but I think all of them can be addressed in a follow-up comment.

As we discussed in our call earlier, I also tried implementing the NV12->RGB conversion on the GPU. It avoids the 2 memcopies that we are currently doing, because a single kernel does both the reading and the conversion.

I'll post it in a branch soon.
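
For reference, the per-pixel arithmetic behind an NV12->RGB conversion can be sketched on the CPU as below. This is not the GPU kernel mentioned above, which is not shown in this thread. The sketch assumes 8-bit, tightly packed NV12 and BT.601 limited-range coefficients; the color matrix actually used in the branch may differ. The names clamp_to_u8 and nv12_pixel_to_rgb are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Rounds and clamps a float to the 0..255 range.
static std::uint8_t clamp_to_u8(float v) {
  return static_cast<std::uint8_t>(
      std::min(255.0f, std::max(0.0f, std::round(v))));
}

// Hypothetical CPU reference: converts the pixel at (x, y) of a tightly
// packed 8-bit NV12 frame to RGB.
// Assumption: BT.601 limited-range coefficients.
std::array<std::uint8_t, 3> nv12_pixel_to_rgb(
    const std::vector<std::uint8_t>& nv12,
    std::size_t width,
    std::size_t height,
    std::size_t x,
    std::size_t y) {
  const std::uint8_t* luma = nv12.data();
  const std::uint8_t* chroma = nv12.data() + width * height;

  // Each 2x2 block of luma samples shares one interleaved (U, V) pair.
  // A chroma row holds width / 2 pairs, i.e. `width` bytes.
  const std::size_t uv = (y / 2) * width + (x / 2) * 2;

  const float yf = 1.164f * (static_cast<float>(luma[y * width + x]) - 16.0f);
  const float u = static_cast<float>(chroma[uv]) - 128.0f;
  const float v = static_cast<float>(chroma[uv + 1]) - 128.0f;

  std::array<std::uint8_t, 3> rgb = {{
      clamp_to_u8(yf + 1.596f * v),
      clamp_to_u8(yf - 0.392f * u - 0.813f * v),
      clamp_to_u8(yf + 2.017f * u)}};
  return rgb;
}
```

A GPU version would typically run this arithmetic once per output pixel in a kernel that reads directly from the decoded surface, which is how a single kernel can replace separate copy and convert steps.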

Comment on lines +99 to +102
check_for_cuda_errors(cuCtxPushCurrent(cu_context), __LINE__, __FILE__);
check_for_cuda_errors(
    cuvidDecodePicture(decoder, pic_params), __LINE__, __FILE__);
check_for_cuda_errors(cuCtxPopCurrent(NULL), __LINE__, __FILE__);

For the future: I think it might be a good idea to guard the cuCtxPushCurrent / cuCtxPopCurrent with a RAII-style guard. Something like what VPF does or (more complicated) like decord.

This guard could be better than what we currently have if what is in between the push / pop fails. In those cases, the pop won't happen, which could be problematic (although in a synthetic case I tried out it didn't seem to have been a problem).
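
A minimal sketch of such a guard follows. So that it compiles without the CUDA driver, the context stack is replaced by stand-in functions (fake_push_current, fake_pop_current); in real code the constructor and destructor would call cuCtxPushCurrent and cuCtxPopCurrent. The class name ContextGuard and the function decode_picture are hypothetical and are not taken from VPF, decord, or this PR.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Stand-ins for the CUDA driver context stack, used only so this sketch
// builds without CUDA. Real code would call cuCtxPushCurrent/cuCtxPopCurrent.
using FakeContext = int;
static std::vector<FakeContext> g_context_stack;
void fake_push_current(FakeContext ctx) { g_context_stack.push_back(ctx); }
void fake_pop_current() { g_context_stack.pop_back(); }

// Hypothetical RAII guard: pushes the context on construction and pops it
// on destruction, including when the scope is left through an exception.
class ContextGuard {
 public:
  explicit ContextGuard(FakeContext ctx) { fake_push_current(ctx); }
  ~ContextGuard() { fake_pop_current(); }

  // Non-copyable so that each push is matched by exactly one pop.
  ContextGuard(const ContextGuard&) = delete;
  ContextGuard& operator=(const ContextGuard&) = delete;
};

// Mimics a decode step that may fail between the push and the pop.
void decode_picture(FakeContext ctx, bool fail) {
  ContextGuard guard(ctx);
  if (fail) {
    throw std::runtime_error("decode failed");
  }
  // guard goes out of scope here and pops the context.
}
```

With this shape, the pop happens during stack unwinding even when the guarded call throws, which is the failure case described above.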

Comment on lines +223 to +237
if (!(decode_caps.nOutputFormatMask & (1 << video_output_format))) {
  if (decode_caps.nOutputFormatMask & (1 << cudaVideoSurfaceFormat_NV12)) {
    video_output_format = cudaVideoSurfaceFormat_NV12;
  } else if (
      decode_caps.nOutputFormatMask & (1 << cudaVideoSurfaceFormat_P016)) {
    video_output_format = cudaVideoSurfaceFormat_P016;
  } else if (
      decode_caps.nOutputFormatMask & (1 << cudaVideoSurfaceFormat_YUV444)) {
    video_output_format = cudaVideoSurfaceFormat_YUV444;
  } else if (
      decode_caps.nOutputFormatMask &
      (1 << cudaVideoSurfaceFormat_YUV444_16Bit)) {
    video_output_format = cudaVideoSurfaceFormat_YUV444_16Bit;
  } else {
    TORCH_CHECK(false, "No supported output format found");

I believe we can clean this up now that we will only support returning RGB. But this can be done in a follow-up PR
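
One possible shape for such a clean-up, shown only as a sketch, is to replace the if/else chain with a loop over a preference-ordered list. This is not necessarily what the follow-up PR did. The enum below is a stand-in for cudaVideoSurfaceFormat so that the example compiles without the NVIDIA Video Codec SDK headers, and its numeric values are illustrative. The helper name pick_output_format is hypothetical. As in the snippet above, bit i of the mask being set means format i is supported.

```cpp
#include <array>
#include <cassert>
#include <stdexcept>

// Stand-in for cudaVideoSurfaceFormat; values are illustrative only.
enum SurfaceFormat : unsigned {
  NV12 = 0,
  P016 = 1,
  YUV444 = 2,
  YUV444_16Bit = 3
};

// Hypothetical helper: returns `requested` if the decoder supports it,
// otherwise the first supported format in the fallback order.
SurfaceFormat pick_output_format(
    unsigned output_format_mask,
    SurfaceFormat requested) {
  auto supported = [output_format_mask](SurfaceFormat f) {
    return (output_format_mask & (1u << f)) != 0;
  };
  if (supported(requested)) {
    return requested;
  }
  // Same preference order as the original if/else chain.
  const std::array<SurfaceFormat, 4> fallbacks = {
      {NV12, P016, YUV444, YUV444_16Bit}};
  for (SurfaceFormat f : fallbacks) {
    if (supported(f)) {
      return f;
    }
  }
  throw std::runtime_error("No supported output format found");
}
```

In torchvision the final throw would presumably be a TORCH_CHECK, as in the original snippet.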

Comment on lines +248 to +266
video_codec = video_format->codec;
video_chroma_format = video_format->chroma_format;
bit_depth_minus8 = video_format->bit_depth_luma_minus8;
bytes_per_pixel = bit_depth_minus8 > 0 ? 2 : 1;
// Set the output surface format same as chroma format
switch (video_chroma_format) {
  case cudaVideoChromaFormat_Monochrome:
  case cudaVideoChromaFormat_420:
    video_output_format = video_format->bit_depth_luma_minus8
        ? cudaVideoSurfaceFormat_P016
        : cudaVideoSurfaceFormat_NV12;
    break;
  case cudaVideoChromaFormat_444:
    video_output_format = video_format->bit_depth_luma_minus8
        ? cudaVideoSurfaceFormat_YUV444_16Bit
        : cudaVideoSurfaceFormat_YUV444;
    break;
  case cudaVideoChromaFormat_422:
    video_output_format = cudaVideoSurfaceFormat_NV12;

Maybe this can be cleaned up in the future

Comment on lines +9 to +10
static auto check_for_cuda_errors =
[](CUresult result, int line_num, std::string file_name) {

nit: we don't usually define lambdas outside of function scope in the PyTorch codebase. But I believe this is just a stylistic nit.

Comment on lines +31 to +32
auto options = torch::TensorOptions().dtype(torch::kU8).device(torch::kCUDA);
torch::Tensor frame = torch::zeros({0}, options);

nit: I believe you can just do

torch::Tensor frame;

now that decoder.fetch_frame() returns a tensor of the right dtype and device

fmassa commented Dec 30, 2021

We need to check if the errors are related or not (maybe not).

Also, in a follow-up PR we will need to add CI tests for those functions. This might be a bit involved, but it looks like NVIDIA-docker has nvdec available in the container.

fmassa commented Dec 30, 2021

Looks like it's only on Python 3.6, which reached its EOL last week https://endoflife.date/python

So we should probably just remove support for Python 3.6 in torchvision main branch (as long as PyTorch also drops support for it)

@fmassa fmassa merged commit 64d21d1 into pytorch:main Dec 30, 2021
@prabhat00155 prabhat00155 deleted the prabhat00155/gpu_decoder branch December 30, 2021 19:20
facebook-github-bot pushed a commit that referenced this pull request Jan 5, 2022
Summary:
* [WIP] Add video GPU decoder

* Expose use_dev_frame to python class and handle it internally

* Fixed invalid argument CUDA error

* Fixed empty and missing frames

* Free remaining frames in the queue

* Added nv12 to yuv420 conversion support for host frames

* Added unit test and cleaned up code

* Use CUDA_HOME inside if

* Undo commented out code

* Add Readme

* Remove output_format and use_device_frame optional arguments from the VideoReader API

* Cleaned up init()

* Fix warnings

* Fix python linter errors

* Fix linter issues in setup.py

* clang-format

* Make reformat private

* Member function naming

* Add comments

* Variable renaming

* Code cleanup

* Make return type of decode() void

* Replace printing errors with throwing runtime_error

* Replaced runtime_error with TORCH_CHECK in demuxer.h

* Use CUDAGuard instead of cudaSetDevice

* Remove printf

* Use Tensor instead of uint8* and remove cuMemAlloc/cuMemFree

* Use TORCH_CHECK instead of runtime_error

* Use TORCHVISION_INCLUDE and TORCHVISION_LIBRARY to pass video codec location

* Include ffmpeg_include_dir

* Remove space

* Removed use of runtime_error

* Update Readme

* Check for bsf.h

* Change struct initialisation style

* Clean-up get_operating_point

* Make variable naming convention uniform

* Move checking for bsf.h around

* Fix linter error

Reviewed By: datumbox, prabhat00155

Differential Revision: D33405358

fbshipit-source-id: 0e6251389508309a23c7afd843f298208dcd67e8

Co-authored-by: Francisco Massa <fvsmassa@gmail.com>
facebook-github-bot pushed a commit that referenced this pull request Jan 6, 2022
Differential Revision: D33405358

Original commit changeset: 0e6251389508

Original Phabricator Diff: D33405358

fbshipit-source-id: b554aaa8003aca08826540883783644aa7eebea9
facebook-github-bot pushed a commit that referenced this pull request Jan 7, 2022
Summary: identical to the commit list in the Jan 5, 2022 commit above.

Reviewed By: NicolasHug

Differential Revision: D33476941

fbshipit-source-id: e310435c966fe79ab77eaba305a03dd0af7a17a5

Co-authored-by: Francisco Massa <fvsmassa@gmail.com>

Successfully merging this pull request may close these issues.

[RFC] Hardware-accelerated video decoding