
Adapt NPP calls for CUDA >= 12.9 #757


Open: wants to merge 4 commits into main

Conversation


@Kh4L Kh4L commented Jul 7, 2025

Recent CUDA versions don't support the non-context NPP calls, so this PR switches to the context-based (_Ctx) API calls.
CUDA 12.9+ also deprecates nppGetStreamContext, so we need to build the NPP context manually.
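
A minimal sketch of the manual construction being described (the helper name makeNppStreamContext and the omitted error handling are illustrative, not necessarily what this PR does; the field names follow NPP's NppStreamContext definition in nppdefs.h):

#include <cuda_runtime.h>
#include <nppdefs.h>

// Illustrative replacement for the deprecated nppGetStreamContext():
// query the device attributes ourselves and bind the context to a stream.
NppStreamContext makeNppStreamContext(cudaStream_t stream) {
  NppStreamContext ctx{};

  int dev = 0;
  cudaGetDevice(&dev);
  cudaDeviceProp prop{};
  cudaGetDeviceProperties(&prop, dev);

  ctx.hStream = stream;
  ctx.nCudaDeviceId = dev;
  ctx.nMultiProcessorCount = prop.multiProcessorCount;
  ctx.nMaxThreadsPerMultiProcessor = prop.maxThreadsPerMultiProcessor;
  ctx.nMaxThreadsPerBlock = prop.maxThreadsPerBlock;
  ctx.nSharedMemPerBlock = prop.sharedMemPerBlock;
  ctx.nCudaDevAttrComputeCapabilityMajor = prop.major;
  ctx.nCudaDevAttrComputeCapabilityMinor = prop.minor;

  unsigned int streamFlags = 0;
  cudaStreamGetFlags(stream, &streamFlags);
  ctx.nStreamFlags = streamFlags;

  return ctx;
}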

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on Jul 7, 2025
@NicolasHug (Member) left a comment

Thanks a lot for sending this PR @Kh4L, this is very helpful.

I applied our linter and also enabled testing for CUDA 12.9 so that we can correctly check the PR.

Do I understand correctly that in 12.9 we have to use the context-based API, while at the same time the context creation helper was removed?! This sounds error-prone; is there any way we could avoid manually building and setting the context attributes?

at::cuda::CUDAStream nppStreamWrapper =
    c10::cuda::getStreamFromExternal(nppGetStream(), device_.index());
nppDoneEvent.record(nppStreamWrapper);
nppDoneEvent.block(at::cuda::getCurrentCUDAStream());
@NicolasHug (Member) commented:

@Kh4L Can you confirm my understanding that the nppiNV12ToRGB_8u_P2C3R_Ctx call will properly wait on the stream before returning?

@Kh4L (Author) replied:

Thank you!

Do I understand correctly that in 12.9 we have to use the context-based API, while at the same time the context creation helper was removed?! This sounds error-prone; is there any way we could avoid manually building and setting the context attributes?

Unfortunately, there is currently no alternative way to build this.
Since I'm not part of the NPP team, I can't comment on their design choices.

@Kh4L Can you confirm my understanding that the nppiNV12ToRGB_8u_P2C3R_Ctx call will properly wait on the stream before returning?

That's correct. We bind the NPP context to the active CUDA stream, so we can leverage CUDA stream management rather than performing a blocking sync:

https://github.com/pytorch/torchcodec/pull/757/files#diff-37d8a09669d3f009b6850f6e66888b6875d805064933148fce3a637cc7694712R254
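
As a sketch of the mechanism described above (reusing the illustrative makeNppStreamContext helper; yuvPlanes, srcStep, rgbOut, dstStep, and oSizeROI are placeholders, not the PR's actual variables): because nppCtx.hStream is the current PyTorch stream, the _Ctx call is simply enqueued on that stream and ordinary stream ordering takes care of synchronization.

// Bind the NPP context to the stream PyTorch is currently using.
NppStreamContext nppCtx = makeNppStreamContext(
    at::cuda::getCurrentCUDAStream(device_.index()).stream());

// The conversion kernel is launched asynchronously on nppCtx.hStream;
// later work enqueued on the same stream naturally waits for it.
NppStatus status = nppiNV12ToRGB_8u_P2C3R_Ctx(
    yuvPlanes, srcStep, rgbOut, dstStep, oSizeROI, nppCtx);
TORCH_CHECK(status == NPP_SUCCESS, "NV12 -> RGB conversion failed: ", status);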

@Kh4L (Author) commented Jul 10, 2025

@NicolasHug LMK if you need anything else!

The assertion error doesn't seem related to my change.

@NicolasHug (Member) commented:

Nothing else to do on your side @Kh4L, thank you. I'll merge this soon; I'll just try to extract all the #ifdef stuff into a single function before merging.
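
A sketch of what such a single function might look like, assuming gating on CUDA_VERSION and reusing the illustrative makeNppStreamContext helper from above; this is not the PR's actual code:

// Sketch: keep all version-dependent context creation in one helper
// (assumed to live on the class that owns device_).
NppStreamContext getNppStreamContext() {
  cudaStream_t stream =
      at::cuda::getCurrentCUDAStream(device_.index()).stream();
#if CUDA_VERSION >= 12090  // CUDA 12.9+
  // nppGetStreamContext() is deprecated here: build the context manually.
  NppStreamContext ctx = makeNppStreamContext(stream);
#else
  // Older toolkits: let NPP fill in the device attributes, then pin the
  // context to the stream PyTorch is currently using.
  NppStreamContext ctx{};
  nppGetStreamContext(&ctx);
  ctx.hStream = stream;
#endif
  return ctx;
}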

Comment on lines +239 to +240
int dev = 0;
cudaError_t err = cudaGetDevice(&dev);
@NicolasHug (Member) commented:

Hi @Kh4L, I actually have some questions before moving forward:

  1. Should we just rely on the existing device_ attribute instead of calling cudaGetDevice(&dev), or are they actually equivalent?
  2. Would it make sense to cache the nppCtx across calls? In this PR it looks like we're creating the context over and over for every single frame that needs to be decoded. I wonder if it might be beneficial to cache it in the class and re-use it?

Thanks for your help so far, I'm still trying to build familiarity with that part of the code base.
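
Regarding question 2, a sketch of what caching might look like, assuming a member on the decoder class that owns device_ and the illustrative makeNppStreamContext helper from above; all names here are hypothetical:

// Hypothetical members: the per-device attributes are queried once and reused.
NppStreamContext nppCtx_{};
bool nppCtxInitialized_ = false;

NppStreamContext& getCachedNppContext() {
  cudaStream_t stream =
      at::cuda::getCurrentCUDAStream(device_.index()).stream();
  if (!nppCtxInitialized_) {
    nppCtx_ = makeNppStreamContext(stream);  // fills the device attributes
    nppCtxInitialized_ = true;
  }
  // The device attributes are stable for device_, but the active stream can
  // change between calls, so refresh hStream every time.
  nppCtx_.hStream = stream;
  return nppCtx_;
}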
