Adapt NPP calls for CUDA >= 12.9 #757
base: main
Conversation
Thanks a lot for sending this PR @Kh4L, this is very helpful.
I applied our linter and also enabled testing for CUDA 12.9 so that we can correctly check the PR.
Do I understand correctly that in 12.9 we have to use the context-based API, while at the same time the context creation helper was removed?! This sounds error prone, is there any way we could avoid manually building and setting the context attributes?
at::cuda::CUDAStream nppStreamWrapper =
    c10::cuda::getStreamFromExternal(nppGetStream(), device_.index());
nppDoneEvent.record(nppStreamWrapper);
nppDoneEvent.block(at::cuda::getCurrentCUDAStream());
@Kh4L Can you confirm my understanding that the nppiNV12ToRGB_8u_P2C3R_Ctx call will properly wait on the stream before returning?
Thank you!
Do I understand correctly that in 12.9 we have to use the context-based API, while at the same time the context creation helper was removed?! This sounds error prone, is there any way we could avoid manually building and setting the context attributes?
Unfortunately, there is currently no alternative way to build this.
Since I'm not part of the NPP team, I can't comment on their design choices.
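For reference, building the context manually boils down to filling the NppStreamContext fields from the device properties yourself. Here is a minimal sketch of what that looks like (illustrative only, not the exact code in this PR; the makeNppStreamContext helper name is made up and error handling is omitted):

#include <cuda_runtime.h>
#include <nppdefs.h>

// Build an NppStreamContext by hand once nppGetStreamContext() is no longer
// available. Error handling is omitted for brevity.
NppStreamContext makeNppStreamContext(int deviceIndex, cudaStream_t stream) {
  NppStreamContext ctx{};
  ctx.hStream = stream;
  ctx.nCudaDeviceId = deviceIndex;

  cudaDeviceProp prop{};
  cudaGetDeviceProperties(&prop, deviceIndex);
  ctx.nMultiProcessorCount = prop.multiProcessorCount;
  ctx.nMaxThreadsPerMultiProcessor = prop.maxThreadsPerMultiProcessor;
  ctx.nMaxThreadsPerBlock = prop.maxThreadsPerBlock;
  ctx.nSharedMemPerBlock = prop.sharedMemPerBlock;
  ctx.nCudaDevAttrComputeCapabilityMajor = prop.major;
  ctx.nCudaDevAttrComputeCapabilityMinor = prop.minor;

  cudaStreamGetFlags(stream, &ctx.nStreamFlags);
  return ctx;
}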
@Kh4L Can you confirm my understanding that the nppiNV12ToRGB_8u_P2C3R_Ctx call will properly wait on the stream before returning?
That's correct. We bind the NPP context to the active CUDA stream so we can leverage CUDA stream management rather than performing a blocking sync.
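Roughly, the flow looks like this (an illustrative sketch, not the literal PR code; pSrc, srcStep, pDst, dstStep and roi are placeholder names, and makeNppStreamContext is the hypothetical helper sketched above):

// Bind the NPP context to PyTorch's current CUDA stream for this device.
NppStreamContext nppCtx = makeNppStreamContext(
    device_.index(), at::cuda::getCurrentCUDAStream(device_.index()).stream());

// The _Ctx call enqueues its kernels on nppCtx.hStream and returns; any work
// later submitted to that same stream is ordered after the conversion, so no
// blocking device synchronization is needed here.
NppStatus status =
    nppiNV12ToRGB_8u_P2C3R_Ctx(pSrc, srcStep, pDst, dstStep, roi, nppCtx);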
@NicolasHug LMK if you need anything else! The assertion error doesn't seem related to my change.
Nothing else to do on your side @Kh4L, thank you. I'll merge this soon, I'll just try to extract all the …
int dev = 0;
cudaError_t err = cudaGetDevice(&dev);
Hi @Kh4L, I actually have some questions before moving forward:
- Should we just rely on the existing `device_` attribute instead of calling `cudaGetDevice(&dev)`, or are they actually equivalent?
- Would it make sense to cache the `nppCtx` across calls? In this PR it looks like we're creating the context over and over for every single frame that needs to be decoded. I wonder if it might be beneficial to cache it in the class and re-use it (see the sketch after this comment)?
Thanks for your help so far, I'm still trying to build familiarity with that part of the code base.
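To illustrate the caching idea from the second bullet, one possible shape (a sketch only; the class and member names are hypothetical, and it reuses the makeNppStreamContext helper sketched earlier) would be to keep the context as a member and rebuild it only when the active stream changes:

// Hypothetical caching sketch: keep the context in the class instead of
// re-creating it once per decoded frame.
class CudaDeviceInterface {
  NppStreamContext& getNppStreamContext() {
    cudaStream_t current =
        at::cuda::getCurrentCUDAStream(device_.index()).stream();
    // Rebuild only on first use or when the active stream changed.
    if (!nppCtxInitialized_ || nppCtx_.hStream != current) {
      nppCtx_ = makeNppStreamContext(device_.index(), current);
      nppCtxInitialized_ = true;
    }
    return nppCtx_;
  }

  at::Device device_{at::kCUDA};
  NppStreamContext nppCtx_{};
  bool nppCtxInitialized_ = false;
};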
Recent CUDA versions don't support non-context NPP calls, so use the ctx-based API calls.
Also, CUDA 12.9+ deprecates `nppGetStreamContext`, so we need to build the NPP context manually.
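For context, a version-guarded call site could look roughly like this (a sketch only; the exact macro and the 12.9 cutoff value are assumptions rather than taken from the PR, and the argument names are placeholders):

#include <cuda_runtime_api.h>  // defines CUDART_VERSION, e.g. 12090 for CUDA 12.9

#if CUDART_VERSION >= 12090  // assumed cutoff for CUDA 12.9
  // Context-based API: the stream comes from the manually built NppStreamContext.
  status = nppiNV12ToRGB_8u_P2C3R_Ctx(pSrc, srcStep, pDst, dstStep, roi, nppCtx);
#else
  // Legacy API: uses the global NPP stream configured via nppSetStream().
  status = nppiNV12ToRGB_8u_P2C3R(pSrc, srcStep, pDst, dstStep, roi);
#endif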