
Conversation

cheneeheng

I tested this setup with CUDA 11.7 + cuDNN 8.5 on a GTX 1660 Ti. It runs OpenPose for human pose extraction normally, without the huge GPU memory usage issue. GPU memory usage is the same as with the CUDA 10.2 + cuDNN 7 setup, while inference is about 1 fps faster.

Hope this helps someone who badly needs to use CUDA 11.

Changelog:

  • added cudnn-frontend submodule.
  • updated cmake with a new flag and the new 3rdparty repository cudnn_frontend.
  • changed caffe submodule repo target.
    -- added the USE_CUDNN_FRONTEND option (passed as -DUSE_CUDNN_FRONTEND). When enabled, the cuDNN frontend API is used instead of the current algorithm wrapper cudnnGetConvolutionForwardAlgorithm_v7 for cuDNN 8 (see the compile-time switch sketch after this list).
    -- added cudnn_v8_utils.hpp + cudnn_v8_utils.cpp files for the cudnn_frontend API. They currently only support the forward pass.
    -- fixed warnings.
    -- reduced GPU memory usage by setting CUDNN_STREAMS_PER_GROUP=1.
    -- added a compute capability check in tensor creation to enable tensor core usage on Ampere cards (see the compute-capability sketch after this list).
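To make the USE_CUDNN_FRONTEND option concrete, here is a minimal sketch of the kind of compile-time switch it implies. `setup_forward_conv` and `cudnn_v8::SetupForwardPlan` are hypothetical names used for illustration only (the real frontend path works with execution plans rather than `cudnnConvolutionFwdAlgo_t` values); only `cudnnGetConvolutionForwardAlgorithm_v7` is the actual cuDNN call mentioned in the changelog.

```cpp
// Sketch only, not the PR's actual code: a USE_CUDNN_FRONTEND compile-time
// switch between the cudnn_frontend path and the legacy heuristic query.
#include <cudnn.h>

#ifdef USE_CUDNN_FRONTEND
#include "cudnn_v8_utils.hpp"  // frontend-based plan setup (forward pass only)
#endif

void setup_forward_conv(cudnnHandle_t handle,
                        cudnnTensorDescriptor_t x_desc,
                        cudnnFilterDescriptor_t w_desc,
                        cudnnConvolutionDescriptor_t conv_desc,
                        cudnnTensorDescriptor_t y_desc,
                        cudnnConvolutionFwdAlgo_t* fwd_algo) {
#ifdef USE_CUDNN_FRONTEND
  // Frontend path (cuDNN 8): build an operation graph, query engine
  // heuristics, and cache an execution plan instead of picking a legacy
  // algorithm enum. Helper name below is illustrative only.
  (void)fwd_algo;  // the frontend path does not use legacy algo enums
  cudnn_v8::SetupForwardPlan(handle, x_desc, w_desc, conv_desc, y_desc);
#else
  // Legacy path: ask the v7-compatible heuristic for candidate algorithms,
  // sorted by expected performance, and keep the fastest one.
  const int requested = CUDNN_CONVOLUTION_FWD_ALGO_COUNT;
  int returned = 0;
  cudnnConvolutionFwdAlgoPerf_t perf[CUDNN_CONVOLUTION_FWD_ALGO_COUNT];
  cudnnGetConvolutionForwardAlgorithm_v7(handle, x_desc, w_desc, conv_desc,
                                         y_desc, requested, &returned, perf);
  *fwd_algo = perf[0].algo;
#endif
}
```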

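And a minimal sketch of the compute-capability check from the last changelog item. The PR applies the check during cudnn_frontend tensor creation; here the same idea is shown with the plain CUDA runtime API and the classic cuDNN math-type setter, gating on compute capability 8.x since the changelog singles out Ampere. Function names are illustrative, not the PR's.

```cpp
// Sketch only: detect an Ampere-or-newer GPU and request tensor-core math.
#include <cuda_runtime.h>
#include <cudnn.h>

bool is_ampere_or_newer(int device_id) {
  cudaDeviceProp prop{};
  cudaGetDeviceProperties(&prop, device_id);
  return prop.major >= 8;  // Ampere is sm_80 and above
}

void configure_math_type(cudnnConvolutionDescriptor_t conv_desc, int device_id) {
  if (is_ampere_or_newer(device_id)) {
    // Let cuDNN pick tensor-core kernels where available.
    cudnnSetConvolutionMathType(conv_desc, CUDNN_TENSOR_OP_MATH);
  } else {
    cudnnSetConvolutionMathType(conv_desc, CUDNN_DEFAULT_MATH);
  }
}
```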