When using CUDA the first run is very slow #10746

Description

This is a placeholder issue to describe all that is known about this problem. It is not a bug, but a design limitation of how CUDA works.

The typical symptom is that the first run of a model takes a very long time (minutes), while subsequent runs in the same session are very fast. This delay is caused by the CUDA runtime generating binary code for your current GPU architecture at load time (JIT compilation). Onnxruntime does include precompiled CUDA code for currently popular GPU architectures, but it cannot include all of them, and it can never include architectures that were released after a given Onnxruntime release.
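To confirm that you are seeing this symptom rather than a model that is simply slow, it can help to time several consecutive runs in one process. The following is a minimal sketch: `time_runs` and `looks_like_jit_delay` are hypothetical helper names, and the 10x ratio is an arbitrary assumption, not an official threshold.

```python
import time


def time_runs(run_once, repeats=3):
    """Call run_once() `repeats` times; return each call's wall-clock seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        durations.append(time.perf_counter() - start)
    return durations


def looks_like_jit_delay(durations, ratio=10.0):
    """Heuristic only: the first call is at least `ratio` times slower
    than the fastest later call."""
    if len(durations) < 2:
        return False
    fastest_later = max(min(durations[1:]), 1e-9)  # avoid a zero baseline
    return durations[0] >= ratio * fastest_later
```

With Onnxruntime you would pass something like `lambda: session.run(None, feeds)`, where `session` is an `onnxruntime.InferenceSession` created with the CUDA execution provider and `feeds` is your input dictionary. Part of the delay may also show up while the session is being created, so timing that step separately is worthwhile.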

For complete details, see NVIDIA's blog entry on the topic:

https://developer.nvidia.com/blog/cuda-pro-tip-understand-fat-binaries-jit-caching

Normally, CUDA stores the result of this compile step in a cache, so it only ever happens once on a system. It cannot do this if the cache is disabled or not writable (see the JIT Caching section in the link above).
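A quick way to check whether the cache can be used is to look at the environment variables described in the JIT Caching section (`CUDA_CACHE_DISABLE` and `CUDA_CACHE_PATH`) and test whether the cache directory is writable. This is a sketch under assumptions: the default locations used below (`~/.nv/ComputeCache` on Linux, `%APPDATA%\NVIDIA\ComputeCache` on Windows) are the ones NVIDIA documents, other platforms are treated like Linux, and `cuda_cache_status` is a hypothetical helper name.

```python
import os
import sys
from pathlib import Path


def default_cuda_cache_dir(platform=sys.platform, env=os.environ):
    """Documented default cache location (assumption: only Windows differs)."""
    if platform.startswith("win"):
        return Path(env.get("APPDATA", "")) / "NVIDIA" / "ComputeCache"
    home = env.get("HOME") or str(Path.home())
    return Path(home) / ".nv" / "ComputeCache"


def cuda_cache_status(env=os.environ, platform=sys.platform):
    """Report whether the JIT cache is disabled and whether its path is writable."""
    disabled = env.get("CUDA_CACHE_DISABLE", "0") == "1"
    if env.get("CUDA_CACHE_PATH"):
        path = Path(env["CUDA_CACHE_PATH"])
    else:
        path = default_cuda_cache_dir(platform, env)
    # The directory may not exist yet, so probe the nearest existing ancestor.
    probe = path
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent
    return {"disabled": disabled, "path": path, "writable": os.access(probe, os.W_OK)}


if __name__ == "__main__":
    print(cuda_cache_status())
```

If `disabled` is true or `writable` is false, the compile step will likely repeat in every new process.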

Another way to avoid the problem is to build a version of onnxruntime with the binary code for the GPU architectures you're using.

First, find out which architecture (compute capability) your GPU has.
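One way to do this, assuming a reasonably recent NVIDIA driver whose `nvidia-smi` supports the `compute_cap` query field, is sketched below. The helper names are hypothetical; on older drivers the query may fail, in which case NVIDIA's published list of compute capabilities per GPU model is the fallback.

```python
import subprocess


def parse_compute_caps(text):
    """Turn lines such as '8.6' into sorted, unique names such as 'sm_86'."""
    archs = set()
    for line in text.splitlines():
        major, _, minor = line.strip().partition(".")
        if major.isdigit() and minor.isdigit():
            archs.add(f"sm_{major}{minor}")
    return sorted(archs)


def query_gpu_archs():
    """Ask nvidia-smi for the compute capability of every visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_compute_caps(result.stdout)


if __name__ == "__main__":
    print(query_gpu_archs())
```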

Now use NVIDIA's cuobjdump utility to find out which GPU architectures the onnxruntime library was compiled for. Something like:
cuobjdump -sass libonnxruntime_providers_cuda.so | grep "arch ="
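The output of that command can then be compared with your GPU's architecture. The sketch below assumes the output contains lines of the form `arch = sm_XX`, which is what the grep pattern above matches; `archs_in_cuobjdump` and `missing_archs` are hypothetical helper names.

```python
import re


def archs_in_cuobjdump(text):
    """Collect the unique 'sm_XX' names from 'arch = sm_XX' lines."""
    return sorted(set(re.findall(r"arch\s*=\s*(sm_\d+[a-z]?)", text)))


def missing_archs(gpu_archs, library_archs):
    """Return the GPU architectures with no exactly matching binary in the library."""
    available = set(library_archs)
    return [arch for arch in gpu_archs if arch not in available]


if __name__ == "__main__":
    sample = "arch = sm_52\narch = sm_70\narch = sm_52\n"
    library = archs_in_cuobjdump(sample)
    print(library, missing_archs(["sm_86"], library))
```

This is an exact-match check, so it is conservative: a binary built for an earlier minor revision of the same major architecture may also run without JIT compilation.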

To build Onnxruntime for your GPU architecture, set CMAKE_CUDA_ARCHITECTURES, which is a CMake variable rather than an environment variable. See https://cmake.org/cmake/help/latest/prop_tgt/CUDA_ARCHITECTURES.html for the values it accepts.
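As an illustration, the value for that variable can be derived from the architecture names found earlier. This is a sketch with a hypothetical helper name, `cuda_arch_define`; it assumes plain `sm_XX` names and produces the semicolon-separated list form that CMake accepts.

```python
def cuda_arch_define(archs):
    """['sm_86', 'sm_75'] -> 'CMAKE_CUDA_ARCHITECTURES=75;86'"""
    numbers = {a[3:] if a.startswith("sm_") else a for a in archs}
    # Sort numerically without int() so suffixed names do not raise.
    ordered = sorted(numbers, key=lambda s: (len(s), s))
    return "CMAKE_CUDA_ARCHITECTURES=" + ";".join(ordered)


if __name__ == "__main__":
    print(cuda_arch_define(["sm_86", "sm_75"]))
```

The resulting string can be passed to cmake with `-D`, or to the Onnxruntime build script through its `--cmake_extra_defines` option. Quote it in the shell, since the semicolon would otherwise end the command.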

If you'd rather modify the Onnxruntime source files directly, the location is here:

if (NOT CMAKE_CUDA_ARCHITECTURES)
