Describe the issue
Our test program runs our AI networks (currently 7) across different size ranges. The size range is included in the hash that forms part of the cache directory name. We need a separate directory per optimization because we load the model from an ONNX blob, so all generated files get the same name (at least with the embedded engine enabled). In total the test program creates 27 different cache directories.
On the second run of the test program we expected all caches to be reused. This is the case when trt_weight_stripped_engine_enable is disabled, but with it enabled, 5 of the 27 cache directories are not reused. On the second run the optimization is redone and the files in those cache directories are updated, yet the same re-optimization happens again on every subsequent run.
Because we use an ONNX blob, we also set the same data in the trt_onnx_bytestream and trt_onnx_bytestream_size members on subsequent runs, hoping the weights will be refitted from there.
To reproduce
Set trt_weight_stripped_engine_enable and create sessions for multiple models, each with its own cache directory. Rerun and check that setup time is not > 1 s (i.e. that the cache was reused). It could be related to supplying the ONNX data as a blob, but that seems unlikely since most of the caches are reused.
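A minimal sketch of our session setup via the C API (model path, cache directory, and the exact option set are placeholders for our real configuration; whether trt_onnx_bytestream_size is passed as a pointer to a size_t is our reading of the API and may be wrong):

```cpp
#include <onnxruntime_cxx_api.h>
#include <fstream>
#include <iterator>
#include <vector>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "repro");

  // Load the model as an in-memory blob (path is a placeholder).
  std::ifstream in("model.onnx", std::ios::binary);
  std::vector<char> blob((std::istreambuf_iterator<char>(in)),
                         std::istreambuf_iterator<char>());

  const OrtApi& api = Ort::GetApi();
  OrtTensorRTProviderOptionsV2* trt_options = nullptr;
  Ort::ThrowOnError(api.CreateTensorRTProviderOptions(&trt_options));

  // Per-model cache directory; "cache/<hash>" stands in for our
  // hash-derived directory name.
  const char* keys[] = {"trt_engine_cache_enable",
                        "trt_engine_cache_path",
                        "trt_weight_stripped_engine_enable"};
  const char* values[] = {"1", "cache/<hash>", "1"};
  Ort::ThrowOnError(
      api.UpdateTensorRTProviderOptions(trt_options, keys, values, 3));

  // Point the EP at the ONNX bytes so the stripped engine can be
  // refitted without a model file on disk (done on every run).
  Ort::ThrowOnError(api.UpdateTensorRTProviderOptionsWithValue(
      trt_options, "trt_onnx_bytestream", blob.data()));
  size_t blob_size = blob.size();
  Ort::ThrowOnError(api.UpdateTensorRTProviderOptionsWithValue(
      trt_options, "trt_onnx_bytestream_size", &blob_size));

  Ort::SessionOptions so;
  Ort::ThrowOnError(
      api.SessionOptionsAppendExecutionProvider_TensorRT_V2(so, trt_options));

  // Second and later runs: this is where we expect the cached engine
  // to be loaded instead of re-optimized.
  Ort::Session session(env, blob.data(), blob.size(), so);

  api.ReleaseTensorRTProviderOptions(trt_options);
  return 0;
}
```

This loop is repeated per network and size range, yielding the 27 directories described above.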
Urgency
No response
Platform
Windows
OS Version
Windows 11
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.19.2
ONNX Runtime API
C++
Architecture
X64
Execution Provider
TensorRT
Execution Provider Library Version
TensorRT 10.4.0.26 on CUDA 11.6