Description
Describe the issue
We want to use trt_dump_ep_context_model to minimize setup time, and we want to use trt_weight_stripped_engine_enable to protect our models from competitors when we ship our software.
While both features work separately (most of the time, in the case of trt_weight_stripped_engine_enable), we can't get them to work when both are enabled; the ort::Session constructor fails with:
Non-zero status code returned while running TRTKernel_graph_TRTKernel_graph_torch-jit-export_5359030231610903815_0_998102994216759885_0 node. Name:'TensorrtExecutionProvider_TRTKernel_graph_TRTKernel_graph_torch-jit-export_5359030231610903815_0_998102994216759885_0_0' Status Message: C:\gitlab-runner\builds\H1MW1hSx\0\cv\impl\thirdpartylibs\onnxruntime\onnxruntime\core\providers\tensorrt\tensorrt_execution_provider.cc:949 onnxruntime::BindContextInput [ONNXRuntimeError] : 11 : EP_FAIL : TensorRT EP failed to call nvinfer1::IExecutionContext::setInputShape() for input 'img'
Here is the last part of the INFO-level log output:
I onnxruntime: GraphTransformer TransposeOptimizer modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: [TensorRT EP] Model name is _ctx.onnx [tensorrt_execution_provider_utils.h:543 onnxruntime::TRTGenerateId]
I onnxruntime: [TensorRT EP] TensorRT subgraph MetaDef name TRTKernel_graph_TRTKernel_graph_torch-jit-export_5359030231610903815_0_998102994216759885_0 [tensorrt_execution_provider.cc:2058 onnxruntime::TensorrtExecutionProvider::GetSubGraph]
V onnxruntime: [TensorRT EP] GetEpContextFromGraph engine_cache_path: C:\ProgramData\ContextVision\cvn_cache\e9d2977634193884f085e9031fa54f0c24fc45f2./TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_5359030231610903815_0_0_sm86.stripped.engine [onnx_ctx_model_helper.cc:324 onnxruntime::TensorRTCacheModelHandler::GetEpContextFromGraph]
V onnxruntime: [TensorRT EP] TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_5359030231610903815_0_0_sm86.engine exists. [onnx_ctx_model_helper.cc:335 onnxruntime::TensorRTCacheModelHandler::GetEpContextFromGraph]
V onnxruntime: [TensorRT EP] DeSerialized TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_5359030231610903815_0_0_sm86.engine [onnx_ctx_model_helper.cc:358 onnxruntime::TensorRTCacheModelHandler::GetEpContextFromGraph]
I onnxruntime: GraphTransformer Level2_RuleBasedTransformer modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer TransposeOptimizer_CPUExecutionProvider modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer QDQS8ToU8Transformer modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer QDQSelectorActionTransformer modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer GemmActivationFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer MatMulIntegerToFloatFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer DynamicQuantizeMatMulFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer ConvActivationFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer GeluFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer LayerNormFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer SimplifiedLayerNormFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer AttentionFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer EmbedLayerNormFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer GatherSliceToSplitFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer GatherToSliceFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer MatmulTransposeFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer BiasGeluFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer SkipLayerNormFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer FastGeluFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer QuickGeluFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer BiasSoftmaxFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer BiasDropoutFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer MatMulScaleFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer MatMulActivationFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer MatMulNBitsFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer QDQFinalCleanupTransformer modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer NchwcTransformer modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer NhwcTransformer modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer ConvAddActivationFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer RemoveDuplicateCastTransformer modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer CastFloat16Transformer modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer MemcpyTransformer modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
V onnxruntime: Node placements [session_state.cc:1146 onnxruntime::VerifyEachNodeIsAssignedToAnEp]
V onnxruntime: All nodes placed on [TensorrtExecutionProvider]. Number of nodes: 1 [session_state.cc:1149 onnxruntime::VerifyEachNodeIsAssignedToAnEp]
V onnxruntime: SaveMLValueNameIndexMapping [session_state.cc:126 onnxruntime::SessionState::CreateGraphInfo]
V onnxruntime: Done saving OrtValue mappings. [session_state.cc:172 onnxruntime::SessionState::CreateGraphInfo]
I onnxruntime: Use DeviceBasedPartition as default [allocation_planner.cc:2567 onnxruntime::IGraphPartitioner::CreateGraphPartitioner]
I onnxruntime: Saving initialized tensors. [session_state_utils.cc:209 onnxruntime::session_state_utils::SaveInitializedTensors]
I onnxruntime: Done saving initialized tensors [session_state_utils.cc:360 onnxruntime::session_state_utils::SaveInitializedTensors]
I onnxruntime: Session successfully initialized. [inference_session.cc:2094 onnxruntime::InferenceSession::Initialize]
It must have redone the optimization, as it took 1.0658e+06 ms to create the ort::Session for:
Device: NVIDIA RTX A5000, Identifier: [Denoising, 1, Denoising, Velvet], TensorRT version: 10.4.0.26, OnnxRuntime version: 1.19.2, SmallSize: img:1x1x64x64,mix_factor:1x1
Hash: e9d2977634193884f085e9031fa54f0c24fc45f2
OptSize: img:1x1x80x80,mix_factor:1x1, MaxSize: img:1x1x90x90,mix_factor:1x1, Date/Time: 2024-09-23 07:37:02.9472181
I onnxruntime: Extending BFCArena for Cuda. bin_num:6 (requested) num_bytes: 25600 (actual) rounded_bytes:25600 [bfc_arena.cc:347 onnxruntime::BFCArena::AllocateRawInternal]
I onnxruntime: Extended allocation by 1048576 bytes. [bfc_arena.cc:206 onnxruntime::BFCArena::Extend]
I onnxruntime: Total allocated bytes: 1048576 [bfc_arena.cc:209 onnxruntime::BFCArena::Extend]
I onnxruntime: Allocated memory at 0000000C24B00000 to 0000000C24C00000 [bfc_arena.cc:212 onnxruntime::BFCArena::Extend]
E onnxruntime: [2024-09-23 07:54:55 ERROR] IExecutionContext::setInputShape: Error Code 3: API Usage Error (Parameter check failed, condition: satisfyProfile. Set dimension [1,1,80,80] for tensor img does not satisfy any optimization profiles. Valid range for profile 0: [1,1,45,64]..[1,1,64,90].) [tensorrt_execution_provider.h:88 onnxruntime::TensorrtLogger::log]
The options are dumped as:
V onnxruntime: [TensorRT EP] TensorRT provider options: device_id: 0, trt_max_partition_iterations: 1000, trt_min_subgraph_size: 1, trt_max_workspace_size: 40737418240, trt_fp16_enable: 0, trt_int8_enable: 0, trt_int8_calibration_cache_name: , int8_calibration_cache_available: 0, trt_int8_use_native_tensorrt_calibration_table: 0, trt_dla_enable: 0, trt_dla_core: 0, trt_dump_subgraphs: 0, trt_engine_cache_enable: 1, trt_weight_stripped_engine_enable: 0, trt_onnx_model_folder_path: , trt_cache_path: ./, trt_global_cache_path: , trt_engine_decryption_enable: 0, trt_engine_decryption_lib_path: , trt_force_sequential_engine_build: 0, trt_context_memory_sharing_enable: 0, trt_layer_norm_fp32_fallback: 0, trt_build_heuristics_enable: 0, trt_sparsity_enable: 0, trt_builder_optimization_level: 3, trt_auxiliary_streams: 0, trt_tactic_sources: , trt_profile_min_shapes: img:1x1x64x64,mix_factor:1x1, trt_profile_max_shapes: img:1x1x90x90,mix_factor:1x1, trt_profile_opt_shapes: img:1x1x80x80,mix_factor:1x1, trt_cuda_graph_enable: 0, trt_dump_ep_context_model: 0, trt_ep_context_file_path: C:\ProgramData\ContextVision\cvn_cache\e9d2977634193884f085e9031fa54f0c24fc45f2, trt_ep_context_embed_mode: 0, trt_cache_prefix: , trt_engine_hw_compatible: 0, trt_onnx_model_bytestream_size_: 1750065 [tensorrt_execution_provider.cc:1728 onnxruntime::TensorrtExecutionProvider::TensorrtExecutionProvider]
As trt_dump_ep_context_model always stores the resulting .engine files directly on disk, the only other way to protect our models would be to encrypt the resulting file as quickly as possible and then decrypt it again before loading it. This is obviously far from safe: it would be rather easy to obtain the data simply by monitoring the file writes.
To reproduce
Enable both features (see the option printout above).
Create an ort::Session and note the error.
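For reference, a minimal sketch of how we set up the session with both features enabled through the C++ API; the cache path and model name below are placeholders, not our actual values:

```cpp
#include <onnxruntime_cxx_api.h>

int main() {
  Ort::Env env(ORT_LOGGING_LEVEL_INFO, "repro");
  Ort::SessionOptions session_options;

  const OrtApi& api = Ort::GetApi();
  OrtTensorRTProviderOptionsV2* trt_options = nullptr;
  Ort::ThrowOnError(api.CreateTensorRTProviderOptions(&trt_options));

  // Enable both the EP-context model dump and weight-stripped engines.
  // Paths/values here are illustrative placeholders.
  const char* keys[] = {"trt_engine_cache_enable",
                        "trt_dump_ep_context_model",
                        "trt_ep_context_file_path",
                        "trt_weight_stripped_engine_enable"};
  const char* values[] = {"1", "1", "C:\\ProgramData\\example_cache", "1"};
  Ort::ThrowOnError(
      api.UpdateTensorRTProviderOptions(trt_options, keys, values, 4));

  session_options.AppendExecutionProvider_TensorRT_V2(*trt_options);
  api.ReleaseTensorRTProviderOptions(trt_options);

  // First run builds and dumps the EP-context model; on the second run,
  // loading the dumped model with the stripped engine triggers the
  // setInputShape error described above.
  Ort::Session session(env, L"model.onnx", session_options);
  return 0;
}
```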
Urgency
No response
Platform
Windows
OS Version
11
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.19.2
ONNX Runtime API
C++
Architecture
X64
Execution Provider
TensorRT
Execution Provider Library Version
TensorRT 10.4.0.26 on CUDA 11.6