Description
Describe the issue
I am trying to use the new embedded engine model / EPContext model functionality of the TensorRT (TRT) execution provider.
The creation step succeeds and a _ctx.onnx file is created in the designated directory. The file is only 762 bytes long, which seems very small.
On the second run, to get the purported speed improvement, I load this file into a memory buffer and pass its address and length to the Ort::Session constructor instead of the original ONNX blob. The constructor then throws, and unfortunately I only get an ORT_FAIL error code with no message.
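For reference, the second-run loading step looks roughly like the sketch below. It is a minimal illustration, not my exact code: `ReadFileToBuffer` is a hypothetical helper name, and the `Ort::Session` call is shown only as a comment, with `env` and `session_options` assumed to be set up elsewhere.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reads a whole file (for example the generated
// _ctx.onnx) into a byte buffer. Throws std::runtime_error on failure.
std::vector<char> ReadFileToBuffer(const std::string& path) {
    // Open in binary mode, positioned at the end so tellg() yields the size.
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file) {
        throw std::runtime_error("Cannot open file: " + path);
    }
    const std::streamsize size = file.tellg();
    if (size < 0) {
        throw std::runtime_error("Cannot determine size of file: " + path);
    }
    file.seekg(0, std::ios::beg);

    std::vector<char> buffer(static_cast<std::size_t>(size));
    if (size > 0 && !file.read(buffer.data(), size)) {
        throw std::runtime_error("Cannot read file: " + path);
    }
    return buffer;
}

// The buffer is then handed to the session constructor, along the lines of:
//   Ort::Session session(env, buffer.data(), buffer.size(), session_options);
// where `env` is an Ort::Env and `session_options` an Ort::SessionOptions
// with the TensorRT provider appended (both assumed, not shown here).
```

Checking the returned buffer size against the file size on disk (762 bytes in my case) rules out a truncated read as the cause of the failure.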
I have tried setting the various trt_* option flags (trt_engine_cache_enable, trt_dump_ep_context_model and the paths) to the same values as in the creation run, and also leaving them empty. I have not moved the file since the cache was created. I get the same error every time.
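One thing worth checking with respect to the paths: the creation-run log below contains an error line saying that trt_engine_cache_path should be a relative path when the context model is dumped, while mine is absolute. A minimal sketch of such a check, assuming C++17 `std::filesystem`; `IsRelativeCachePath` is a hypothetical helper and not part of onnxruntime:

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: returns true if `cache_path` is a relative path,
// which is what the TensorRT EP error message in the log asks for when
// trt_dump_ep_context_model is enabled. This only mirrors the wording of
// that message; it does not reproduce any other validation the EP may do.
bool IsRelativeCachePath(const std::string& cache_path) {
    return std::filesystem::path(cache_path).is_relative();
}
```

With this helper, a path such as `cvn_cache/97a8...` would pass, whereas the absolute `C:\ProgramData\...` path I am using would be rejected on Windows.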
I would be surprised if this depended on the model contents, since all our models fail in the same way. It is probably related to passing a blob to the Ort::Session constructor instead of a filename. In any event, it should either work or produce a real error message. I also tried changing the constructor used for the second Ort::Session to the one that takes the full path of the _ctx.onnx file, but got essentially the same problem. The last few lines of that log appear at the end of this report.
Here is the verbose log output for the creation run, followed by the output of the run that fails. I had to remove some hopefully uninteresting lines to fit within the 65535-character limit.
I onnxruntime: Session Options { execution_mode:0 execution_order:DEFAULT enable_profiling:0 optimized_model_filepath: enable_mem_pattern:1 enable_mem_reuse:1 enable_cpu_mem_arena:1 profile_file_prefix:onnxruntime_profile_ session_logid: session_log_severity_level:-1 session_log_verbosity_level:0 max_num_graph_transformation_steps:10 graph_optimization_level:3 intra_op_param:OrtThreadPoolParams { thread_pool_size: 0 auto_set_affinity: 0 allow_spinning: 1 dynamic_block_base_: 0 stack_size: 0 affinity_str: set_denormal_as_zero: 0 } inter_op_param:OrtThreadPoolParams { thread_pool_size: 0 auto_set_affinity: 0 allow_spinning: 1 dynamic_block_base_: 0 stack_size: 0 affinity_str: set_denormal_as_zero: 0 } use_per_session_threads:1 thread_pool_allow_spinning:1 use_deterministic_compute:0 config_options: { session.use_env_allocators: 1 session.set_denormal_as_zero: 1 } } [inference_session.cc:583 onnxruntime::InferenceSession::TraceSessionOptions]
I onnxruntime: Flush-to-zero and denormal-as-zero are on [inference_session.cc:483 onnxruntime::InferenceSession::ConstructorCommon::<lambda_f30c1020f3059bccda8ebb7f672ffe4c>::operator ()]
I onnxruntime: Creating and using per session threadpools since use_per_session_threads_ is true [inference_session.cc:491 onnxruntime::InferenceSession::ConstructorCommon]
I onnxruntime: Dynamic block base set to 0 [inference_session.cc:509 onnxruntime::InferenceSession::ConstructorCommon]
E onnxruntime: In the case of dumping context model and for security purpose, the trt_engine_cache_path should be set with a relative path, but it is an absolute path: C:\ProgramData\ContextVision\cvn_cache\97a89ed4d088f776182993629d3f5cd22d3fc07e [tensorrt_execution_provider.cc:1606 onnxruntime::TensorrtExecutionProvider::TensorrtExecutionProvider]
V onnxruntime: [TensorRT EP] img:1x1x64x64 [tensorrt_execution_provider_utils.h:650 onnxruntime::MakeInputNameShapePair]
V onnxruntime: [TensorRT EP] img [tensorrt_execution_provider_utils.h:709 onnxruntime::ParseProfileShapes]
V onnxruntime: [TensorRT EP] 1, 1, 64, 64, [tensorrt_execution_provider_utils.h:715 onnxruntime::ParseProfileShapes]
V onnxruntime: [TensorRT EP] mix_factor:1x1 [tensorrt_execution_provider_utils.h:650 onnxruntime::MakeInputNameShapePair]
V onnxruntime: [TensorRT EP] mix_factor [tensorrt_execution_provider_utils.h:709 onnxruntime::ParseProfileShapes]
V onnxruntime: [TensorRT EP] 1, 1, [tensorrt_execution_provider_utils.h:715 onnxruntime::ParseProfileShapes]
V onnxruntime: [TensorRT EP] img:1x1x90x90 [tensorrt_execution_provider_utils.h:650 onnxruntime::MakeInputNameShapePair]
V onnxruntime: [TensorRT EP] img [tensorrt_execution_provider_utils.h:709 onnxruntime::ParseProfileShapes]
V onnxruntime: [TensorRT EP] 1, 1, 90, 90, [tensorrt_execution_provider_utils.h:715 onnxruntime::ParseProfileShapes]
V onnxruntime: [TensorRT EP] mix_factor:1x1 [tensorrt_execution_provider_utils.h:650 onnxruntime::MakeInputNameShapePair]
V onnxruntime: [TensorRT EP] mix_factor [tensorrt_execution_provider_utils.h:709 onnxruntime::ParseProfileShapes]
V onnxruntime: [TensorRT EP] 1, 1, [tensorrt_execution_provider_utils.h:715 onnxruntime::ParseProfileShapes]
V onnxruntime: [TensorRT EP] img:1x1x80x80 [tensorrt_execution_provider_utils.h:650 onnxruntime::MakeInputNameShapePair]
V onnxruntime: [TensorRT EP] img [tensorrt_execution_provider_utils.h:709 onnxruntime::ParseProfileShapes]
V onnxruntime: [TensorRT EP] 1, 1, 80, 80, [tensorrt_execution_provider_utils.h:715 onnxruntime::ParseProfileShapes]
V onnxruntime: [TensorRT EP] mix_factor:1x1 [tensorrt_execution_provider_utils.h:650 onnxruntime::MakeInputNameShapePair]
V onnxruntime: [TensorRT EP] mix_factor [tensorrt_execution_provider_utils.h:709 onnxruntime::ParseProfileShapes]
V onnxruntime: [TensorRT EP] 1, 1, [tensorrt_execution_provider_utils.h:715 onnxruntime::ParseProfileShapes]
V onnxruntime: [TensorRT EP] TensorRT provider options: device_id: 0, trt_max_partition_iterations: 10, trt_min_subgraph_size: 1, trt_max_workspace_size: 40737418240, trt_fp16_enable: 0, trt_int8_enable: 0, trt_int8_calibration_cache_name: , int8_calibration_cache_available: 0, trt_int8_use_native_tensorrt_calibration_table: 0, trt_dla_enable: 0, trt_dla_core: 0, trt_dump_subgraphs: 0, trt_engine_cache_enable: 1, trt_weight_stripped_engine_enable: 0, trt_onnx_model_folder_path: , trt_cache_path: C:\ProgramData\ContextVision\cvn_cache\97a89ed4d088f776182993629d3f5cd22d3fc07e, trt_global_cache_path: , trt_engine_decryption_enable: 0, trt_engine_decryption_lib_path: , trt_force_sequential_engine_build: 0, trt_context_memory_sharing_enable: 0, trt_layer_norm_fp32_fallback: 0, trt_build_heuristics_enable: 0, trt_sparsity_enable: 0, trt_builder_optimization_level: 3, trt_auxiliary_streams: 0, trt_tactic_sources: , trt_profile_min_shapes: img:1x1x64x64,mix_factor:1x1, trt_profile_max_shapes: img:1x1x90x90,mix_factor:1x1, trt_profile_opt_shapes: img:1x1x80x80,mix_factor:1x1, trt_cuda_graph_enable: 0, trt_dump_ep_context_model: 1, trt_ep_context_file_path: C:\ProgramData\ContextVision\cvn_cache\97a89ed4d088f776182993629d3f5cd22d3fc07e, trt_ep_context_embed_mode: 0, trt_cache_prefix: , trt_engine_hw_compatible: 0, trt_onnx_model_bytestream_size_: 0 [tensorrt_execution_provider.cc:1728 onnxruntime::TensorrtExecutionProvider::TensorrtExecutionProvider]
I onnxruntime: Initializing session. [inference_session.cc:1661 onnxruntime::InferenceSession::Initialize]
I onnxruntime: Adding default CPU execution provider. [inference_session.cc:1698 onnxruntime::InferenceSession::Initialize]
I onnxruntime: Creating BFCArena for Cuda with following configs: initial_chunk_size_bytes: 1048576 max_dead_bytes_per_chunk: 134217728 initial_growth_chunk_size_bytes: 2097152 max_power_of_two_extend_bytes: 1073741824 memory limit: 18446744073709551615 arena_extend_strategy: 0 [bfc_arena.cc:29 onnxruntime::BFCArena::BFCArena]
V onnxruntime: Creating 21 bins of max chunk size 256 to 268435456 [bfc_arena.cc:66 onnxruntime::BFCArena::BFCArena]
I onnxruntime: Creating BFCArena for CudaPinned with following configs: initial_chunk_size_bytes: 1048576 max_dead_bytes_per_chunk: 134217728 initial_growth_chunk_size_bytes: 2097152 max_power_of_two_extend_bytes: 1073741824 memory limit: 18446744073709551615 arena_extend_strategy: 0 [bfc_arena.cc:29 onnxruntime::BFCArena::BFCArena]
V onnxruntime: Creating 21 bins of max chunk size 256 to 268435456 [bfc_arena.cc:66 onnxruntime::BFCArena::BFCArena]
I onnxruntime: Creating BFCArena for Cpu with following configs: initial_chunk_size_bytes: 1048576 max_dead_bytes_per_chunk: 134217728 initial_growth_chunk_size_bytes: 2097152 max_power_of_two_extend_bytes: 1073741824 memory limit: 18446744073709551615 arena_extend_strategy: 0 [bfc_arena.cc:29 onnxruntime::BFCArena::BFCArena]
V onnxruntime: Creating 21 bins of max chunk size 256 to 268435456 [bfc_arena.cc:66 onnxruntime::BFCArena::BFCArena]
I onnxruntime: This session will use the allocator registered with the environment. [inference_session.cc:1742 onnxruntime::InferenceSession::Initialize]
I onnxruntime: This model does not have any local functions defined. AOT Inlining is not performed [graph_partitioner.cc:898 onnxruntime::GraphPartitioner::InlineFunctionsAOT]
I onnxruntime: GraphTransformer EnsureUniqueDQForNodeUnit modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer Level1_RuleBasedTransformer modified: 1 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer DoubleQDQPairsRemover modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: Total shared scalar initializer count: 42 [constant_sharing.cc:248 onnxruntime::ConstantSharing::ApplyImpl]
I onnxruntime: GraphTransformer ConstantSharing modified: 1 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer CommonSubexpressionElimination modified: 1 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer ConstantFolding modified: 1 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
... more lines here ...
V onnxruntime: [TensorRT EP] input tensor is 'mix_factor' [tensorrt_execution_provider_utils.h:240 onnxruntime::SerializeProfileV2]
V onnxruntime: [TensorRT EP] profile #0, dim is 0 [tensorrt_execution_provider_utils.h:245 onnxruntime::SerializeProfileV2::<lambda_0b667829d0083ac2a8e251e40879af3b>::operator ()]
V onnxruntime: [TensorRT EP] 0, 1, 1, 1 [tensorrt_execution_provider_utils.h:250 onnxruntime::SerializeProfileV2::<lambda_0b667829d0083ac2a8e251e40879af3b>::operator ()]
V onnxruntime: [TensorRT EP] Serialized C:\ProgramData\ContextVision\cvn_cache\97a89ed4d088f776182993629d3f5cd22d3fc07e\TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_5359030231610903815_0_0_sm86.profile [tensorrt_execution_provider.cc:3187 onnxruntime::TensorrtExecutionProvider::CreateNodeComputeInfoFromGraph]
V onnxruntime: [TensorRT EP] Serialized engine C:\ProgramData\ContextVision\cvn_cache\97a89ed4d088f776182993629d3f5cd22d3fc07e\TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_5359030231610903815_0_0_sm86.engine [tensorrt_execution_provider.cc:3204 onnxruntime::TensorrtExecutionProvider::CreateNodeComputeInfoFromGraph]
V onnxruntime: [TensorRT EP] Dumped C:\ProgramData\ContextVision\cvn_cache\97a89ed4d088f776182993629d3f5cd22d3fc07e_ctx.onnx [onnx_ctx_model_helper.cc:213 onnxruntime::DumpCtxModel]
I onnxruntime: Removing initializer 'onnx::Concat_370'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.15.conv.weight'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.0.conv.weight'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.15.conv.bias'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.0.conv.bias'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.10.conv.bias'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.5.conv.bias'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'onnx::Concat_369'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.43.conv.bias'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.17.conv.weight'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.2.conv.weight'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.10.conv.weight'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.5.conv.weight'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'onnx::Concat_371'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.25.conv.weight'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.17.conv.bias'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.2.conv.bias'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.12.conv.weight'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.7.conv.weight'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.37.conv.weight'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.12.conv.bias'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.7.conv.bias'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.19.adain_mean.0.weight'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.27.adain_std.0.bias'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.19.adain_mean.0.bias'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.19.adain_std.0.weight'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.19.adain_std.0.bias'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.21.conv.weight'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.21.conv.bias'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.23.conv.weight'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.23.conv.bias'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.25.conv.bias'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.39.conv.weight'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.27.adain_mean.0.weight'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.39.conv.bias'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.27.adain_mean.0.bias'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.27.adain_std.0.weight'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.29.conv.weight'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.29.conv.bias'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.31.conv.weight'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.31.conv.bias'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'onnx::Expand_374'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.35.adain_mean.0.bias'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.33.conv.weight'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'onnx::Expand_368'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.33.conv.bias'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.35.adain_mean.0.weight'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.35.adain_std.0.weight'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.35.adain_std.0.bias'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.37.conv.bias'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.41.conv.weight'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'onnx::Resize_378'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.41.conv.bias'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'blocks.43.conv.weight'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'onnx::Gather_284'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'onnx::Expand_380'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'onnx::Pad_82'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'onnx::Gather_178'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'onnx::Sub_186'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'onnx::Add_190'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'onnx::Resize_207'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'onnx::Resize_275'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'onnx::Resize_343'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'onnx::Cast_138'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'onnx::Mul_161'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'onnx::Equal_163'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'onnx::Mul_229'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'onnx::Equal_231'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'onnx::Mul_297'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: Removing initializer 'onnx::Equal_299'. It is no longer used by any node. [graph.cc:4201 onnxruntime::Graph::CleanUnusedInitializersAndNodeArgs]
I onnxruntime: GraphTransformer Level2_RuleBasedTransformer modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer TransposeOptimizer_CPUExecutionProvider modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer QDQS8ToU8Transformer modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer QDQSelectorActionTransformer modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer GemmActivationFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer MatMulIntegerToFloatFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer DynamicQuantizeMatMulFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer ConvActivationFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer GeluFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer LayerNormFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer SimplifiedLayerNormFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer AttentionFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer EmbedLayerNormFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer GatherSliceToSplitFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer GatherToSliceFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer MatmulTransposeFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer BiasGeluFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer SkipLayerNormFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer FastGeluFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer QuickGeluFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer BiasSoftmaxFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer BiasDropoutFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer MatMulScaleFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer MatMulActivationFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer MatMulNBitsFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer QDQFinalCleanupTransformer modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer NchwcTransformer modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer NhwcTransformer modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer ConvAddActivationFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer RemoveDuplicateCastTransformer modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer CastFloat16Transformer modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer MemcpyTransformer modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
V onnxruntime: Node placements [session_state.cc:1146 onnxruntime::VerifyEachNodeIsAssignedToAnEp]
V onnxruntime: All nodes placed on [TensorrtExecutionProvider]. Number of nodes: 1 [session_state.cc:1149 onnxruntime::VerifyEachNodeIsAssignedToAnEp]
V onnxruntime: SaveMLValueNameIndexMapping [session_state.cc:126 onnxruntime::SessionState::CreateGraphInfo]
V onnxruntime: Done saving OrtValue mappings. [session_state.cc:172 onnxruntime::SessionState::CreateGraphInfo]
I onnxruntime: Use DeviceBasedPartition as default [allocation_planner.cc:2567 onnxruntime::IGraphPartitioner::CreateGraphPartitioner]
I onnxruntime: Saving initialized tensors. [session_state_utils.cc:209 onnxruntime::session_state_utils::SaveInitializedTensors]
I onnxruntime: Done saving initialized tensors [session_state_utils.cc:360 onnxruntime::session_state_utils::SaveInitializedTensors]
I onnxruntime: Session successfully initialized. [inference_session.cc:2094 onnxruntime::InferenceSession::Initialize]
I onnxruntime: Extending BFCArena for Cuda. bin_num:6 (requested) num_bytes: 25600 (actual) rounded_bytes:25600 [bfc_arena.cc:347 onnxruntime::BFCArena::AllocateRawInternal]
I onnxruntime: Extended allocation by 1048576 bytes. [bfc_arena.cc:206 onnxruntime::BFCArena::Extend]
I onnxruntime: Total allocated bytes: 1048576 [bfc_arena.cc:209 onnxruntime::BFCArena::Extend]
I onnxruntime: Allocated memory at 0000000B24A53400 to 0000000B24B53400 [bfc_arena.cc:212 onnxruntime::BFCArena::Extend]
Next call, when it fails:
I onnxruntime: Session Options { execution_mode:0 execution_order:DEFAULT enable_profiling:0 optimized_model_filepath: enable_mem_pattern:1 enable_mem_reuse:1 enable_cpu_mem_arena:1 profile_file_prefix:onnxruntime_profile_ session_logid: session_log_severity_level:-1 session_log_verbosity_level:0 max_num_graph_transformation_steps:10 graph_optimization_level:3 intra_op_param:OrtThreadPoolParams { thread_pool_size: 0 auto_set_affinity: 0 allow_spinning: 1 dynamic_block_base_: 0 stack_size: 0 affinity_str: set_denormal_as_zero: 0 } inter_op_param:OrtThreadPoolParams { thread_pool_size: 0 auto_set_affinity: 0 allow_spinning: 1 dynamic_block_base_: 0 stack_size: 0 affinity_str: set_denormal_as_zero: 0 } use_per_session_threads:1 thread_pool_allow_spinning:1 use_deterministic_compute:0 config_options: { session.use_env_allocators: 1 session.set_denormal_as_zero: 1 } } [inference_session.cc:583 onnxruntime::InferenceSession::TraceSessionOptions]
I onnxruntime: Flush-to-zero and denormal-as-zero are on [inference_session.cc:483 onnxruntime::InferenceSession::ConstructorCommon::<lambda_f30c1020f3059bccda8ebb7f672ffe4c>::operator ()]
I onnxruntime: Creating and using per session threadpools since use_per_session_threads_ is true [inference_session.cc:491 onnxruntime::InferenceSession::ConstructorCommon]
I onnxruntime: Dynamic block base set to 0 [inference_session.cc:509 onnxruntime::InferenceSession::ConstructorCommon]
V onnxruntime: [TensorRT EP] img:1x1x64x64 [tensorrt_execution_provider_utils.h:650 onnxruntime::MakeInputNameShapePair]
V onnxruntime: [TensorRT EP] img [tensorrt_execution_provider_utils.h:709 onnxruntime::ParseProfileShapes]
V onnxruntime: [TensorRT EP] 1, 1, 64, 64, [tensorrt_execution_provider_utils.h:715 onnxruntime::ParseProfileShapes]
V onnxruntime: [TensorRT EP] mix_factor:1x1 [tensorrt_execution_provider_utils.h:650 onnxruntime::MakeInputNameShapePair]
V onnxruntime: [TensorRT EP] mix_factor [tensorrt_execution_provider_utils.h:709 onnxruntime::ParseProfileShapes]
V onnxruntime: [TensorRT EP] 1, 1, [tensorrt_execution_provider_utils.h:715 onnxruntime::ParseProfileShapes]
V onnxruntime: [TensorRT EP] img:1x1x90x90 [tensorrt_execution_provider_utils.h:650 onnxruntime::MakeInputNameShapePair]
V onnxruntime: [TensorRT EP] img [tensorrt_execution_provider_utils.h:709 onnxruntime::ParseProfileShapes]
V onnxruntime: [TensorRT EP] 1, 1, 90, 90, [tensorrt_execution_provider_utils.h:715 onnxruntime::ParseProfileShapes]
V onnxruntime: [TensorRT EP] mix_factor:1x1 [tensorrt_execution_provider_utils.h:650 onnxruntime::MakeInputNameShapePair]
V onnxruntime: [TensorRT EP] mix_factor [tensorrt_execution_provider_utils.h:709 onnxruntime::ParseProfileShapes]
V onnxruntime: [TensorRT EP] 1, 1, [tensorrt_execution_provider_utils.h:715 onnxruntime::ParseProfileShapes]
V onnxruntime: [TensorRT EP] img:1x1x80x80 [tensorrt_execution_provider_utils.h:650 onnxruntime::MakeInputNameShapePair]
V onnxruntime: [TensorRT EP] img [tensorrt_execution_provider_utils.h:709 onnxruntime::ParseProfileShapes]
V onnxruntime: [TensorRT EP] 1, 1, 80, 80, [tensorrt_execution_provider_utils.h:715 onnxruntime::ParseProfileShapes]
V onnxruntime: [TensorRT EP] mix_factor:1x1 [tensorrt_execution_provider_utils.h:650 onnxruntime::MakeInputNameShapePair]
V onnxruntime: [TensorRT EP] mix_factor [tensorrt_execution_provider_utils.h:709 onnxruntime::ParseProfileShapes]
V onnxruntime: [TensorRT EP] 1, 1, [tensorrt_execution_provider_utils.h:715 onnxruntime::ParseProfileShapes]
V onnxruntime: [TensorRT EP] TensorRT provider options: device_id: 0, trt_max_partition_iterations: 10, trt_min_subgraph_size: 1, trt_max_workspace_size: 40737418240, trt_fp16_enable: 0, trt_int8_enable: 0, trt_int8_calibration_cache_name: , int8_calibration_cache_available: 0, trt_int8_use_native_tensorrt_calibration_table: 0, trt_dla_enable: 0, trt_dla_core: 0, trt_dump_subgraphs: 0, trt_engine_cache_enable: 1, trt_weight_stripped_engine_enable: 0, trt_onnx_model_folder_path: , trt_cache_path: C:\ProgramData\ContextVision\cvn_cache\97a89ed4d088f776182993629d3f5cd22d3fc07e, trt_global_cache_path: , trt_engine_decryption_enable: 0, trt_engine_decryption_lib_path: , trt_force_sequential_engine_build: 0, trt_context_memory_sharing_enable: 0, trt_layer_norm_fp32_fallback: 0, trt_build_heuristics_enable: 0, trt_sparsity_enable: 0, trt_builder_optimization_level: 3, trt_auxiliary_streams: 0, trt_tactic_sources: , trt_profile_min_shapes: img:1x1x64x64,mix_factor:1x1, trt_profile_max_shapes: img:1x1x90x90,mix_factor:1x1, trt_profile_opt_shapes: img:1x1x80x80,mix_factor:1x1, trt_cuda_graph_enable: 0, trt_dump_ep_context_model: 0, trt_ep_context_file_path: C:\ProgramData\ContextVision\cvn_cache\97a89ed4d088f776182993629d3f5cd22d3fc07e, trt_ep_context_embed_mode: 0, trt_cache_prefix: , trt_engine_hw_compatible: 0, trt_onnx_model_bytestream_size_: 0 [tensorrt_execution_provider.cc:1728 onnxruntime::TensorrtExecutionProvider::TensorrtExecutionProvider]
I onnxruntime: Initializing session. [inference_session.cc:1661 onnxruntime::InferenceSession::Initialize]
I onnxruntime: Adding default CPU execution provider. [inference_session.cc:1698 onnxruntime::InferenceSession::Initialize]
I onnxruntime: Creating BFCArena for Cuda with following configs: initial_chunk_size_bytes: 1048576 max_dead_bytes_per_chunk: 134217728 initial_growth_chunk_size_bytes: 2097152 max_power_of_two_extend_bytes: 1073741824 memory limit: 18446744073709551615 arena_extend_strategy: 0 [bfc_arena.cc:29 onnxruntime::BFCArena::BFCArena]
V onnxruntime: Creating 21 bins of max chunk size 256 to 268435456 [bfc_arena.cc:66 onnxruntime::BFCArena::BFCArena]
I onnxruntime: Creating BFCArena for CudaPinned with following configs: initial_chunk_size_bytes: 1048576 max_dead_bytes_per_chunk: 134217728 initial_growth_chunk_size_bytes: 2097152 max_power_of_two_extend_bytes: 1073741824 memory limit: 18446744073709551615 arena_extend_strategy: 0 [bfc_arena.cc:29 onnxruntime::BFCArena::BFCArena]
V onnxruntime: Creating 21 bins of max chunk size 256 to 268435456 [bfc_arena.cc:66 onnxruntime::BFCArena::BFCArena]
I onnxruntime: Creating BFCArena for Cpu with following configs: initial_chunk_size_bytes: 1048576 max_dead_bytes_per_chunk: 134217728 initial_growth_chunk_size_bytes: 2097152 max_power_of_two_extend_bytes: 1073741824 memory limit: 18446744073709551615 arena_extend_strategy: 0 [bfc_arena.cc:29 onnxruntime::BFCArena::BFCArena]
V onnxruntime: Creating 21 bins of max chunk size 256 to 268435456 [bfc_arena.cc:66 onnxruntime::BFCArena::BFCArena]
I onnxruntime: This session will use the allocator registered with the environment. [inference_session.cc:1742 onnxruntime::InferenceSession::Initialize]
I onnxruntime: This model does not have any local functions defined. AOT Inlining is not performed [graph_partitioner.cc:898 onnxruntime::GraphPartitioner::InlineFunctionsAOT]
I onnxruntime: GraphTransformer EnsureUniqueDQForNodeUnit modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer Level1_RuleBasedTransformer modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer DoubleQDQPairsRemover modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer ConstantSharing modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer CommonSubexpressionElimination modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer ConstantFolding modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer MatMulAddFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer ReshapeFusion modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer FreeDimensionOverrideTransformer modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer QDQPropagationTransformer modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer EnsureUniqueDQForNodeUnit modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer RocmBlasAltImpl modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: GraphTransformer TransposeOptimizer modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: [TensorRT EP] Model path is empty [tensorrt_execution_provider_utils.h:553 onnxruntime::TRTGenerateId]
I onnxruntime: [TensorRT EP] TensorRT subgraph MetaDef name TRTKernel_graph_TRTKernel_graph_torch-jit-export_5359030231610903815_0_16247538195125596789_0 [tensorrt_execution_provider.cc:2058 onnxruntime::TensorrtExecutionProvider::GetSubGraph]
Last part of output when using the ort::Session ctor with filename, specifying the _ctx.onnx file's full path:
I onnxruntime: GraphTransformer TransposeOptimizer modified: 0 with status: OK [graph_transformer.cc:15 onnxruntime::GraphTransformer::Apply]
I onnxruntime: [TensorRT EP] Model name is _ctx.onnx [tensorrt_execution_provider_utils.h:543 onnxruntime::TRTGenerateId]
I onnxruntime: [TensorRT EP] TensorRT subgraph MetaDef name TRTKernel_graph_TRTKernel_graph_torch-jit-export_5359030231610903815_0_998102994216759885_0 [tensorrt_execution_provider.cc:2058 onnxruntime::TensorrtExecutionProvider::GetSubGraph]
To reproduce
Create the cache by constructing an ort::Session with:
trt_engine_cache_enable = true
trt_dump_ep_context_model = true
trt_ep_context_file_path = my_full_path_to_empty_existing_directory
trt_engine_cache_path = my_full_path_to_empty_existing_directory
and pass the original ONNX model data as an in-memory blob (buffer pointer plus byte count).
This should generate three files in the directory, one of which is _ctx.onnx and the others something like:
TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_5359030231610903815_0_0_sm86.engine
TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_5359030231610903815_0_0_sm86.profile
I have no idea what that hash is.
On the next run, fill the blob with the contents of the _ctx.onnx file from the same directory instead of the original model, and construct the ort::Session from that.
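For reference, the two runs can be sketched roughly as below. This is a minimal, standard-library-only sketch, not our production code: the helper names ReadFileToBuffer and MakeTrtCacheOptions are hypothetical and exist only for this illustration. The ONNX Runtime calls themselves appear only in comments, on the assumption that the key/value pairs go to the TensorRT provider options and the buffer goes to the ort::Session constructor that takes model data and a length.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: reads a whole file (e.g. the generated _ctx.onnx)
// into a byte buffer that can be handed to the session constructor.
std::vector<char> ReadFileToBuffer(const std::string& path) {
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    const std::streamoff size = in.tellg();
    if (size < 0) {
        throw std::runtime_error("cannot determine size of " + path);
    }
    in.seekg(0, std::ios::beg);
    std::vector<char> buffer(static_cast<std::size_t>(size));
    if (size > 0 && !in.read(buffer.data(), static_cast<std::streamsize>(size))) {
        throw std::runtime_error("short read on " + path);
    }
    return buffer;
}

// Hypothetical helper: the provider options from the steps above as
// key/value strings, with both paths pointing at the same cache directory.
std::vector<std::pair<std::string, std::string>> MakeTrtCacheOptions(
    const std::string& cache_dir) {
    return {
        {"trt_engine_cache_enable", "1"},
        {"trt_dump_ep_context_model", "1"},
        {"trt_ep_context_file_path", cache_dir},
        {"trt_engine_cache_path", cache_dir},
    };
}

// Assumed use with ONNX Runtime (not compiled here):
//   first run:  session from the original model bytes, options as above
//   second run: auto blob = ReadFileToBuffer(cache_dir + "\\_ctx.onnx");
//               Ort::Session session(env, blob.data(), blob.size(), session_options);
```

The second run is the one that throws with ORT_FAIL, whether the _ctx.onnx contents are passed as a blob or the file's full path is passed by name.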
Urgency
I hope we can find a workaround with the current version. Surely the feature can't be entirely non-functional.
Platform
Windows
OS Version
Windows 11
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.9.2
ONNX Runtime API
C++
Architecture
X64
Execution Provider
TensorRT
Execution Provider Library Version
TensorRT 10.4.0.26 on CUDA 11.6