[Documentation] The documentation for TensorRT provider's optimization functionality is not understandable or complete.

### Describe the documentation issue

The parts about the cache paths (what they do and how they should be formed) are not understandable at all if you have provided the model as a blob to the Ort::Session constructor. The description of the embed engine mode is especially frustrating:

Under "Use embed engine" it is unclear whether the settings above that line also need to be applied. Apparently you have to set trt_engine_cache_path in this case too, it must be relative, and it is resolved relative to trt_ep_context_file_path. I can't see any reason to store the engines in a different directory than the _ctx.onnx file, so why allow it, and why add another file path setting?
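To make the question concrete, here is a minimal sketch of the combination of settings I believe is required, written as a Python provider-options dict. The helper name `embed_engine_options` and the example paths are hypothetical; the option names are the ones from the TensorRT EP documentation, and the relative-path check only encodes my reading of the behaviour described above, not a confirmed rule.

```python
import os


def embed_engine_options(context_path, engine_cache_dir="."):
    """Build TensorRT EP options for embed engine mode (hypothetical helper).

    Assumption: trt_engine_cache_path must be relative, and is resolved
    relative to the location given by trt_ep_context_file_path.
    """
    if os.path.isabs(engine_cache_dir):
        raise ValueError(
            "trt_engine_cache_path is assumed to be relative to "
            "trt_ep_context_file_path"
        )
    return {
        "trt_engine_cache_enable": True,
        "trt_engine_cache_path": engine_cache_dir,
        # Ask the EP to write the context model (the _ctx.onnx file).
        "trt_dump_ep_context_model": True,
        "trt_ep_context_file_path": context_path,
        # 1 = embed the engine binary in the context model (as I understand it).
        "trt_ep_context_embed_mode": 1,
    }
```

The resulting dict would be passed as `providers=[("TensorrtExecutionProvider", options)]` when creating an `onnxruntime.InferenceSession`. As far as I can tell, the C++ API takes the same option names as string key/value pairs.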

The term "context file path" seems like a misnomer, as does "_ctx.onnx". If you want to call this feature "Embedded engine", please use that term in the related settings. It is too late for that now, so I guess you're stuck with "context file", which means the term needs a separate explanation.

The trt_weight_stripped_engine_enable setting and its related path and blob settings are not documented at all, but they seem very interesting for reducing cache sizes, and maybe even for reusing the same optimization across models where only the coefficients differ. A dedicated section in the documentation should cover this type of reuse, if it is possible at all. I suspect the hash used for the filename when none is given is intended to support this use case.

The trt_engine_cache_prefix description is very lacking; for instance, there is no mention of what its default is. Again, maybe it is just not understandable because we use an ONNX blob.

The file tree, with its strange arrow pointing to a small PNG of what may be just a random DNN graph, is very confusing. If this is the result of loading a file called "model.onnx" into a Session, please state this explicitly. The reader should not have to guess this much; it makes the documentation impossible to understand.

This is not only a documentation issue: there is also strange behaviour in the code itself, and the setup has strange, unnecessary requirements.

Embedded engine caching in general barely works if you use a blob for the ONNX data, because the name of the engine/profile files is a hash (of what is unknown), while the context file is always named _ctx.onnx regardless of the model. We had to work around this by adding an extra directory level named by a hash of the model name, the graphics adapter name, the onnxruntime version and the trt_profile_min_shapes string.
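For reference, the workaround looks roughly like the sketch below. It is an illustration rather than our exact code: the function name `cache_dir_for`, the choice of SHA-256, the separator and the 16-character truncation are all assumptions made for the example. Only the four inputs to the hash come from the description above.

```python
import hashlib
from pathlib import Path


def cache_dir_for(root, model_name, adapter_name, ort_version, min_shapes):
    """Return a per-model cache directory under root (hypothetical helper).

    Every input that should invalidate the cache goes into the hash, so each
    model/adapter/version/shape combination gets its own _ctx.onnx file.
    """
    # Join with a separator that cannot occur in the inputs, so that
    # ("ab", "c") and ("a", "bc") do not produce the same key.
    key = "\x1f".join([model_name, adapter_name, ort_version, min_shapes])
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    return Path(root) / digest
```

The returned directory is then used as the location for the context file, so two different model blobs no longer overwrite each other's _ctx.onnx.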

The trt_engine_cache_prefix seems to be prepended to the hash used for the engine/profile files but not to the _ctx.onnx file, so it doesn't help when embedded engine caching is combined with ONNX data supplied as a blob.



### Page / URL

_No response_

Issue #22154, opened by BengtGustafsson on Sep 19, 2024