Dynamic output allocation is a feature in Torch-TensorRT which allows the output buffer of TensorRT engines to be
dynamically allocated. This is useful for models with dynamic output shapes, especially ops with data-dependent shapes.
Dynamic output allocation mode cannot be used in conjunction with CUDA Graphs or the pre-allocated outputs feature.
Without dynamic output allocation, the output buffer is allocated based on the output shape inferred from the input size.
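
As an illustration of a data-dependent output shape, the toy module below wraps ``torch.nonzero``, whose output size depends on the values in the input rather than its shape; the module name is made up for this example.

.. code-block:: python

    import torch

    class FindNonZero(torch.nn.Module):
        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # torch.nonzero returns one row per nonzero element, so the
            # number of output rows depends on the input values and cannot
            # be inferred from the input shape alone.
            return torch.nonzero(x)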
There are two scenarios in which dynamic output allocation is enabled:
1. The model has been identified at compile time to require dynamic output allocation for at least one TensorRT subgraph.
   These models will engage the runtime mode automatically (with logging) and are incompatible with other runtime modes
   such as CUDA Graphs.
Converters can declare that subgraphs that they produce will require the output allocator using ``requires_output_allocator=True``,
thereby forcing any model which utilizes the converter to automatically use the output allocator runtime mode, e.g.:
.. code-block:: python

    @dynamo_tensorrt_converter(
        torch.ops.aten.nonzero.default,  # example target: an op with a data-dependent output shape
        supports_dynamic_shapes=True,
        requires_output_allocator=True,  # this subgraph needs a dynamic output allocator at runtime
    )
    def aten_ops_nonzero(
        ctx: ConversionContext,
        target: Target,
        args: Tuple[Argument, ...],
        kwargs: Dict[str, Argument],
        name: str,
    ) -> Union[TRTTensor, Sequence[TRTTensor]]:
        ...
2. Users may manually enable dynamic output allocation mode via the ``torch_tensorrt.runtime.enable_output_allocator`` context manager.
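
A minimal usage sketch is given below; ``trt_module`` (a Torch-TensorRT compiled module) and ``inputs`` are placeholder names for this example, and it is assumed the context manager takes the compiled module as its argument.

.. code-block:: python

    # Enable dynamic output allocation mode for calls made inside the block;
    # the previous runtime mode is restored on exit.
    with torch_tensorrt.runtime.enable_output_allocator(trt_module):
        outputs = trt_module(*inputs)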