Description
Context
In TensorRT, there are certain rules and restrictions regarding tensor I/O which do not fully align with those in Torch. For instance, a tensor cannot be returned as an output of a TRT engine more than once, whereas `return (x, y, y)` is a valid expression in Torch. Similarly, an input to a TensorRT engine cannot be returned directly as an output.
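To make the restriction concrete, the following is a minimal, hypothetical Torch module (the module and tensor names are illustrative, not from the original issue) whose forward is valid in Torch but maps onto tensor I/O patterns that a TRT engine cannot express directly:

```python
import torch

class Problematic(torch.nn.Module):
    def forward(self, x, y):
        out = x + 1
        # Returning the same tensor twice is valid in Torch, but a TRT engine
        # cannot mark one tensor as an output more than once.
        # Returning the input `y` untouched is also valid in Torch, but a TRT
        # engine cannot bind the same tensor as both an input and an output.
        return out, out, y
```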
Current Solution
The current solution is to copy all tensors whenever a no-op operation is encountered (`aten._to_copy`, `aten.copy`, `aten.clone`, and others). This solution has a few drawbacks, including a performance penalty when the unnecessary copy layers are not optimized away, and an overly large graph.
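The sketch below illustrates the shape of the current approach; it is a simplified assumption of how such a converter could look, not the actual Torch-TensorRT converter code. Every no-op such as `aten.clone` is still materialized as a layer in the TensorRT network (here via an identity layer), so the produced tensor is distinct from its input:

```python
import tensorrt as trt

# Hypothetical converter sketch: materialize a no-op (e.g. aten.clone) as an
# explicit TensorRT identity layer so the output tensor is a separate tensor.
def convert_clone_as_copy(network: trt.INetworkDefinition,
                          input_tensor: trt.ITensor,
                          name: str) -> trt.ITensor:
    layer = network.add_identity(input_tensor)  # explicit copy in the network
    layer.name = name
    return layer.get_output(0)
```

When TensorRT's optimizer cannot eliminate these layers, each one adds overhead at runtime, which is the performance drawback described above.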
Proposed Solution
For all operators which are no-ops in TensorRT, omit them entirely from the graph. Then, add a lowering pass which analyzes the graph, looking for branches that consist of a sequence of no-ops from an input to an output, or a sequence of no-ops linking two outputs; these are the cases for which TRT throws an error. For each such case, insert a single `_to_copy` node with `force_layer=True` to avoid the TRT error. A sketch of such a pass is shown below.
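The following is a hedged sketch of what that lowering pass could look like as an FX graph transformation; the function name and structure are illustrative assumptions, not the actual Torch-TensorRT implementation. Since the no-ops themselves have been omitted, a no-op chain from an input to an output reduces to an output that directly aliases a placeholder, and a no-op chain between two outputs reduces to a repeated output, so the pass only needs to detect those two cases and insert an explicit `aten._to_copy` node (which the real pass would convert with `force_layer=True` so TRT keeps the copy):

```python
import torch
from torch.fx import GraphModule, Node

def insert_forced_copies(gm: GraphModule) -> GraphModule:
    """Insert aten._to_copy before outputs that alias inputs or other outputs."""
    output_node = next(n for n in gm.graph.nodes if n.op == "output")
    returned = output_node.args[0]
    returned = list(returned) if isinstance(returned, (tuple, list)) else [returned]

    seen = set()
    new_returns = []
    for arg in returned:
        # Copy is required if the output is a graph input (placeholder) or a
        # tensor that has already been returned once.
        needs_copy = isinstance(arg, Node) and (arg.op == "placeholder" or arg in seen)
        if needs_copy:
            with gm.graph.inserting_before(output_node):
                copy_node = gm.graph.call_function(
                    torch.ops.aten._to_copy.default, args=(arg,)
                )
            new_returns.append(copy_node)
        else:
            new_returns.append(arg)
        if isinstance(arg, Node):
            seen.add(arg)

    output_node.args = (tuple(new_returns),)
    gm.graph.lint()
    gm.recompile()
    return gm
```

Compared to the current solution, copies are inserted only where TRT would otherwise reject the engine, rather than at every no-op site.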