
✨[Feature] Dynamo No-Op Repair System #2561

Open
@gs-olive

Description


Context

TensorRT imposes rules and restrictions on tensor I/O that are not entirely in line with those of Torch. For instance, a TRT engine cannot return the same output tensor more than once, whereas return (x, y, y) is a valid expression in Torch. Similarly, a TensorRT engine input cannot be returned directly as an output.
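The two restrictions above can be checked with a simple pass over a graph's I/O. This is a hypothetical sketch (the helper name and the mini-IR of plain string names are not from Torch-TensorRT), flagging outputs that TRT would reject:

```python
def trt_illegal_outputs(inputs, outputs):
    """Return indices of outputs TensorRT would reject under the two rules
    described in this issue: an output repeated more than once, or an input
    passed straight through as an output. Hypothetical helper; tensors are
    modeled as plain string names for illustration."""
    illegal = []
    seen = set()
    for i, out in enumerate(outputs):
        if out in seen or out in inputs:
            illegal.append(i)
        seen.add(out)
    return illegal

# Torch allows `return (x, y, y)`, but for TRT both the passed-through
# input `x` and the repeated `y` are problematic:
print(trt_illegal_outputs(inputs=["x"], outputs=["x", "y", "y"]))  # -> [0, 2]
```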

Current Solution

The current solution is to insert a copy whenever a no-op operation is encountered (aten._to_copy, aten.copy, aten.clone, and others). This suffers from a few drawbacks: performance decreases if the unnecessary copy layers are not optimized away, and the resulting graph is overly large.
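As a rough sketch of why this bloats the graph, consider lowering every no-op into a real copy layer. The mini-IR of (name, op) pairs and the "trt.copy" op name are illustrative assumptions, not the actual converter API:

```python
# Subset of the no-op aten operators named in this issue.
NO_OPS = {"aten._to_copy", "aten.copy", "aten.clone"}

def lower_with_copies(nodes):
    """Current strategy (sketch): every no-op encountered becomes an explicit
    copy layer in the engine. `nodes` is a list of (name, op) pairs in a
    hypothetical mini-IR."""
    return [(name, "trt.copy" if op in NO_OPS else op) for name, op in nodes]

graph = [("a", "aten.clone"), ("b", "aten.relu"), ("c", "aten.clone")]
# Both clones survive as copy layers, even though at most one copy would be
# needed to satisfy TRT's I/O rules -> larger engine, possible slowdown.
print(lower_with_copies(graph))
```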

Proposed Solution

For all operators that are no-ops in TensorRT, omit them from the graph entirely. Then add a lowering pass that analyzes the graph, looking for any branch consisting of a sequence of no-ops from an input to an output, or a sequence of no-ops linking two outputs; these are the cases for which TRT will throw an error. Insert a single _to_copy node with force_layer=True in each such case to avoid the TRT error.
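The proposed pass could be sketched as follows, again in a hypothetical mini-IR (nodes map a name to an (op, source) pair); the actual implementation would operate on a torch.fx graph. No-op chains are elided by resolving each output back to its true producer, and a single forced copy is inserted only where an output would otherwise alias an input or another output:

```python
# Subset of the no-op aten operators named in this issue.
NO_OPS = {"aten._to_copy", "aten.copy", "aten.clone"}

def repair_outputs(nodes, inputs, outputs):
    """Proposed pass (sketch): elide no-ops, then insert one
    _to_copy(force_layer=True) per illegal input->output or
    output->output link. `nodes` maps node name -> (op, source name)."""
    def resolve(name):
        # Follow a chain of no-ops back to its true producer,
        # omitting the no-ops from the graph entirely.
        while name in nodes and nodes[name][0] in NO_OPS:
            name = nodes[name][1]
        return name

    repaired, seen = [], set()
    for out in outputs:
        real = resolve(out)
        if real in inputs or real in seen:
            # One forced copy layer breaks the illegal aliasing;
            # no other copies are emitted anywhere in the graph.
            repaired.append(("_to_copy(force_layer=True)", real))
        else:
            repaired.append(real)
        seen.add(real)
    return repaired

# Two no-op clones: one wraps an input, one duplicates an existing output.
nodes = {"c1": ("aten.clone", "x"), "c2": ("aten.clone", "y")}
print(repair_outputs(nodes, inputs={"x"}, outputs=["c1", "y", "c2"]))
```

Compared with the current solution, only the outputs that actually violate TRT's rules receive a copy, so well-behaved no-op chains vanish from the graph entirely.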
