feat: second attempt to support DDS and NonZero op #3388
Conversation
Force-pushed from 98aebfd to 8ef1a87
Force-pushed from d55c451 to ad04cf9
Force-pushed from 9a9852f to d718464
if (
    node != output_node
    and len(node.users) == 0
    and len(node.all_input_nodes) > 0
Probably better to add an assert checking that it has only one input (and print the actual number in the message if it fails).
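A rough sketch of what that assert could look like, assuming `node` is the loop variable from the pass quoted below (the message wording is just an illustration):

```python
# Hypothetical check suggested above; `node` comes from the loop in the quoted pass.
assert len(node.all_input_nodes) == 1, (
    f"Expected node {node} to have exactly one input, "
    f"but it has {len(node.all_input_nodes)}"
)
```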
I previously reused the code from another lowering pass. It looks like we can directly remove unused ops, right?
TensorRT/py/torch_tensorrt/dynamo/lowering/passes/remove_num_users_is_0_nodes.py
Lines 20 to 28 in eed420a
if (
    node != output_node
    and len(node.users) == 0
    and len(node.all_input_nodes) > 0
):
    gm.graph.erase_node(node)

gm = clean_up_graph_after_modifications(gm)
logger.debug(f"Removed ops that [num_users=0] nodes:\n{gm.graph}")
Do you think there are any potential issues with this?
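For reference, torch.fx already ships a generic dead-code pass; a minimal sketch of using it, assuming it is acceptable to drop only nodes FX considers side-effect free (which may or may not match what this lowering pass needs):

```python
import torch

def remove_unused_nodes(gm: torch.fx.GraphModule) -> torch.fx.GraphModule:
    # Built-in FX dead-code elimination: removes nodes with no users,
    # skipping nodes FX considers to have side effects.
    gm.graph.eliminate_dead_code()
    gm.recompile()
    return gm
```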
shape_changed = self.validate_input_shapes(inputs)
(
    need_cudagraphs_record,
    can_use_pre_allocated_outputs,
    need_cudagraphs_reset,
) = self.runtime_states.set_runtime_states(
-   cudagraphs_enabled, self.use_pre_allocated_outputs, shape_changed
+   self.cudagraphs_enabled, self.use_pre_allocated_outputs, shape_changed
Is use_pre_allocated_outputs still valid now that you're adding the OA (OutputAllocator) feature?
I think the OA feature will not affect use_pre_allocated_outputs, because I didn't change the behavior of CUDA Graphs, and use_pre_allocated_outputs has its own context manager as well.
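For context, a minimal sketch of how the pre-allocated-outputs context manager is used on its own; the helper name torch_tensorrt.runtime.enable_pre_allocated_outputs and the toy module are assumptions for illustration:

```python
import torch
import torch_tensorrt

class MatMul(torch.nn.Module):
    def forward(self, x, y):
        return torch.matmul(x, y)

inputs = [torch.randn(8, 16).cuda(), torch.randn(16, 8).cuda()]
trt_module = torch_tensorrt.compile(MatMul().eval().cuda(), ir="dynamo", inputs=inputs)

# Assumed helper: enable_pre_allocated_outputs reuses output buffers across calls.
# It is driven by its own context manager and is independent of the
# OutputAllocator (OA) path added in this PR.
with torch_tensorrt.runtime.enable_pre_allocated_outputs(trt_module):
    out = trt_module(*inputs)
```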
    raise RuntimeError(
        "Both CUDA Graphs and OutputAllocator are enabled. Please disable either one."
    )
if self.use_output_allocator_outputs:
How is use_output_allocator_outputs set? Is it set by the user via the with context manager?
Yes, it is set by the user via the with context manager. If users don't set it, the runtime chooses standard execution or OA according to the converter decorator.
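A sketch of the explicit opt-in described above; the helper name enable_output_allocator and the toy module are assumptions and may differ from the names in this PR:

```python
import torch
import torch_tensorrt

class NonZero(torch.nn.Module):
    def forward(self, x):
        return torch.nonzero(x)

inputs = [torch.randint(0, 2, (8, 8), dtype=torch.int32).cuda()]
trt_module = torch_tensorrt.compile(NonZero().eval().cuda(), ir="dynamo", inputs=inputs)

# Explicitly opt in to OutputAllocator mode via the user-facing context manager.
# The helper name `enable_output_allocator` is an assumption; without this block,
# the runtime picks standard execution or OA based on the converter decorator.
with torch_tensorrt.runtime.enable_output_allocator(trt_module):
    out = trt_module(*inputs)
```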
Force-pushed from 28b27c5 to 7e1a1ca
LGTM
LGTM after minor change
Description
This PR adds a new path to support Data Dependent Shape (DDS) and the NonZero op.
Static and dynamic shapes take the original path; DDS takes the new path with IOutputAllocator.
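A minimal usage sketch under these assumptions (the model and shapes are illustrative; the point is that nothing special should be required at call time for the DDS path):

```python
import torch
import torch_tensorrt

class Model(torch.nn.Module):
    def forward(self, x):
        # abs() has a static output shape; nonzero() is data dependent (DDS).
        return torch.nonzero(torch.abs(x))

inputs = [torch.randn(4, 4).cuda()]
trt_module = torch_tensorrt.compile(Model().eval().cuda(), ir="dynamo", inputs=inputs)

# Static-shape ops run the original execution path; the NonZero output is
# retrieved through IOutputAllocator, so its shape can depend on the input data.
print(trt_module(*inputs).shape)
```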
Fixes #2516
Type of change
Checklist: