Tentatively eliminate graph break overhead #3741
base: main
Conversation
self.cudagraphs_enabled = torch_tensorrt.runtime.get_cudagraphs_mode()
self.requires_unique_output = False
What do these do?
Moved these states out of every runtime call and into __init__ to avoid repeated overhead.
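A minimal sketch of that pattern, assuming the state is cached in the Python runtime module's constructor (the class name and forward body here are illustrative, not the PR's exact code):

```python
import torch
import torch_tensorrt

class _RuntimeModuleSketch:  # illustrative stand-in for the Python runtime module
    def __init__(self) -> None:
        # Query the cudagraphs mode once at construction time...
        self.cudagraphs_enabled = torch_tensorrt.runtime.get_cudagraphs_mode()
        self.requires_unique_output = False

    def forward(self, *inputs: torch.Tensor) -> None:
        # ...and reuse the cached flag here, instead of calling
        # torch_tensorrt.runtime.get_cudagraphs_mode() on every invocation.
        if self.cudagraphs_enabled:
            ...
```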
if self.sync_stream:
    self._engine_stream.wait_stream(self._caller_stream)
So if PyTorch is not on the default stream, both PyTorch and TRT can run on the same stream and the outputs still match?
Do you know if there is a performance benefit to running PyTorch and TRT together on one stream, versus PyTorch on the default stream and TRT on a separate stream?
- Correct, the outputs match. Moreover, if we run both on the default stream, the outputs also match; I verified this with the TRT team as well. Not sure whether we can implement that.
- Yes, there is a 15%-20% improvement when there are multiple graph breaks. Running them on different streams requires wait_stream(), which takes a lot of time.
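A hedged sketch of the stream handling being discussed (attribute names follow the diff; the two methods are shown outside their class for brevity, and the rule for setting sync_stream here is an assumption, not necessarily the PR's exact logic):

```python
import torch

def setup_streams(self, device: torch.device) -> None:
    # Reuse the stream PyTorch is already recording work on for this device,
    # so the engine shares it with the surrounding PyTorch ops.
    self._caller_stream = torch.cuda.current_stream(device)
    self._engine_stream = self._caller_stream
    # wait_stream() is only needed when the engine runs on a different stream
    # than the caller; skipping it is where the 15%-20% savings come from.
    self.sync_stream = self._engine_stream != self._caller_stream

def sync_streams(self) -> None:
    if self.sync_stream:
        # Make the engine stream wait for all work queued on the caller's stream.
        self._engine_stream.wait_stream(self._caller_stream)
```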
def set_requires_unique_output(self, requires_unique_output: bool) -> None:
    self.requires_unique_output = requires_unique_output
What does this do? Consider adding a docstring for it.
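For example, a docstring along these lines (my reading of the flag based on the surrounding diff, not the PR author's wording) would make the intent explicit:

```python
def set_requires_unique_output(self, requires_unique_output: bool) -> None:
    """Control whether this module must return freshly allocated output tensors.

    When True, outputs are not aliased to the runtime's reused/preallocated
    buffers, so the caller can safely hold references to them (e.g. for the
    final graph output). When False, outputs may reuse internal buffers that
    are overwritten on the next call.
    """
    self.requires_unique_output = requires_unique_output
```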
Can you include similar changes to the C++ runtime as well?
@@ -174,6 +173,8 @@ def __init__(
    self.cudagraph: Optional[torch.cuda.CUDAGraph] = None
    self._caller_stream: Optional[torch.cuda.Stream] = None
    self._engine_stream: Optional[torch.cuda.Stream] = None
    self.output_tensors: Optional[List[torch.Tensor]] = None
    self.sync_stream = True
Just inherit stream from PyTorch / input tensors
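A minimal sketch of inheriting the stream from the input tensors' device, as suggested (illustrative only; the PR may handle device selection differently):

```python
import torch

def inherit_stream(inputs: list[torch.Tensor]) -> torch.cuda.Stream:
    # Use whatever stream PyTorch is currently recording work on for the
    # inputs' device, instead of allocating a dedicated engine stream and
    # paying for wait_stream() synchronization.
    device = inputs[0].device if inputs else torch.device("cuda")
    return torch.cuda.current_stream(device)
```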
@@ -381,16 +405,17 @@ def setup_input_tensors(

    # For shape tensors, we use CPU pointers and for data tensors, we use GPU pointers
    # as per TensorRT requirements
-   if self.engine.is_shape_inference_io(input_name):
+   if self.is_shape_inference_io[i]:
Probably better to make this a dictionary and key on names, instead of implicitly relying on input order to stay the same over time
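A sketch of the dictionary approach the comment suggests, shown as a helper method outside its class; `self.input_names` is assumed to exist on the runtime module:

```python
from typing import Dict

def _build_shape_io_lookup(self) -> Dict[str, bool]:
    # Built once (e.g. in __init__), keyed on I/O names rather than positions,
    # so a change in input ordering cannot silently break the lookup.
    return {
        name: self.engine.is_shape_inference_io(name) for name in self.input_names
    }
```

setup_input_tensors could then test `self.is_shape_inference_io[input_name]` instead of indexing by position.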
    input_name, tuple(contiguous_inputs[i].shape)
)
if shape_changed:
    self.context.set_input_shape(
Can we safely assume execution context holds shape between inference calls?
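If the answer is yes, a per-input shape cache like the following (hypothetical `self.shape_key` attribute, initialized to an empty dict in __init__) would make the shape_changed guard in the diff explicit:

```python
import torch

def _update_input_shape(self, input_name: str, tensor: torch.Tensor) -> None:
    new_shape = tuple(tensor.shape)
    shape_changed = self.shape_key.get(input_name) != new_shape
    if shape_changed:
        # Only touch the TensorRT execution context when the shape actually
        # changed since the previous call; otherwise rely on it keeping state.
        self.context.set_input_shape(input_name, new_shape)
        self.shape_key[input_name] = new_shape
```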
@@ -994,6 +994,10 @@ def preserve_module_specs(
        ) as f:
            f.write(trt_module.get_layer_info())

    # Only set the requires_unique_output flag for the last TRT Module when user has access to the output tensor
    if trt_module and settings.use_python_runtime:
        trt_module.set_requires_unique_output(True)
How is this going to work with serialization in C++?
Also, make the name clearer, e.g. trt_module.module_is_output_operator or trt_module.requires_unowned_output_tensor.
Yeah, once we think all the changes in PyTorch are valid, I can make changes accordingly.