feat: second attempt to support DDS and NonZero op #3388
Conversation
Force-pushed from 98aebfd to 8ef1a87
Force-pushed from d55c451 to ad04cf9
Force-pushed from 9a9852f to d718464
if (
    node != output_node
    and len(node.users) == 0
    and len(node.all_input_nodes) > 0
Probably better to add an assert checking that it has only one input (and print the actual number in the message if it fails).
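A rough sketch of what that assert could look like, assuming `node` is the loop variable from the pass quoted below (the message wording is just an illustration):

```python
# Hypothetical check suggested above; `node` comes from the loop in the quoted pass.
assert len(node.all_input_nodes) == 1, (
    f"Expected node {node} to have exactly one input, "
    f"but it has {len(node.all_input_nodes)}"
)
```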
I previously reused the code from another lowering pass. It looks like we can directly remove unused ops, right?
TensorRT/py/torch_tensorrt/dynamo/lowering/passes/remove_num_users_is_0_nodes.py
Lines 20 to 28 in eed420a
if (
    node != output_node
    and len(node.users) == 0
    and len(node.all_input_nodes) > 0
):
    gm.graph.erase_node(node)

gm = clean_up_graph_after_modifications(gm)
logger.debug(f"Removed ops that [num_users=0] nodes:\n{gm.graph}")
Do you think there are any potential issues with this?
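For reference, torch.fx already ships a generic dead-code pass; a minimal sketch of using it, assuming it is acceptable to drop only nodes FX considers side-effect free (which may or may not match what this lowering pass needs):

```python
import torch

def remove_unused_nodes(gm: torch.fx.GraphModule) -> torch.fx.GraphModule:
    # Built-in FX dead-code elimination: removes nodes with no users,
    # skipping nodes FX considers to have side effects.
    gm.graph.eliminate_dead_code()
    gm.recompile()
    return gm
```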
shape_changed = self.validate_input_shapes(inputs)
(
    need_cudagraphs_record,
    can_use_pre_allocated_outputs,
    need_cudagraphs_reset,
) = self.runtime_states.set_runtime_states(
-   cudagraphs_enabled, self.use_pre_allocated_outputs, shape_changed
+   self.cudagraphs_enabled, self.use_pre_allocated_outputs, shape_changed
Is use_pre_allocated_outputs still valid now that you're adding the OA (OutputAllocator) feature?
I think the OA feature will not affect use_pre_allocated_outputs, because I didn't change the behavior of CUDA Graphs, and use_pre_allocated_outputs has its own context manager as well.
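For context, a minimal sketch of how the pre-allocated-outputs context manager is used on its own; the helper name torch_tensorrt.runtime.enable_pre_allocated_outputs and the toy module are assumptions for illustration:

```python
import torch
import torch_tensorrt

class MatMul(torch.nn.Module):
    def forward(self, x, y):
        return torch.matmul(x, y)

inputs = [torch.randn(8, 16).cuda(), torch.randn(16, 8).cuda()]
trt_module = torch_tensorrt.compile(MatMul().eval().cuda(), ir="dynamo", inputs=inputs)

# Assumed helper: enable_pre_allocated_outputs reuses output buffers across calls.
# It is driven by its own context manager and is independent of the
# OutputAllocator (OA) path added in this PR.
with torch_tensorrt.runtime.enable_pre_allocated_outputs(trt_module):
    out = trt_module(*inputs)
```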
    raise RuntimeError(
        "Both CUDA Graphs and OutputAllocator are enabled. Please disable either one."
    )
if self.use_output_allocator_outputs:
How is use_output_allocator_outputs set? Is it set by the user via the with context manager?
Yes, it is set by the user via the with context manager. If users don't set it, the runtime chooses standard execution or OA according to the converter decorator.
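A sketch of the explicit opt-in described above; the helper name enable_output_allocator and the toy module are assumptions and may differ from the names in this PR:

```python
import torch
import torch_tensorrt

class NonZero(torch.nn.Module):
    def forward(self, x):
        return torch.nonzero(x)

inputs = [torch.randint(0, 2, (8, 8), dtype=torch.int32).cuda()]
trt_module = torch_tensorrt.compile(NonZero().eval().cuda(), ir="dynamo", inputs=inputs)

# Explicitly opt in to OutputAllocator mode via the user-facing context manager.
# The helper name `enable_output_allocator` is an assumption; without this block,
# the runtime picks standard execution or OA based on the converter decorator.
with torch_tensorrt.runtime.enable_output_allocator(trt_module):
    out = trt_module(*inputs)
```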
Force-pushed from 28b27c5 to 7e1a1ca
LGTM
LGTM after minor change
Description
This PR adds a new path to support Data Dependent Shape (DDS) and the NonZero op.
Static and dynamic shapes take the original path; DDS takes the new path with IOutputAllocator.
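A minimal usage sketch under these assumptions (the model and shapes are illustrative; the point is that nothing special should be required at call time for the DDS path):

```python
import torch
import torch_tensorrt

class Model(torch.nn.Module):
    def forward(self, x):
        # abs() has a static output shape; nonzero() is data dependent (DDS).
        return torch.nonzero(torch.abs(x))

inputs = [torch.randn(4, 4).cuda()]
trt_module = torch_tensorrt.compile(Model().eval().cuda(), ir="dynamo", inputs=inputs)

# Static-shape ops run the original execution path; the NonZero output is
# retrieved through IOutputAllocator, so its shape can depend on the input data.
print(trt_module(*inputs).shape)
```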
Fixes #2516
Type of change
Checklist: