❓ [Question] How to decide if an Op should support dynamic shape or not #3224
Comments
If you are finding Core ATen ops that we convert but that don't support dynamic shape, please file issues; my impression is that we should cover nearly all of them at this point. cc @apbose @chohk88
@narendasan thank you for your explanation; your suggestion totally makes sense to me. BTW, I originally asked this question because of an op I'm seeing: I haven't tried it yet, but the op I plan to convert is the embedding bag op.
Seems like embedding bag forward only is a new op in Core ATen.
To my knowledge, this op is already registered with supports_dynamic_shapes=True:

```python
@dynamo_tensorrt_converter(
    torch.ops.aten._embedding_bag_forward_only.default,
    capability_validator=embedding_bag_validator,
    supports_dynamic_shapes=True,
)
```

(See TensorRT/py/torch_tensorrt/dynamo/conversion/aten_ops_converters.py, lines 290 to 324 in 1820713.)
Thanks for your suggestion @zewenli98. I remember I have tried this, and the code failed on a strange shape assertion; I had to comment out the assertion to let it compile. However, the compilation then failed for some other reasons, so I am switching to the traditional ONNX way for now.
@sean-xiang-applovin Thanks for letting us know. It looks like the assertion you pointed out only checks the shapes, not the types. If you have runnable code at hand, could you try passing in None for that argument? Besides, I'm wondering if you passed in 1-dim input.
Thanks @zewenli98, I will try to put together a minimal repro as soon as possible.
Hi @zewenli98, it took me some time to set everything up and reproduce the error. Since PyTorch 2.5 was released recently, and there is also a version bump of torch-tensorrt, I basically re-set up everything on my end, and this time some new errors/issues came up. I have created 3 notebooks (in the attached zip) to explain what I did, what I found, and what the issue/bug is:

- In embedding_bag_forward_only, I describe that only when compiling a loaded exported program does the decomposition generate _embedding_bag_forward_only.
- In embedding_bag_compile_slow, I describe what I found: compiling a simple embedding bag layer takes a long time to finish, and a lot of network layers are generated, which looks strange to me. This compilation time bothers me a lot, since my model has a lot of embedding layers.
- In embedding_bag_compile_result_mismatch, I describe a real bug or issue: when I compile an embedding bag layer, the compiled results are very different from the original results. In this notebook, I compiled based on a loaded exported program.

Please let me know if you need more information. I really appreciate your help, thank you.
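For readers following the thread, here is a minimal sketch of the "loaded exported program" path being described, assuming a plain nn.EmbeddingBag module; the module, shapes, and file name are illustrative rather than taken from the notebooks:

```python
import torch
import torch_tensorrt


class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fn = torch.nn.EmbeddingBag(100, 16)  # table size is illustrative

    def forward(self, input):
        return self.fn(input)


model = Model().eval().cuda()
example = (torch.randint(0, 100, (1, 30)).cuda(),)

# Export, save, and re-load the ExportedProgram, then compile the loaded copy,
# mirroring the "loaded exported program" path described above.
ep = torch.export.export(model, example)
torch.export.save(ep, "embedding_bag.ep")          # file name is illustrative
loaded_ep = torch.export.load("embedding_bag.ep")

trt_model = torch_tensorrt.dynamo.compile(loaded_ep, inputs=example)
```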
Another embedding bag bug/issue. @zewenli98, can you please help take a look? Thank you.
@sean-xiang-applovin Thanks for the details. I'll take a look and get back to you soon.
Hi @sean-xiang-applovin, here is what I get on my end:
DEBUG:torch_tensorrt.dynamo.lowering.passes.remove_detach:Removed 0 detach nodes:
graph():
%p_fn_weight : [num_users=1] = placeholder[target=p_fn_weight]
%input : [num_users=1] = placeholder[target=input]
%arange : [num_users=1] = call_function[target=torch.ops.aten.arange.start_step](args = (0, 30, 30), kwargs = {dtype: torch.int64, device: cpu, pin_memory: False})
%view : [num_users=1] = call_function[target=torch.ops.aten.view.default](args = (%input, [-1]), kwargs = {})
%embedding_bag : [num_users=1] = call_function[target=torch.ops.aten.embedding_bag.padding_idx](args = (%p_fn_weight, %view, %arange, False, 0, False, None, False, 0), kwargs = {})
%getitem : [num_users=1] = call_function[target=operator.getitem](args = (%embedding_bag, 0), kwargs = {})
return (getitem,)
DEBUG:torch_tensorrt.dynamo._compiler:Input graph: graph():
%weight : [num_users=1] = get_attr[target=weight]
%input_1 : [num_users=1] = placeholder[target=input]
%arange : [num_users=1] = call_function[target=torch.ops.aten.arange.start_step](args = (0, 30, 30), kwargs = {dtype: torch.int64, device: cpu, pin_memory: False})
%view : [num_users=1] = call_function[target=torch.ops.aten.view.default](args = (%input_1, [-1]), kwargs = {})
%embedding_bag : [num_users=1] = call_function[target=torch.ops.aten.embedding_bag.padding_idx](args = (%weight, %view, %arange, False, 0, False, None, False, 0), kwargs = {})
%getitem : [num_users=1] = call_function[target=operator.getitem](args = (%embedding_bag, 0), kwargs = {})
return (getitem,)
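As context for the graph above: this is roughly what nn.EmbeddingBag does internally for a 2-D input, which explains the view and arange nodes. A minimal sketch (the (1, 30) shape is read off the arange(0, 30, 30) call in the graph; the table size is made up):

```python
import torch

weight = torch.randn(100, 16)          # illustrative embedding table
inp = torch.randint(0, 100, (1, 30))   # (batch, bag_size) = (1, 30)

# nn.EmbeddingBag flattens a 2-D input and builds offsets, matching the graph:
indices = inp.view(-1)                                    # aten.view.default
offsets = torch.arange(0, inp.numel(), inp.size(1))       # aten.arange.start_step -> tensor([0])
out = torch.nn.functional.embedding_bag(indices, weight, offsets)
print(out.shape)  # (1, 16): one bag per row of the original input
```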
The converter being hit is:

```python
@dynamo_tensorrt_converter(
    torch.ops.aten.embedding_bag.padding_idx,
    capability_validator=embedding_bag_validator,
    supports_dynamic_shapes=True,
)
```

Output:
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT INetwork construction elapsed time: 0:00:00.080986
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:Not found cached TRT engines. Start building engine.
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:Build TRT engine elapsed time: 0:00:41.396221
INFO:torch_tensorrt.dynamo.conversion._TRTInterpreter:TRT Engine uses: 11947508 bytes of Memory
DEBUG: [Torch-TensorRT - Debug Build] - Deserializing Device Info: 0%8%9%0%NVIDIA GeForce RTX 4080

Besides, you can also try passing that in; the outputs are the same. So I'm thinking maybe the issue is not related to dynamo_compile but to your ep?
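For completeness, the output check being referred to is just an elementwise comparison between the eager module and the compiled one; a small sketch, reusing model, example, and trt_model from the earlier export/compile sketch (so these names are assumptions, not code from the thread):

```python
import torch

# `model`, `example`, and `trt_model` are assumed from the earlier sketch above.
with torch.no_grad():
    eager_out = model(*example)
    trt_out = trt_model(*example)

print(torch.max(torch.abs(eager_out - trt_out)))           # largest absolute difference
print(torch.allclose(eager_out, trt_out, rtol=1e-3, atol=1e-3))
```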
@zewenli98 thanks for debugging.
@sean-xiang-applovin Sorry for the unclearness. I found the reason is that I was using nightly PyTorch 2.6.0, which behaves differently here.

For the embedding bag converter, I just noticed that you're using 2d input, which is not supported yet for some reasons. Besides, compilation time for embedding bag is kind of slow, possibly because we are using TensorRT's ILoopLayer, which causes additional overhead for the data-dependent case. In order to fully solve the issue, we would have to do much additional work on it. So if you are really relying on the 2d-input embedding bag, I would recommend seeking other paths or forcing this op to fall back to PyTorch (see the sketch below for one way to do that).
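The exact snippet suggested at the end of the comment above isn't shown, so the following is only a hedged sketch of one common way to force an op back to PyTorch, via the torch_executed_ops compile setting; the op key and the reuse of model/example from the earlier sketch are my assumptions:

```python
import torch_tensorrt

# Keep the embedding bag op in PyTorch and let TensorRT handle the rest.
trt_model = torch_tensorrt.compile(
    model,                      # eager nn.Module from the earlier sketch
    ir="dynamo",
    inputs=example,             # example inputs from the earlier sketch
    torch_executed_ops={"torch.ops.aten.embedding_bag.padding_idx"},
)
```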
Hi @zewenli98 thanks for the response, I can try with nightly pytorch.
You mentioned 2d input is not supported. I am kind of confused: my input shape is (x, y), where x is the batch size. I would be very surprised if the embedding bag module cannot support this.
@sean-xiang-applovin Can you try something like this?
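The snippet referenced above is not included, so this is only a guess at what was meant: a minimal sketch that flattens the 2-D input into 1-D indices plus explicit offsets before calling the embedding bag (whether this actually addresses the limitation is not confirmed in the thread; the shapes are illustrative):

```python
import torch


class FlattenedEmbeddingBag(torch.nn.Module):
    """Wrap nn.EmbeddingBag so it always receives 1-D indices plus offsets."""

    def __init__(self, num_embeddings: int, embedding_dim: int):
        super().__init__()
        self.bag = torch.nn.EmbeddingBag(num_embeddings, embedding_dim)

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        batch, bag_size = input.shape
        indices = input.reshape(-1)
        offsets = torch.arange(0, batch * bag_size, bag_size, device=input.device)
        return self.bag(indices, offsets)


out = FlattenedEmbeddingBag(100, 16)(torch.randint(0, 100, (4, 30)))
print(out.shape)  # (4, 16)
```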
❓ Question
Since only some of the ops support dynamic shapes and others do not, what are the criteria for deciding whether an op should support dynamic shape or not?
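For context on what "supporting dynamic shape" means at compile time, here is a minimal sketch of requesting a dynamic batch dimension through torch_tensorrt.Input ranges; the module and shapes are illustrative:

```python
import torch
import torch_tensorrt

model = torch.nn.Sequential(torch.nn.Linear(16, 8)).eval().cuda()

# Declare the batch dimension as dynamic via min/opt/max shapes.
inputs = [
    torch_tensorrt.Input(
        min_shape=(1, 16),
        opt_shape=(8, 16),
        max_shape=(32, 16),
        dtype=torch.float32,
    )
]

trt_model = torch_tensorrt.compile(model, ir="dynamo", inputs=inputs)
```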
For some existing ops that are not marked with supports_dynamic_shapes=True, can I write a converter that wraps the existing converter and register my own converter with higher priority? Is this the recommended way? Or should I just turn on assume_dynamic_shape_support, which seems to be a global flag for all converters?
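A minimal sketch of the wrapping idea described above, assuming the public decorator accepts a priority argument and that the existing converter can be imported and delegated to; the import paths and the wrapped converter's name are assumptions based on the snippets earlier in this thread, not verified signatures:

```python
import torch
from torch_tensorrt.dynamo.conversion._ConverterRegistry import (  # import path is an assumption
    ConverterPriority,
    dynamo_tensorrt_converter,
)
from torch_tensorrt.dynamo.conversion.aten_ops_converters import (
    aten_ops_embedding_bag,  # assumed name of the existing converter function
)


@dynamo_tensorrt_converter(
    torch.ops.aten.embedding_bag.padding_idx,
    supports_dynamic_shapes=True,
    priority=ConverterPriority.HIGH,  # take precedence over the stock registration
)
def my_embedding_bag_converter(ctx, target, args, kwargs, name):
    # Delegate to the existing converter; only the registration metadata changes.
    return aten_ops_embedding_bag(ctx, target, args, kwargs, name)
```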
What you have already tried
Environment
How you installed PyTorch (conda, pip, libtorch, source): pip

Additional context