
🐛 [Bug] Changing input size would affect the TRT engine size, testing on BERT #3634

Open
@zewenli98


Bug Description

I'm testing BERT with the commands python perf_run.py --backends=dynamo --inputs="(512, 128)@int32;(512, 128)@int32" ... and python perf_run.py --backends=dynamo --inputs="(256, 128)@int32;(256, 128)@int32" .... I changed only the input shapes and saved the resulting TRT engines, but the two engines have different sizes:

Using --inputs="(256, 128)@int32;(256, 128)@int32":
Number of layers: 1277
Number of inputs: 2
Number of outputs: 2
Input 0: input_ids, shape: (256, 128), dtype: DataType.INT32
Input 1: attention_mask, shape: (256, 128), dtype: DataType.INT32
Output 0: output0, shape: (256, 128, 768), dtype: DataType.FLOAT
Output 1: output1, shape: (256, 768), dtype: DataType.FLOAT
TRT Engine uses: 516.5105247497559 Mb of Memory

Using --inputs="(512, 128)@int32;(512, 128)@int32":
Number of layers: 1277
Number of inputs: 2
Number of outputs: 2
Input 0: input_ids, shape: (512, 128), dtype: DataType.INT32
Input 1: attention_mask, shape: (512, 128), dtype: DataType.INT32
Output 0: output0, shape: (512, 128, 768), dtype: DataType.FLOAT
Output 1: output1, shape: (512, 768), dtype: DataType.FLOAT
TRT Engine uses: 612.4900169372559 Mb of Memory
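A quick back-of-the-envelope check (plain arithmetic, not a claim about TensorRT internals): the ~96 MB gap between the two engines is very close to the growth of the largest I/O buffer, output0, between the two batch sizes:

```python
# Size of a float32 tensor in MiB, given its shape.
def tensor_mib(shape, bytes_per_elem=4):
    n = 1
    for d in shape:
        n *= d
    return n * bytes_per_elem / 2**20

# output0 shapes from the two runs above.
mib_256 = tensor_mib((256, 128, 768))  # 96.0 MiB
mib_512 = tensor_mib((512, 128, 768))  # 192.0 MiB

# Growth of this single buffer...
print(mib_512 - mib_256)  # 96.0
# ...vs the observed engine-size difference from the report.
print(612.4900169372559 - 516.5105247497559)  # ~95.98
```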

To Reproduce

Steps to reproduce the behavior:

  1. cd tools/perf
  2. python perf_run.py ...

Expected behavior

The Torch-TensorRT Dynamo backend looks faster than Inductor, but slightly slower than the ONNX path. I was wondering why, so I pulled out the engines and ran them directly. When checking the engines, I found that the engine size grows as the input size increases, whereas the engine sizes exported through the ONNX path stay the same.
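If the goal is a single engine whose size does not change with the batch size (as the ONNX path apparently produces), one option is compiling with dynamic shapes via torch_tensorrt.Input. A hedged sketch, treated as a configuration fragment: the shape range and the `model` object are illustrative assumptions, not taken from the report.

```python
import torch
import torch_tensorrt

# Hypothetical: compile BERT once with a dynamic batch dimension so one
# engine covers batch sizes 1..512, instead of one static engine per size.
# `model` is assumed to be the traced/exported BERT module under test.
inputs = [
    torch_tensorrt.Input(          # input_ids
        min_shape=(1, 128),
        opt_shape=(256, 128),
        max_shape=(512, 128),
        dtype=torch.int32,
    ),
    torch_tensorrt.Input(          # attention_mask
        min_shape=(1, 128),
        opt_shape=(256, 128),
        max_shape=(512, 128),
        dtype=torch.int32,
    ),
]

trt_model = torch_tensorrt.compile(model, ir="dynamo", inputs=inputs)
```

Whether this makes the serialized engine size independent of the concrete input shape (and what it costs in performance versus static-shape engines) would need to be measured with perf_run.py.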
