Bug Description
When compiling GPT-2 with Dynamo compile, the following error is encountered:
[07/01/2023-00:07:12] [TRT] [E] 3: [executionContext.cpp::enqueueInternal::795] Error Code 3: API Usage Error (Parameter check failed at: runtime/api/executionContext.cpp::enqueueInternal::795, condition: bindings[x] || nullBindingOK)
Additionally, this error does not cause a failure in the Dynamo runtime (it is not caught by pass_through_build_failures), and it appears both with and without the experimental runtime.
To Reproduce
import torch
import torch_tensorrt
from transformers import GPT2Model

# transformers_trace is assumed to be a Hugging Face FX tracing helper,
# e.g. transformers.utils.fx.symbolic_trace
model = GPT2Model.from_pretrained("gpt2").eval().cuda()
input_ids = torch.randint(0, 2, (1, 14), dtype=torch.int32).cuda()
attention_mask = torch.randint(0, 2, (1, 14), dtype=torch.int32).cuda()
traced = transformers_trace(model, input_names=["input_ids", "attention_mask"]).eval().cuda()
fx_trt_model = torch_tensorrt.compile(
    traced,
    ir="dynamo_compile",
    inputs=[input_ids, attention_mask],
    debug=True,
    pass_through_build_failures=True,
    min_block_size=10,
)
Expected behavior
The model should compile without encountering TensorRT errors.
Environment
- Torch-TensorRT Version (e.g. 1.0.0): 2844630
- PyTorch Version (e.g. 1.0): 2.1.0.dev20230620+cu118
- TensorRT Version: 8.6.1
Additional context
This error no longer appears when we apply the @fake_tensor_unsupported decorator to the backends; that decorator was removed in #1955. It is unclear, however, whether its removal is the direct cause of the bug.
Additionally, this bug appears only intermittently, not consistently.
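For reference, the workaround described above amounts to decorating the Dynamo backend with @fake_tensor_unsupported (from torch._dynamo.backends.common), which hands the backend real zero-filled tensors in place of FakeTensors. A minimal sketch follows; my_backend is a hypothetical stand-in that returns the graph unchanged, whereas the actual Torch-TensorRT backend would lower the FX graph to TensorRT engines:

```python
import torch
from torch._dynamo.backends.common import fake_tensor_unsupported

# Hypothetical eager stand-in backend; the real Torch-TensorRT backend
# would compile the FX graph to TensorRT instead of returning it as-is.
@fake_tensor_unsupported
def my_backend(gm: torch.fx.GraphModule, example_inputs):
    # With the decorator applied, example_inputs are real tensors rather
    # than FakeTensors by the time this function runs.
    return gm.forward

def fn(x):
    return torch.relu(x) + 1

compiled = torch.compile(fn, backend=my_backend)
out = compiled(torch.ones(4))
```

Whether reintroducing this decorator on the Torch-TensorRT backend is the right fix, or merely masks the underlying binding issue, is an open question per the above.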