Improve TensorRT GetCapability to Enable More Models #1012
Conversation
Seems like the Windows TensorRT build fails.
If GetCapability() is now reliable, doesn't this mean we can re-enable many of the disabled TRT unit tests?
Yes, I think the exclusions for most of these disabled unit tests could be removed. But if we keep the tests disabled, people know which tests can't run on TensorRT; otherwise we lose that tracking, since those tests just fall back to other execution providers.
We do not want to leave tests disabled. It was only done as a temporary workaround until GetCapability is fixed.
}

std::unique_ptr<IndexedSubGraph> GetSubGraph(SubGraph_t graph_nodes_index, int& kernels_index,
It would be good to have explicit unit tests for these new methods (GetSubGraph, GetSupportedList).
std::unique_ptr<IndexedSubGraph> GetSubGraph(SubGraph_t graph_nodes_index, int& kernels_index,
                                             const onnxruntime::GraphViewer& graph) const;

SubGraphCollection_t GetSupportedList(SubGraphCollection_t supported_nodes_list, int iterations, const int& max_iterations,
Why is max_iterations a reference?
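For illustration only (this is not the actual onnxruntime signature, which is truncated in the diff above): an int is trivially copyable, so taking it by value is at least as cheap as taking it by const reference and is simpler to read. The helper below is hypothetical.

// Hypothetical helper, shown only to illustrate the parameter-passing point:
// for an int, const int& adds an indirection with no benefit over plain by-value.
int RemainingIterations(int iterations, int max_iterations) {
  return max_iterations - iterations;
}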
CHECK_CUDA(cudaMemcpy(output_tensors[output_index].data, buffers[i + num_binding_inputs], batch_size * output_dim_sizes[i] * sizeof(float), cudaMemcpyDeviceToHost));
int output_size = batch_size * output_dim_sizes[i];
if (output_types[i] == TensorProto::FLOAT) {
Add comments for what we're doing here.
I see the TRT provider exclusions were removed in gemm_test.cc.
@@ -173,7 +174,7 @@ TEST(UpsampleOpTest, UpsampleOpNearest222XTest) {
   };

   test.AddOutput<float>("Y", {N*2, C, (int64_t)(H * scales[2]), (int64_t)(W * scales[3])}, Y);
-  test.Run(OpTester::ExpectResult::kExpectSuccess, "", {kTensorrtExecutionProvider});  //TensorRT parser: Assertion failed: scales[0] == 1 && scales[1] == 1
+  test.Run();  //TensorRT parser: Assertion failed: scales[0] == 1 && scales[1] == 1
Should these assertion-failure comments be removed if we are re-enabling the TensorrtExecutionProvider for these tests?
Same comment for the other tests.
Removed the comments.
OrtValue* output_tensor = ort.KernelContext_GetOutput(context, output_index, output_shapes[i].data(), output_shapes[i].size());
if (output_types[i] == TensorProto::FLOAT) {
  CHECK_CUDA(cudaMemcpy(ort.GetTensorMutableData<float>(output_tensor), buffers[i + num_binding_inputs], batch_size * output_dim_sizes[i] * sizeof(float), cudaMemcpyDeviceToHost));
  // If output tensor type is INT64, TensorRT processes data as INT32 and the output will be converted to INT64.
Move the comment to where the applicable code is.
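For context, here is a minimal self-contained sketch of the INT32-to-INT64 widening that the comment describes. It is an illustration only: in the PR the source is a CUDA device buffer and the destination is the ORT output tensor, and the function name here is hypothetical.

#include <cstdint>
#include <vector>

// TensorRT produces the data as INT32 (per the comment above), so an INT64 output
// is filled by widening each element; the conversion is lossless in this direction.
std::vector<int64_t> WidenToInt64(const std::vector<int32_t>& trt_output) {
  std::vector<int64_t> widened(trt_output.size());
  for (size_t j = 0; j < trt_output.size(); ++j) {
    widened[j] = trt_output[j];
  }
  return widened;
}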
ort.ReleaseTensorTypeAndShapeInfo(tensor_info);
const float* input = ort.GetTensorData<float>(input_tensor);

const int input_batch_size = tensor_shape[0];
if (i > 0 && batch_size != input_batch_size) {
  ORT_THROW("Input batch size is inconsistent");
I thought we should avoid throwing exceptions in the compute function?
I see other places in the Compile() function where ORT_ENFORCE is used instead of returning a Status. Please revisit.
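A minimal sketch of the pattern being suggested, using a hypothetical standalone helper rather than the PR's actual compute function: validate the inputs and report failure through the return value instead of throwing (via ORT_THROW or ORT_ENFORCE) on the compute path.

#include <cstdint>
#include <vector>

// Returns 0 on success and non-zero on failure instead of throwing when the
// per-input batch sizes disagree; the name and return convention are assumptions.
int CheckConsistentBatchSize(const std::vector<int64_t>& input_batch_sizes) {
  for (size_t i = 1; i < input_batch_sizes.size(); ++i) {
    if (input_batch_sizes[i] != input_batch_sizes[0]) {
      return 1;  // inconsistent batch size: let the caller handle the error
    }
  }
  return 0;  // all inputs agree
}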
  CHECK_CUDA(cudaMalloc(&buffers[i], input_size * sizeof(int32_t)));
  CHECK_CUDA(cudaMemcpy(buffers[i], input, input_size * sizeof(int32_t), cudaMemcpyHostToDevice));
} else {
  Status(common::ONNXRUNTIME, common::FAIL, "Input tensor type " + std::to_string(tensor_type) + " is not supported.");
Why allocate a Status()? It doesn't seem to be used anywhere.
Same comment for the other Status() allocations.
It looks like this compute_func() always returns 0, which doesn't seem correct. If there are errors, we need to return 1.
} else if (output_types[i] == ONNX_TENSOR_ELEMENT_DATA_TYPE_INT32 || output_types[i] == ONNX_TENSOR_ELEMENT_DATA_TYPE_INT64) {
  CHECK_CUDA(cudaMalloc(&buffers[i + num_binding_inputs], batch_size * output_dim_sizes[i] * sizeof(int32_t)));
} else {
  Status(common::ONNXRUNTIME, common::FAIL, "Output tensor type " + std::to_string(output_types[i]) + " is not supported.");
Same comment as above for Status().
  for (int j = 0; j < output_size; ++j) {
    ort.GetTensorMutableData<int64_t>(output_tensor)[j] = output[j];
  }
  delete[] output;
} else {
  Status(common::ONNXRUNTIME, common::FAIL, "Output type is not supported by TensorRT");
  Status(common::ONNXRUNTIME, common::FAIL, "Output tensor type " + std::to_string(output_types[i]) + " is not supported.");
Same Status() comment.
  }
  delete[] output;
} else {
  return 1;
Better to return a specific status code instead of 1 (same for other places): https://github.com/microsoft/onnxruntime/blob/master/include/onnxruntime/core/common/status.h#L33
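A minimal sketch of what that could look like, assuming the compute function keeps an int return convention; the helper name is hypothetical, and the include path is assumed to be the status.h linked above, which defines the onnxruntime::common::StatusCode enum.

#include "core/common/status.h"  // assumed path; defines onnxruntime::common::StatusCode

// Instead of a bare return 1; (or constructing a Status object that is immediately
// discarded), return a StatusCode value so callers can tell failure kinds apart,
// e.g. FAIL here, or a more specific code such as INVALID_ARGUMENT.
int ReportUnsupportedTensorType() {
  return static_cast<int>(onnxruntime::common::StatusCode::FAIL);
}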
Looks like you'll need to merge master to pick up #1097.