
Deploying on NVIDIA Jetson TX2: problem running the compiled PaddleOCR C++ inference demo with --use_tensorrt=true #9981

Closed
WYQ-Github opened this issue May 19, 2023 · 2 comments

WYQ-Github commented May 19, 2023

The full error output is as follows:
./build/ppocr --limit_side_len=960 --visualize=true --precision=fp32 --gpu_mem=400 --use_tensorrt=true --visualize=true
In PP-OCRv3, the rec_image_shape parameter defaults to '3, 48, 320'; if you are using a recognition model from PP-OCRv2 or an older version, please set --rec_image_shape='3,32,320'
total images num: 50
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0601 16:09:40.643213 13695 analysis_predictor.cc:881] TensorRT subgraph engine is enabled
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [adaptive_pool2d_convert_global_pass]
I0601 16:09:40.696204 13695 fuse_pass_base.cc:57] --- detected 8 subgraphs
--- Running IR pass [shuffle_channel_detect_pass]
--- Running IR pass [quant_conv2d_dequant_fuse_pass]
--- Running IR pass [delete_quant_dequant_op_pass]
--- Running IR pass [delete_quant_dequant_filter_op_pass]
--- Running IR pass [delete_weight_dequant_linear_op_pass]
--- Running IR pass [delete_quant_dequant_linear_op_pass]
--- Running IR pass [add_support_int8_pass]
I0601 16:09:41.050448 13695 fuse_pass_base.cc:57] --- detected 237 subgraphs
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [preln_embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [multihead_matmul_fuse_pass_v3]
--- Running IR pass [skip_layernorm_fuse_pass]
--- Running IR pass [preln_skip_layernorm_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
I0601 16:09:41.226161 13695 fuse_pass_base.cc:57] --- detected 33 subgraphs
--- Running IR pass [unsqueeze2_eltwise_fuse_pass]
--- Running IR pass [trt_squeeze2_matmul_fuse_pass]
--- Running IR pass [trt_reshape2_matmul_fuse_pass]
--- Running IR pass [trt_flatten2_matmul_fuse_pass]
--- Running IR pass [trt_map_matmul_v2_to_mul_pass]
--- Running IR pass [trt_map_matmul_v2_to_matmul_pass]
--- Running IR pass [trt_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [conv_elementwise_add_fuse_pass]
I0601 16:09:41.302793 13695 fuse_pass_base.cc:57] --- detected 49 subgraphs
--- Running IR pass [tensorrt_subgraph_pass]
I0601 16:09:41.380200 13695 tensorrt_subgraph_pass.cc:145] --- detect a sub-graph with 187 nodes
I0601 16:09:41.415360 13695 tensorrt_subgraph_pass.cc:433] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0601 16:09:43.589931 13695 engine.cc:222] Run Paddle-TRT Dynamic Shape mode.
W0601 16:09:43.592581 13695 helper.h:107] DLA requests all profiles have same min, max, and opt value. All dla layers are falling back to GPU
I0601 16:12:27.171494 13695 engine.cc:462] ====== engine info ======
I0601 16:12:27.190670 13695 engine.cc:467] Layers:
conv2d (Output: batch_norm_0.tmp_315)
PWN(PWN((Unnamed Layer* 1) [Activation]), hard_swish (Output: hardswish_0.tmp_017))
conv2d (Output: batch_norm_1.tmp_330) + relu (Output: relu_0.tmp_032)
conv2d (Output: depthwise_conv2d_0.tmp_035) + batchnorm_add_scale (Output: batch_norm_2.tmp_345) + relu (Output: relu_1.tmp_047)
conv2d (Output: batch_norm_3.tmp_360) + elementwise (Output: elementwise_add_062)
conv2d (Output: batch_norm_4.tmp_375) + relu (Output: relu_2.tmp_077)
conv2d (Output: depthwise_conv2d_1.tmp_080) + batchnorm_add_scale (Output: batch_norm_5.tmp_390) + relu (Output: relu_3.tmp_092)
conv2d (Output: batch_norm_6.tmp_3105)
conv2d (Output: batch_norm_7.tmp_3118) + relu (Output: relu_4.tmp_0120)
conv2d (Output: depthwise_conv2d_2.tmp_0123) + batchnorm_add_scale (Output: batch_norm_8.tmp_3133) + relu (Output: relu_5.tmp_0135)
conv2d (Output: batch_norm_9.tmp_3148) + elementwise (Output: elementwise_add_1150)
conv2d (Output: batch_norm_10.tmp_3163) + relu (Output: relu_6.tmp_0165)
conv2d (Output: conv2d_114.tmp_0775)
pool2d (Output: pool2d_3.tmp_0777)
conv2d (Output: depthwise_conv2d_3.tmp_0168) + batchnorm_add_scale (Output: batch_norm_11.tmp_3178) + relu (Output: relu_7.tmp_0180)
conv2d (Output: conv2d_115.tmp_1783) + relu (Output: relu_15.tmp_0785)
conv2d (Output: conv2d_116.tmp_1791)
conv2d (Output: batch_norm_12.tmp_3193)
conv2d (Output: batch_norm_13.tmp_3206) + relu (Output: relu_8.tmp_0208)
conv2d (Output: depthwise_conv2d_4.tmp_0211) + batchnorm_add_scale (Output: batch_norm_14.tmp_3221) + relu (Output: relu_9.tmp_0223)
conv2d (Output: batch_norm_15.tmp_3236) + elementwise (Output: elementwise_add_2238)
conv2d (Output: batch_norm_16.tmp_3251) + relu (Output: relu_10.tmp_0253)
conv2d (Output: depthwise_conv2d_5.tmp_0256) + batchnorm_add_scale (Output: batch_norm_17.tmp_3266) + relu (Output: relu_11.tmp_0268)
conv2d (Output: batch_norm_18.tmp_3281) + elementwise (Output: elementwise_add_3283)
conv2d (Output: batch_norm_19.tmp_3296)
conv2d (Output: conv2d_111.tmp_0750)
pool2d (Output: pool2d_2.tmp_0752)
PWN(PWN((Unnamed Layer* 53) [Activation]), hard_swish (Output: hardswish_1.tmp_0298))
conv2d (Output: conv2d_112.tmp_1758) + relu (Output: relu_14.tmp_0760)
conv2d (Output: depthwise_conv2d_6.tmp_0301) + batchnorm_add_scale (Output: batch_norm_20.tmp_3311)
conv2d (Output: conv2d_113.tmp_1766)
PWN(PWN((Unnamed Layer* 60) [Activation]), hard_swish (Output: hardswish_2.tmp_0313))
conv2d (Output: batch_norm_21.tmp_3326)
conv2d (Output: batch_norm_22.tmp_3339)
PWN(PWN((Unnamed Layer* 67) [Activation]), hard_swish (Output: hardswish_3.tmp_0341))
conv2d (Output: depthwise_conv2d_7.tmp_0344) + batchnorm_add_scale (Output: batch_norm_23.tmp_3354)
PWN(PWN((Unnamed Layer* 72) [Activation]), hard_swish (Output: hardswish_4.tmp_0356))
conv2d (Output: batch_norm_24.tmp_3369) + elementwise (Output: elementwise_add_4371)
conv2d (Output: batch_norm_25.tmp_3384)
PWN(PWN((Unnamed Layer* 77) [Activation]), hard_swish (Output: hardswish_5.tmp_0386))
conv2d (Output: depthwise_conv2d_8.tmp_0389) + batchnorm_add_scale (Output: batch_norm_26.tmp_3399)
PWN(PWN((Unnamed Layer* 81) [Activation]), hard_swish (Output: hardswish_6.tmp_0401))
conv2d (Output: batch_norm_27.tmp_3414) + elementwise (Output: elementwise_add_5416)
conv2d (Output: batch_norm_28.tmp_3429)
PWN(PWN((Unnamed Layer* 86) [Activation]), hard_swish (Output: hardswish_7.tmp_0431))
conv2d (Output: depthwise_conv2d_9.tmp_0434) + batchnorm_add_scale (Output: batch_norm_29.tmp_3444)
PWN(PWN((Unnamed Layer* 90) [Activation]), hard_swish (Output: hardswish_8.tmp_0446))
conv2d (Output: batch_norm_30.tmp_3459) + elementwise (Output: elementwise_add_6461)
conv2d (Output: batch_norm_31.tmp_3474)
PWN(PWN((Unnamed Layer* 95) [Activation]), hard_swish (Output: hardswish_9.tmp_0476))
conv2d (Output: depthwise_conv2d_10.tmp_0479) + batchnorm_add_scale (Output: batch_norm_32.tmp_3489)
PWN(PWN((Unnamed Layer* 99) [Activation]), hard_swish (Output: hardswish_10.tmp_0491))
conv2d (Output: batch_norm_33.tmp_3504)
conv2d (Output: batch_norm_34.tmp_3517)
PWN(PWN((Unnamed Layer* 103) [Activation]), hard_swish (Output: hardswish_11.tmp_0519))
conv2d (Output: depthwise_conv2d_11.tmp_0522) + batchnorm_add_scale (Output: batch_norm_35.tmp_3532)
PWN(PWN((Unnamed Layer* 107) [Activation]), hard_swish (Output: hardswish_12.tmp_0534))
conv2d (Output: batch_norm_36.tmp_3547) + elementwise (Output: elementwise_add_7549)
conv2d (Output: batch_norm_37.tmp_3562)
conv2d (Output: conv2d_108.tmp_0725)
pool2d (Output: pool2d_1.tmp_0727)
PWN(PWN((Unnamed Layer* 113) [Activation]), hard_swish (Output: hardswish_13.tmp_0564))
conv2d (Output: conv2d_109.tmp_1733) + relu (Output: relu_13.tmp_0735)
conv2d (Output: depthwise_conv2d_12.tmp_0567) + batchnorm_add_scale (Output: batch_norm_38.tmp_3577)
conv2d (Output: conv2d_110.tmp_1741)
PWN(PWN((Unnamed Layer* 120) [Activation]), hard_swish (Output: hardswish_14.tmp_0579))
conv2d (Output: batch_norm_39.tmp_3592)
conv2d (Output: batch_norm_40.tmp_3605)
PWN(PWN((Unnamed Layer* 127) [Activation]), hard_swish (Output: hardswish_15.tmp_0607))
conv2d (Output: depthwise_conv2d_13.tmp_0610) + batchnorm_add_scale (Output: batch_norm_41.tmp_3620)
PWN(PWN((Unnamed Layer* 132) [Activation]), hard_swish (Output: hardswish_16.tmp_0622))
conv2d (Output: batch_norm_42.tmp_3635) + elementwise (Output: elementwise_add_8637)
conv2d (Output: batch_norm_43.tmp_3650)
PWN(PWN((Unnamed Layer* 137) [Activation]), hard_swish (Output: hardswish_17.tmp_0652))
conv2d (Output: depthwise_conv2d_14.tmp_0655) + batchnorm_add_scale (Output: batch_norm_44.tmp_3665)
PWN(PWN((Unnamed Layer* 141) [Activation]), hard_swish (Output: hardswish_18.tmp_0667))
conv2d (Output: batch_norm_45.tmp_3680) + elementwise (Output: elementwise_add_9682)
conv2d (Output: batch_norm_46.tmp_3695)
PWN(PWN((Unnamed Layer* 146) [Activation]), hard_swish (Output: hardswish_19.tmp_0697))
conv2d (Output: conv2d_105.tmp_0700)
pool2d (Output: pool2d_0.tmp_0702)
conv2d (Output: conv2d_106.tmp_1708) + relu (Output: relu_12.tmp_0710)
conv2d (Output: conv2d_107.tmp_1716)
PWN(PWN(PWN(hard_sigmoid (Output: hardsigmoid_0.tmp_0718)), elementwise (Output: tmp_0720)), elementwise (Output: tmp_1722))
nearest_interp_v2 (Output: nearest_interp_v2_0.tmp_0799)
conv2d (Output: conv2d_117.tmp_0812)
PWN(PWN(PWN(PWN(hard_sigmoid (Output: hardsigmoid_1.tmp_0743)), elementwise (Output: tmp_2745)), elementwise (Output: tmp_3747)), elementwise (Output: tmp_8801))
pool2d (Output: pool2d_4.tmp_0814)
nearest_interp_v2 (Output: nearest_interp_v2_1.tmp_0803)
conv2d (Output: conv2d_120.tmp_0837)
conv2d (Output: conv2d_118.tmp_1820) + relu (Output: relu_16.tmp_0822)
PWN(PWN(PWN(PWN(hard_sigmoid (Output: hardsigmoid_2.tmp_0768)), elementwise (Output: tmp_4770)), elementwise (Output: tmp_5772)), elementwise (Output: tmp_9805))
pool2d (Output: pool2d_5.tmp_0839)
nearest_interp_v2 (Output: nearest_interp_v2_2.tmp_0807)
conv2d (Output: conv2d_123.tmp_0862)
conv2d (Output: conv2d_121.tmp_1845) + relu (Output: relu_17.tmp_0847)
conv2d (Output: conv2d_119.tmp_1828)
PWN(PWN(PWN(PWN(hard_sigmoid (Output: hardsigmoid_3.tmp_0793)), elementwise (Output: tmp_6795)), elementwise (Output: tmp_7797)), elementwise (Output: tmp_10809))
pool2d (Output: pool2d_6.tmp_0864)
conv2d (Output: conv2d_126.tmp_0887)
conv2d (Output: conv2d_124.tmp_1870) + relu (Output: relu_18.tmp_0872)
conv2d (Output: conv2d_122.tmp_1853)
pool2d (Output: pool2d_7.tmp_0889)
PWN(PWN(PWN(hard_sigmoid (Output: hardsigmoid_4.tmp_0830)), elementwise (Output: tmp_11832)), elementwise (Output: tmp_12834))
conv2d (Output: conv2d_127.tmp_1895) + relu (Output: relu_19.tmp_0897)
conv2d (Output: conv2d_125.tmp_1878)
nearest_interp_v2 (Output: nearest_interp_v2_3.tmp_0911)
PWN(PWN(PWN(hard_sigmoid (Output: hardsigmoid_5.tmp_0855)), elementwise (Output: tmp_13857)), elementwise (Output: tmp_14859))
conv2d (Output: conv2d_128.tmp_1903)
nearest_interp_v2 (Output: nearest_interp_v2_4.tmp_0913)
PWN(PWN(PWN(hard_sigmoid (Output: hardsigmoid_6.tmp_0880)), elementwise (Output: tmp_15882)), elementwise (Output: tmp_16884))
nearest_interp_v2 (Output: nearest_interp_v2_5.tmp_0915)
PWN(PWN(PWN(hard_sigmoid (Output: hardsigmoid_7.tmp_0905)), elementwise (Output: tmp_17907)), elementwise (Output: tmp_18909))
nearest_interp_v2_3.tmp_0911 copy
nearest_interp_v2_4.tmp_0913 copy
nearest_interp_v2_5.tmp_0915 copy
conv2d (Output: batch_norm_47.tmp_3930) + relu (Output: batch_norm_47.tmp_4932)
conv2d_transpose (Output: conv2d_transpose_4.tmp_0935) + (Unnamed Layer* 201) [Constant] + (Unnamed Layer* 207) [Shuffle] + elementwise (Output: elementwise_add_10.tmp_0938) + batchnorm_add_scale (Output: batch_norm_48.tmp_3948) + relu (Output: batch_norm_48.tmp_4950)
conv2d_transpose (Output: conv2d_transpose_5.tmp_0953) + (Unnamed Layer* 212) [Constant] + (Unnamed Layer* 218) [Shuffle] + elementwise (Output: elementwise_add_11.tmp_0956)
PWN(sigmoid (Output: sigmoid_0.tmp_0958))

Bindings:
x
sigmoid_0.tmp_0958
I0601 16:12:27.191135 13695 engine.cc:469] ====== engine info end ======
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I0601 16:12:27.238207 13695 ir_params_sync_among_devices_pass.cc:100] Sync params from CPU to GPU
terminate called after throwing an instance of 'phi::enforce::EnforceNotMet'
what():


C++ Traceback (most recent call last):

0 paddle_infer::CreatePredictor(paddle::AnalysisConfig const&)
1 paddle_infer::Predictor::Predictor(paddle::AnalysisConfig const&)
2 std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor>> paddle::CreatePaddlePredictor<paddle::AnalysisConfig, (paddle::PaddleEngineKind)2>(paddle::AnalysisConfig const&)
3 paddle::AnalysisPredictor::Init(std::shared_ptr<paddle::framework::Scope> const&, std::shared_ptr<paddle::framework::ProgramDesc> const&)
4 paddle::AnalysisPredictor::PrepareProgram(std::shared_ptr<paddle::framework::ProgramDesc> const&)
5 paddle::AnalysisPredictor::OptimizeInferenceProgram()
6 paddle::inference::analysis::Analyzer::RunAnalysis(paddle::inference::analysis::Argument*)
7 paddle::inference::analysis::IrParamsSyncAmongDevicesPass::RunImpl(paddle::inference::analysis::Argument*)
8 paddle::inference::analysis::IrParamsSyncAmongDevicesPass::CopyParamsToGpu(paddle::inference::analysis::Argument*)
9 paddle::framework::TensorCopySync(phi::DenseTensor const&, phi::Place const&, phi::DenseTensor*)
10 phi::DenseTensor::mutable_data(phi::Place const&, paddle::experimental::DataType, unsigned long)
11 paddle::memory::AllocShared(phi::Place const&, unsigned long)
12 paddle::memory::allocation::AllocatorFacade::AllocShared(phi::Place const&, unsigned long)
13 paddle::memory::allocation::AllocatorFacade::Alloc(phi::Place const&, unsigned long)
14 paddle::memory::allocation::StatAllocator::AllocateImpl(unsigned long)
15 paddle::memory::allocation::Allocator::Allocate(unsigned long)
16 paddle::memory::allocation::RetryAllocator::AllocateImpl(unsigned long)
17 paddle::memory::allocation::NaiveBestFitAllocator::AllocateImpl(unsigned long)
18 paddle::memory::legacy::AllocVisitor::result_type paddle::platform::VisitPlace<paddle::memory::legacy::AllocVisitor>(phi::Place const&, paddle::memory::legacy::AllocVisitor const&)
19 void* paddle::memory::legacy::Alloc<phi::GPUPlace>(phi::GPUPlace const&, unsigned long)
20 paddle::memory::legacy::GetGPUBuddyAllocator(int)
21 paddle::memory::legacy::GPUBuddyAllocatorList::Get(int)
22 paddle::memory::legacy::GPUBuddyAllocatorList::Get(int)::{lambda()#1}::operator()() const
23 paddle::platform::GpuMaxChunkSize()
24 paddle::platform::GpuMaxAllocSize()
25 phi::enforce::EnforceNotMet::EnforceNotMet(phi::ErrorSummary const&, char const*, int)
26 phi::enforce::GetCurrentTraceBackString[abi:cxx11]


Error Message Summary:

ResourceExhaustedError: Not enough available GPU memory.
[Hint: Expected available_to_alloc >= alloc_bytes, but received available_to_alloc:318272921 < alloc_bytes:419430400.] (at /home/paddle/data/xly/workspace/23303/Paddle/paddle/fluid/platform/device/gpu/gpu_info.cc:99)

Aborted (core dumped)
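
For what it's worth, the numbers in the hint line up with the flags: --gpu_mem=400 asks Paddle to pre-allocate a 400 MiB initial GPU memory pool (400 × 1024 × 1024 = 419430400 bytes, exactly the failing alloc_bytes), while only 318272921 bytes (~303 MiB) are still free. The Jetson TX2's CPU and GPU share the same physical memory, and the TensorRT engine build above has already consumed a large part of it by the time ir_params_sync_among_devices_pass tries to copy the weights to the GPU. Below is a minimal sketch, assuming the standard paddle_infer C++ API that the demo is built on; the function name CreateOcrPredictor and the model file paths are hypothetical placeholders, not the demo's actual code:

```cpp
// Minimal sketch, not the demo's actual source: where --gpu_mem and
// --use_tensorrt land in paddle_infer::Config, and which knobs can be
// lowered to fit into the TX2's shared CPU/GPU memory.
#include <memory>
#include <string>
#include "paddle_inference_api.h"

std::shared_ptr<paddle_infer::Predictor> CreateOcrPredictor(
    const std::string& model_dir) {
  paddle_infer::Config config;
  config.SetModel(model_dir + "/inference.pdmodel",
                  model_dir + "/inference.pdiparams");
  // --gpu_mem=400 means a 400 MiB initial pool (400 * 1024 * 1024 =
  // 419430400 bytes, exactly the failing alloc_bytes). 200 MiB would fit
  // into the ~303 MiB the error reports as still available.
  config.EnableUseGpu(/*memory_pool_init_size_mb=*/200, /*device_id=*/0);
  // --use_tensorrt=true: a smaller TensorRT workspace (1 << 28 = 256 MiB
  // instead of the common 1 << 30) also lowers peak memory during the
  // engine build.
  config.EnableTensorRtEngine(/*workspace_size=*/1 << 28,
                              /*max_batch_size=*/1,
                              /*min_subgraph_size=*/3,
                              paddle_infer::PrecisionType::kFloat32,
                              /*use_static=*/false,
                              /*use_calib_mode=*/false);
  return paddle_infer::CreatePredictor(config);
}
```

Equivalently, without touching any code, rerunning the demo with a smaller pool, e.g. --gpu_mem=200, should sidestep this particular allocation, provided enough memory stays free for the rest of the run.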

WYQ-Github changed the title from "Deploying on NVIDIA Jetson TX2: problem running the compiled PaddleOCR C++ inference demo" to "Deploying on NVIDIA Jetson TX2: problem running the compiled PaddleOCR C++ inference demo with --use_tensorrt=true" on Jun 1, 2023
WYQ-Github (Author) commented

@an1018 Could you take a look at this, please?

github-actions bot commented Aug 1, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
