[NoMergeLatest](exit) call stop before brpc server stop to stop queries and allow brpc to exit gracefully #54781
Closed
yiguolei wants to merge 1 commit into apache:branch-3.1 from
Conversation
Contributor
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
Contributor
Author
run buildall

TPC-H: Total hot run time: 32758 ms
TPC-DS: Total hot run time: 190126 ms
ClickBench: Total hot run time: 28.51 s
Contributor
BE UT Coverage Report: increment line coverage / increment coverage report

Contributor
BE Regression && UT Coverage Report: increment line coverage / increment coverage report
Contributor
Author
run p0
PipelineTask is also held by the task queue (apache#49753), so it may be the last object to be destructed. But a pipeline task holds other objects, such as operators and shared state, so their memory should be released manually. Otherwise the mem tracker check fires at destruction:

F20250908 20:07:41.329619 39575 mem_tracker_limiter.cpp:112] mem tracker label: Query#Id=ec8535b35ed34f54-afd752d5d1dd97c1, consumption: 16640, peak consumption: 16640, mem tracker not equal to 0 when mem tracker destruct, this usually means that memory tracking is inaccurate and SCOPED_ATTACH_TASK and SCOPED_SWITCH_THREAD_MEM_TRACKER_LIMITER are not used correctly. If the log is truncated, search for `Address Sanitizer` in the be.INFO log to see more information.
1. For query and load, memory leaks may have occurred; it is expected that the query mem tracker will be bound to the thread context using SCOPED_ATTACH_TASK and SCOPED_SWITCH_THREAD_MEM_TRACKER_LIMITER before all memory alloc and free.
2. If a memory alloc is recorded by this tracker, it is expected to be recorded in this tracker when the memory is freed.
3. The remaining memory tracking value of this tracker is merged into Orphan; if you observe that Orphan is not equal to 0 in the mem tracker web or log, this indicates that there may be a memory leak.
4. If you need to transfer memory tracking value between two trackers, you can use transfer_to.

[Address Sanitizer]: memory not be freed: [Address Sanitizer] buf not be freed, mem tracker label: Query#Id=ec8535b35ed34f54-afd752d5d1dd97c1, consumption: 16640, peak consumption: 16640, buf: 0x7d87c8761d00, size 4096, stack trace:
 0# doris::Allocator<false, false, false, doris::DefaultMemoryAllocator, false>::alloc(unsigned long, unsigned long)
 1# void doris::vectorized::PODArrayBase<1ul, 4096ul, doris::Allocator<false, false, false, doris::DefaultMemoryAllocator, false>, 16ul, 15ul>::alloc<>(unsigned long)
 2# void doris::vectorized::PODArray<signed char, 4096ul, doris::Allocator<false, false, false, doris::DefaultMemoryAllocator, false>, 16ul, 15ul>::push_back<long>(long&&)
 3# doris::vectorized::ColumnVector<(doris::PrimitiveType)3>::insert(doris::vectorized::Field const&)
 4# doris::vectorized::IDataType::create_column_const(unsigned long, doris::vectorized::Field const&) const
 5# doris::vectorized::VLiteral::init(doris::TExprNode const&)
 6# doris::vectorized::VLiteral::VLiteral(doris::TExprNode const&, bool)
 7# std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<doris::vectorized::VLiteral, std::allocator<void>, doris::TExprNode const&>(doris::vectorized::VLiteral*&, std::_Sp_alloc_shared_tag<std::allocator<void> >, doris::TExprNode const&)
 8# doris::vectorized::VExpr::create_expr(doris::TExprNode const&, std::shared_ptr<doris::vectorized::VExpr>&)
 9# doris::vectorized::VExpr::create_tree_from_thrift(std::vector<doris::TExprNode, std::allocator<doris::TExprNode> > const&, int*, std::shared_ptr<doris::vectorized::VExpr>&, std::shared_ptr<doris::vectorized::VExprContext>&)
10# doris::vectorized::VExpr::create_expr_tree(doris::TExpr const&, std::shared_ptr<doris::vectorized::VExprContext>&)
11# doris::pipeline::OperatorXBase::init(doris::TPlanNode const&, doris::RuntimeState*)
12# doris::pipeline::ScanOperatorX<doris::pipeline::OlapScanLocalState>::init(doris::TPlanNode const&, doris::RuntimeState*)
13# doris::pipeline::PipelineFragmentContext::_create_tree_helper(doris::ObjectPool*, std::vector<doris::TPlanNode, std::allocator<doris::TPlanNode> > const&, doris::TPipelineFragmentParams const&, doris::DescriptorTbl const&, std::shared_ptr<doris::pipeline::OperatorXBase>, int*, std::shared_ptr<doris::pipeline::OperatorXBase>*, std::shared_ptr<doris::pipeline::Pipeline>&, int, bool)
14# doris::pipeline::PipelineFragmentContext::_build_pipelines(doris::ObjectPool*, doris::TPipelineFragmentParams const&, doris::DescriptorTbl const&, std::shared_ptr<doris::pipeline::OperatorXBase>*, std::shared_ptr<doris::pipeline::Pipeline>)
15# doris::pipeline::PipelineFragmentContext::prepare(doris::TPipelineFragmentParams const&, doris::ThreadPool*)
16# doris::FragmentMgr::exec_plan_fragment(doris::TPipelineFragmentParams const&, doris::QuerySource, std::function<void (doris::RuntimeState*, doris::Status*)> const&, doris::TPipelineFragmentParamsList const&)
17# doris::FragmentMgr::exec_plan_fragment(doris::TPipelineFragmentParams const&, doris::QuerySource, doris::TPipelineFragmentParamsList const&)
18# doris::PInternalService::_exec_plan_fragment_impl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, doris::PFragmentRequestVersion, bool, std::function<void (doris::RuntimeState*, doris::Status*)> const&)
19# doris::PInternalService::_exec_plan_fragment_in_pthread(google::protobuf::RpcController*, doris::PExecPlanFragmentRequest const*, doris::PExecPlanFragmentResult*, google::protobuf::Closure*)
20# doris::WorkThreadPool<false>::work_thread(int)
21# execute_native_thread_routine
22# asan_thread_start(void*)
23# ?
What problem does this PR solve?
Thread 1 (Thread 0x7f2a03130040 (LWP 3064597) "doris_be"):
#0 futex_wait_cancelable (private=, expected=0, futex_word=0x612000433780) at ../sysdeps/nptl/futex-internal.h:183
#1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x612000433730, cond=0x612000433758) at pthread_cond_wait.c:508
#2 __pthread_cond_wait (cond=0x612000433758, mutex=0x612000433730) at pthread_cond_wait.c:647
#3 0x000055e4848e6a68 in brpc::Acceptor::Join() ()
#4 0x000055e4848d2cdd in brpc::Server::Join() ()
#5 0x000055e44aeec3d8 in doris::BRpcService::join (this=) at /root/doris/be/src/service/brpc_service.cpp:107
#6 0x000055e44aeec155 in doris::BRpcService::~BRpcService (this=0x612000433780) at /root/doris/be/src/service/brpc_service.cpp:59
#7 0x000055e446772f04 in std::default_delete<doris::BRpcService>::operator() (this=, __ptr=0x6020005e64d0) at /var/local/ldb-toolchain-018/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:85
#8 std::__uniq_ptr_impl<doris::BRpcService, std::default_delete<doris::BRpcService> >::reset (this=this@entry=0x7f2a0120f0c0, __p=0x80, __p@entry=0x0) at /var/local/ldb-toolchain-018/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:182
#9 0x000055e446747225 in std::unique_ptr<doris::BRpcService, std::default_delete<doris::BRpcService> >::reset (__p=0x0, this=) at /var/local/ldb-toolchain-018/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:456
#10 main (argc=, argv=) at /root/doris/be/src/service/doris_main.cpp:631
Detaching from program: /mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/P0/Cluster0/be/lib/doris_be, process 3064597
[Inferior 1 (process 3064597) detached]
When doris be is asked to exit gracefully, the doris main function blocks in the brpc server's join method, which waits for all brpc closures to finish. But before that wait we did not stop the task schedulers and fragments, so some queries keep running and join never returns.
In this PR, I stop the 3 core thread pools first, which stops all queries.
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)