Skip to content

Automated fix for refs/heads/revert-31948-revert-31819-tasks/xds-override-host-cookie #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: revert-31948-revert-31819-tasks/xds-override-host-cookie
Choose a base branch
from

Conversation

github-actions[bot]
Copy link

@github-actions github-actions bot commented Jan 9, 2023

PanCakes to the rescue!

We noticed that our 'sanity' test was going to fail, but we think we can fix that automatically, so we put together this PR to do just that!

If you'd like to opt-out of these PR's, add yourself to NO_AUTOFIX_USERS in .github/workflows/pr-auto-fix.yaml

cb-robot pushed a commit that referenced this pull request Sep 26, 2023
Found memory access error in frame_fuzzer_test. Located the root cause
in ExecCtx::Get(), where ExecCtx needs to be initialized before using
HPackParser:ParseInput().


Error logs:
MemorySanitizer:DEADLYSIGNAL
==2812845==ERROR: MemorySanitizer: SEGV on unknown address
0x000000000030 (pc 0x55869275574e bp 0x7fffd7d9fb50 sp 0x7fffd7d9fb20
T2812845)
==2812845==The signal is caused by a READ memory access.
==2812845==Hint: address points to the zero page.
#0 0x55869275574e in starting_cpu
[third_party/grpc/src/core/lib/iomgr/exec_ctx.h:129](https://cs.corp.google.com/piper///depot/google3/third_party/grpc/src/core/lib/iomgr/exec_ctx.h?l=129&ws=ladynana/2900&snapshot=42):9
#1 0x55869275574e in
grpc_core::PerCpu<grpc_core::GlobalStatsCollector::Data>::this_cpu()
[third_party/grpc/src/core/lib/gprpp/per_cpu.h:38](https://cs.corp.google.com/piper///depot/google3/third_party/grpc/src/core/lib/gprpp/per_cpu.h?l=38&ws=ladynana/2900&snapshot=42):48
#2 0x558692753cda in IncrementHttp2MetadataSize
[third_party/grpc/src/core/lib/debug/stats_data.h:265](https://cs.corp.google.com/piper///depot/google3/third_party/grpc/src/core/lib/debug/stats_data.h?l=265&ws=ladynana/2900&snapshot=42):11
#3 0x558692753cda in
grpc_core::HPackParser::ParseInput(grpc_core::HPackParser::Input, bool)
[third_party/grpc/src/core/ext/transport/chttp2/transport/hpack_parser.cc:933](https://cs.corp.google.com/piper///depot/google3/third_party/grpc/src/core/ext/transport/chttp2/transport/hpack_parser.cc?l=933&ws=ladynana/2900&snapshot=42):20


<!--

If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.

If your pull request is for a specific language, please add the
appropriate
lang label.

-->
cb-robot pushed a commit that referenced this pull request Sep 26, 2023
This adds pre-built library for aarch64 linux, will help improve the
install speed and avoid building environment issues at customer side.

@apolcyn @jtattermusch Can you help build and push the new rake compiler
image?
Will update the tag and hash after the image is available

Manually tested locally:
```
uname -a
Linux u20 5.15.49-linuxkit #1 SMP PREEMPT Tue Sep 13 07:51:32 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux
```
```
time gem install /work/ruby/grpc/pkg/grpc-1.56.0.dev-aarch64-linux.gem
Successfully installed grpc-1.56.0.dev-aarch64-linux
Parsing documentation for grpc-1.56.0.dev-aarch64-linux
Installing ri documentation for grpc-1.56.0.dev-aarch64-linux
Done installing documentation for grpc after 0 seconds
1 gem installed

real	0m22.794s
user	0m17.268s
sys	0m5.156s
```
```
ruby greeter_server.rb &
[1] 319
ruby greeter_client.rb
"Greeting: Hello world"
```

Fixes:
grpc#31855
grpc#29489
cb-robot pushed a commit that referenced this pull request Sep 26, 2023
grpc 1.57.0 crashes win ruby and alpine due to no `strdup` in musl libc.
This diff replace `strdup` with `grp_strdup`

```
Thread 1 "ruby" received signal SIGSEGV, Segmentation fault.
0x00000000000a4596 in ?? ()
(gdb) bt
#0  0x00000000000a4596 in ?? ()
#1  0x00007ffff14e298c in grpc_rb_channel_create_in_process_add_args_hash_cb (key=<optimized out>, val=<optimized out>, args_obj=<optimized out>) at rb_channel_args.c:84
#2  0x00007ffff7c2b9ea in hash_ar_foreach_iter (error=0, argp=140737488344784, value=<optimized out>, key=<optimized out>) at hash.c:1341
```

fixes grpc#34044
closes grpc#27995
cb-robot pushed a commit that referenced this pull request Sep 26, 2023
…2491)

This fixes a very strange TSAN flake seen in an internal test
(b/268292646), which seems to have been introduced in
grpc#32326.

I've included an example of the TSAN failure below, for future
reference. I don't fully understand what is causing this failure. It
looks like the async callback from a previous picker update caused the
pick to be re-queued but then got stuck draining queued callbacks in its
`ExecCtx` instance, while the async callback from the next picker update
had already failed the call. But the part I don't understand is why the
call combiner cancellation callback wound up scheduled on the `ExecCtx`
instance from the first thread -- that seems likely to be caused by
either the `WorkSerializer` or maybe `ExecCtx` work-stealing, but the
exact details are likely to be hard to nail down.

Switching from `EventEngine::Run()` back to `ExecCtx::Run()` seems to
fix the flake, so that's what I'm doing in this PR. I find this
work-around deeply dissatisfying, especially since I do not fully
understand the root cause of the problem. However, given that we are
working toward eliminating the `FilterBasedLoadBalancedCall` code
entirely as part of the promise conversion, this problem is probably not
worth further investigation.

```
WARNING: ThreadSanitizer: data race (pid=7406)
  Read of size 8 at 0x7b8800037558 by thread T47:
    #0 grpc_core::ClientChannel::FilterBasedLoadBalancedCall::~FilterBasedLoadBalancedCall() third_party/grpc/src/core/ext/filters/client_channel/client_channel.cc:2780:5 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Uclient_Uchannel.so+0x530ed) (BuildId: a8369742d3c435f7f1a244a535f4907c)
    #1 Delete third_party/grpc/src/core/lib/gprpp/ref_counted.h:248:31 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Uclient_Uchannel.so+0x64a73) (BuildId: a8369742d3c435f7f1a244a535f4907c)
    #2 Unref third_party/grpc/src/core/lib/gprpp/orphanable.h:102:7 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Uclient_Uchannel.so+0x64a73)
    #3 ~RefCountedPtr third_party/grpc/src/core/lib/gprpp/ref_counted_ptr.h:103:36 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Uclient_Uchannel.so+0x64a73)
    #4 ~LbQueuedCallCanceller third_party/grpc/src/core/ext/filters/client_channel/client_channel.cc:3132:51 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Uclient_Uchannel.so+0x64a73)
    #5 grpc_core::ClientChannel::FilterBasedLoadBalancedCall::LbQueuedCallCanceller::CancelLocked(void*, absl::Status) third_party/grpc/src/core/ext/filters/client_channel/client_channel.cc:3168:5 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Uclient_Uchannel.so+0x64a73)
    #6 exec_ctx_run third_party/grpc/src/core/lib/iomgr/exec_ctx.cc:45:3 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibexec_Uctx.so+0x4199) (BuildId: cafb035b95f6c781e93706a71dce95f3)
    #7 grpc_core::ExecCtx::Flush() third_party/grpc/src/core/lib/iomgr/exec_ctx.cc:72:9 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibexec_Uctx.so+0x4199)
    #8 ~ExecCtx third_party/grpc/src/core/lib/iomgr/exec_ctx.h:117:5 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Uclient_Uchannel.so+0x6c095) (BuildId: a8369742d3c435f7f1a244a535f4907c)
    #9 operator() third_party/grpc/src/core/ext/filters/client_channel/client_channel.cc:3201:3 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Uclient_Uchannel.so+0x6c095)
    #10 __invoke<(lambda at third_party/grpc/src/core/ext/filters/client_channel/client_channel.cc:3196:46) &> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/invoke.h:394:23 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Uclient_Uchannel.so+0x6c095)
    #11 invoke<(lambda at third_party/grpc/src/core/ext/filters/client_channel/client_channel.cc:3196:46) &> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/invoke.h:539:12 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Uclient_Uchannel.so+0x6c095)
    #12 InvokeR<void, (lambda at third_party/grpc/src/core/ext/filters/client_channel/client_channel.cc:3196:46) &, void> third_party/absl/functional/internal/any_invocable.h:131:3 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Uclient_Uchannel.so+0x6c095)
    #13 void absl::internal_any_invocable::LocalInvoker<false, void, grpc_core::ClientChannel::FilterBasedLoadBalancedCall::RetryPickLocked()::$_0&>(absl::internal_any_invocable::TypeErasedState*) third_party/absl/functional/internal/any_invocable.h:301:10 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Uclient_Uchannel.so+0x6c095)
    #14 operator() third_party/absl/functional/internal/any_invocable.h:855:1 (libST-a5842b3303c3_net_Sgrpc_Sinternal_Ssrc_Score_Sext_Sevent_Uengine_Slibevent_Uengine_Uem_Uimpl.so+0x16a74) (BuildId: 7802a789e61dddac5a33230a2f294046)
    #15 util::functional::internal::FunctorCallback<Closure, false, absl::AnyInvocable<void ()>, void ()>::Run() util/functional/to_callback_internal.h:87:27 (libST-a5842b3303c3_net_Sgrpc_Sinternal_Ssrc_Score_Sext_Sevent_Uengine_Slibevent_Uengine_Uem_Uimpl.so+0x16a74)
    #16 void eventmanager::EventManager::RunTask<false>(eventmanager::EventManager::WorkerThread*, eventmanager::TaskInfo*) net/eventmanager/em1/eventmanager.cc:2464:20 (libST-a5842b3303c3_net_Seventmanager_Sem1_Slibem1.so+0x2ae1b) (BuildId: b3d97518d96e160722658c4aa0795013)
    #17 void eventmanager::EventManager::RunWorkerLoop<false, false>(eventmanager::EventManager::WorkerThread*) net/eventmanager/em1/eventmanager.cc (libST-a5842b3303c3_net_Seventmanager_Sem1_Slibem1.so+0x2a45d) (BuildId: b3d97518d96e160722658c4aa0795013)
    #18 eventmanager::EventManager::WorkerThread::WorkerMain() net/eventmanager/em1/eventmanager.cc:430:19 (libST-a5842b3303c3_net_Seventmanager_Sem1_Slibem1.so+0x1d841) (BuildId: b3d97518d96e160722658c4aa0795013)
    #19 operator() net/eventmanager/em1/eventmanager.cc:396:64 (libST-a5842b3303c3_net_Seventmanager_Sem1_Slibem1.so+0x28351) (BuildId: b3d97518d96e160722658c4aa0795013)
    #20 __invoke<(lambda at net/eventmanager/em1/eventmanager.cc:396:55)> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/invoke.h:394:23 (libST-a5842b3303c3_net_Seventmanager_Sem1_Slibem1.so+0x28351)
    #21 invoke<(lambda at net/eventmanager/em1/eventmanager.cc:396:55)> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/invoke.h:539:12 (libST-a5842b3303c3_net_Seventmanager_Sem1_Slibem1.so+0x28351)
    #22 InvokeR<void, (lambda at net/eventmanager/em1/eventmanager.cc:396:55), void> third_party/absl/functional/internal/any_invocable.h:131:3 (libST-a5842b3303c3_net_Seventmanager_Sem1_Slibem1.so+0x28351)
    #23 void absl::internal_any_invocable::LocalInvoker<false, void, eventmanager::EventManager::WorkerThread::StartThread(int, int, thread::SchedPolicy, std::__tsan::basic_string_view<char, std::__tsan::char_traits<char>>)::'lambda'()&&>(absl::internal_any_invocable::TypeErasedState*) third_party/absl/functional/internal/any_invocable.h:301:10 (libST-a5842b3303c3_net_Seventmanager_Sem1_Slibem1.so+0x28351)
    #24 operator() third_party/absl/functional/internal/any_invocable.h:863:1 (libST-a5842b3303c3_security_Sganpati_Ssharded_Uclient_Slibacl_Ucache.so+0x24c1e) (BuildId: a86eb59aee0be7b5882a1b532922e3ca)
    #25 ClosureThread::Run() thread/thread.h:459:25 (libST-a5842b3303c3_security_Sganpati_Ssharded_Uclient_Slibacl_Ucache.so+0x24c1e)
    #26 Thread::ThreadBody(void*) thread/thread.cc:1284:16 (libST-a5842b3303c3_thread_Slibthread.so+0x251e9) (BuildId: a1762ef5c5732b3ad8a3b3d798292e27)

  Previous write of size 8 at 0x7b8800037558 by thread T40:
    #0 grpc_core::ClientChannel::FilterBasedLoadBalancedCall::PendingBatchesFail(absl::Status, bool (*)(grpc_core::CallCombinerClosureList const&)) third_party/grpc/src/core/ext/filters/client_channel/client_channel.cc:2863:13 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Uclient_Uchannel.so+0x537c7) (BuildId: a8369742d3c435f7f1a244a535f4907c)
    #1 grpc_core::ClientChannel::FilterBasedLoadBalancedCall::TryPick(bool) third_party/grpc/src/core/ext/filters/client_channel/client_channel.cc:3179:7 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Uclient_Uchannel.so+0x5537f) (BuildId: a8369742d3c435f7f1a244a535f4907c)
    #2 operator() third_party/grpc/src/core/ext/filters/client_channel/client_channel.cc:3200:5 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Uclient_Uchannel.so+0x6c077) (BuildId: a8369742d3c435f7f1a244a535f4907c)
    #3 __invoke<(lambda at third_party/grpc/src/core/ext/filters/client_channel/client_channel.cc:3196:46) &> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/invoke.h:394:23 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Uclient_Uchannel.so+0x6c077)
    #4 invoke<(lambda at third_party/grpc/src/core/ext/filters/client_channel/client_channel.cc:3196:46) &> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/invoke.h:539:12 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Uclient_Uchannel.so+0x6c077)
    #5 InvokeR<void, (lambda at third_party/grpc/src/core/ext/filters/client_channel/client_channel.cc:3196:46) &, void> third_party/absl/functional/internal/any_invocable.h:131:3 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Uclient_Uchannel.so+0x6c077)
    #6 void absl::internal_any_invocable::LocalInvoker<false, void, grpc_core::ClientChannel::FilterBasedLoadBalancedCall::RetryPickLocked()::$_0&>(absl::internal_any_invocable::TypeErasedState*) third_party/absl/functional/internal/any_invocable.h:301:10 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Uclient_Uchannel.so+0x6c077)
    #7 operator() third_party/absl/functional/internal/any_invocable.h:855:1 (libST-a5842b3303c3_net_Sgrpc_Sinternal_Ssrc_Score_Sext_Sevent_Uengine_Slibevent_Uengine_Uem_Uimpl.so+0x16a74) (BuildId: 7802a789e61dddac5a33230a2f294046)
    #8 util::functional::internal::FunctorCallback<Closure, false, absl::AnyInvocable<void ()>, void ()>::Run() util/functional/to_callback_internal.h:87:27 (libST-a5842b3303c3_net_Sgrpc_Sinternal_Ssrc_Score_Sext_Sevent_Uengine_Slibevent_Uengine_Uem_Uimpl.so+0x16a74)
    #9 void eventmanager::EventManager::RunTask<false>(eventmanager::EventManager::WorkerThread*, eventmanager::TaskInfo*) net/eventmanager/em1/eventmanager.cc:2464:20 (libST-a5842b3303c3_net_Seventmanager_Sem1_Slibem1.so+0x2ae1b) (BuildId: b3d97518d96e160722658c4aa0795013)
    #10 void eventmanager::EventManager::RunWorkerLoop<false, false>(eventmanager::EventManager::WorkerThread*) net/eventmanager/em1/eventmanager.cc (libST-a5842b3303c3_net_Seventmanager_Sem1_Slibem1.so+0x2a45d) (BuildId: b3d97518d96e160722658c4aa0795013)
    #11 eventmanager::EventManager::WorkerThread::WorkerMain() net/eventmanager/em1/eventmanager.cc:430:19 (libST-a5842b3303c3_net_Seventmanager_Sem1_Slibem1.so+0x1d841) (BuildId: b3d97518d96e160722658c4aa0795013)
    #12 operator() net/eventmanager/em1/eventmanager.cc:396:64 (libST-a5842b3303c3_net_Seventmanager_Sem1_Slibem1.so+0x28351) (BuildId: b3d97518d96e160722658c4aa0795013)
    #13 __invoke<(lambda at net/eventmanager/em1/eventmanager.cc:396:55)> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/invoke.h:394:23 (libST-a5842b3303c3_net_Seventmanager_Sem1_Slibem1.so+0x28351)
    #14 invoke<(lambda at net/eventmanager/em1/eventmanager.cc:396:55)> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/invoke.h:539:12 (libST-a5842b3303c3_net_Seventmanager_Sem1_Slibem1.so+0x28351)
    #15 InvokeR<void, (lambda at net/eventmanager/em1/eventmanager.cc:396:55), void> third_party/absl/functional/internal/any_invocable.h:131:3 (libST-a5842b3303c3_net_Seventmanager_Sem1_Slibem1.so+0x28351)
    #16 void absl::internal_any_invocable::LocalInvoker<false, void, eventmanager::EventManager::WorkerThread::StartThread(int, int, thread::SchedPolicy, std::__tsan::basic_string_view<char, std::__tsan::char_traits<char>>)::'lambda'()&&>(absl::internal_any_invocable::TypeErasedState*) third_party/absl/functional/internal/any_invocable.h:301:10 (libST-a5842b3303c3_net_Seventmanager_Sem1_Slibem1.so+0x28351)
    #17 operator() third_party/absl/functional/internal/any_invocable.h:863:1 (libST-a5842b3303c3_security_Sganpati_Ssharded_Uclient_Slibacl_Ucache.so+0x24c1e) (BuildId: a86eb59aee0be7b5882a1b532922e3ca)
    #18 ClosureThread::Run() thread/thread.h:459:25 (libST-a5842b3303c3_security_Sganpati_Ssharded_Uclient_Slibacl_Ucache.so+0x24c1e)
    #19 Thread::ThreadBody(void*) thread/thread.cc:1284:16 (libST-a5842b3303c3_thread_Slibthread.so+0x251e9) (BuildId: a1762ef5c5732b3ad8a3b3d798292e27)

  Thread T47 'EventManager_Default' (tid=7454, running) created by main thread at:
    #0 pthread_create third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1048:3 (c0da98875dfb74bd2cfcbd76306f382f2d05e1326f778b3a4cefdef1a42a75e7_020000b94930+0x1887b9) (BuildId: e468d94e410a7e13e19b26c4f7277674)
    #1 Thread::CreatePthread(pthread_attr_t&) thread/thread.cc:485:13 (libST-a5842b3303c3_thread_Slibthread.so+0x24b1d) (BuildId: a1762ef5c5732b3ad8a3b3d798292e27)
    #2 Thread::Start() thread/thread.cc:667:3 (libST-a5842b3303c3_thread_Slibthread.so+0x2574b) (BuildId: a1762ef5c5732b3ad8a3b3d798292e27)
    #3 StartThread net/eventmanager/em1/eventmanager.cc:397:14 (libST-a5842b3303c3_net_Seventmanager_Sem1_Slibem1.so+0x1bbed) (BuildId: b3d97518d96e160722658c4aa0795013)
    #4 eventmanager::EventManager::EventManager(eventmanager::EventManager::Options const&) net/eventmanager/em1/eventmanager.cc:1274:23 (libST-a5842b3303c3_net_Seventmanager_Sem1_Slibem1.so+0x1bbed)
    #5 InitializeDefaultEventManager() net/eventmanager/em1/eventmanager.cc:3042:39 (libST-a5842b3303c3_net_Seventmanager_Sem1_Slibem1.so+0x25c06) (BuildId: b3d97518d96e160722658c4aa0795013)
    #6 __invoke<void (*&)()> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/invoke.h:394:23 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgpr.so+0xaab9) (BuildId: fe43c5a995ecc22efa243d05ef607fb9)
    #7 invoke<void (*&)()> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/invoke.h:539:12 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgpr.so+0xaab9)
    #8 void absl::base_internal::CallOnceImpl<void (*&)()>(std::__tsan::atomic<unsigned int>*, absl::base_internal::SchedulingMode, void (*&)()) third_party/absl/base/call_once.h:184:5 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgpr.so+0xaab9)
    #9 GoogleOnceInternalInit base/once.cc:23:5 (libST-a5842b3303c3_base_Slibonce.so+0xeee) (BuildId: 6c436371e89d7962c2b544d27465d6a0)
    #10 GoogleOnceInternalInitSchedCoopAndKernel(std::__tsan::atomic<unsigned int>*, void (*)()) base/once.cc:15:3 (libST-a5842b3303c3_base_Slibonce.so+0xeee)
    #11 GoogleOnceInit base/once.h:72:5 (libST-a5842b3303c3_net_Seventmanager_Sem1_Slibem1.so+0x25ad9) (BuildId: b3d97518d96e160722658c4aa0795013)
    #12 eventmanager::EventManager::DefaultEventManager() net/eventmanager/em1/eventmanager.cc:3059:3 (libST-a5842b3303c3_net_Seventmanager_Sem1_Slibem1.so+0x25ad9)
    #13 Default net/eventmanager/em1/eventmanager.h:780:43 (libST-a5842b3303c3_net_Seventmanager_Slibeventmanager_Udefault.so+0x14c5) (BuildId: 2e1dde05e8bce5319856279d896c4405)
    #14 eventmanager::Default() net/eventmanager/eventmanager_default.cc:22:10 (libST-a5842b3303c3_net_Seventmanager_Slibeventmanager_Udefault.so+0x14c5)
    #15 grpc::(anonymous namespace)::InitGlobalEventManager() net/grpc/internal/src/core/ext/event_engine/grpc_event_manager.cc:38:23 (libST-a5842b3303c3_net_Sgrpc_Sinternal_Ssrc_Score_Sext_Sevent_Uengine_Slibgrpc_Uevent_Umanager.so+0x1c28) (BuildId: 8bd3c1826fd11ffa128e7839a85c05a0)
    #16 __invoke<void (*&)()> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/invoke.h:394:23 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgpr.so+0xaab9) (BuildId: fe43c5a995ecc22efa243d05ef607fb9)
    #17 invoke<void (*&)()> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/invoke.h:539:12 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgpr.so+0xaab9)
    #18 void absl::base_internal::CallOnceImpl<void (*&)()>(std::__tsan::atomic<unsigned int>*, absl::base_internal::SchedulingMode, void (*&)()) third_party/absl/base/call_once.h:184:5 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgpr.so+0xaab9)
    #19 call_once<void (*&)()> third_party/absl/base/call_once.h:216:5 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgpr.so+0xaa0b) (BuildId: fe43c5a995ecc22efa243d05ef607fb9)
    #20 gpr_once_init third_party/grpc/src/core/lib/gpr/sync_abseil.cc:107:3 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgpr.so+0xaa0b)
    #21 grpc::GetGrpcEventManager() net/grpc/internal/src/core/ext/event_engine/grpc_event_manager.cc:51:3 (libST-a5842b3303c3_net_Sgrpc_Sinternal_Ssrc_Score_Sext_Sevent_Uengine_Slibgrpc_Uevent_Umanager.so+0x18c1) (BuildId: 8bd3c1826fd11ffa128e7839a85c05a0)
    #22 check_engine_available(bool) net/grpc/internal/src/core/ext/poller/event_manager_poller/ev_event_manager_linux.cc:1271:21 (libST-a5842b3303c3_net_Sgrpc_Sinternal_Ssrc_Score_Sext_Spoller_Sevent_Umanager_Upoller_Slibev_Uevent_Umanager_Ulinux.so+0x4fcb) (BuildId: 620adb83259aed32440614774678b3a1)
    #23 try_engine third_party/grpc/src/core/lib/iomgr/ev_posix.cc:141:9 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Ubase.so+0xad888) (BuildId: 3cb6b043bd10222927301e33a3760646)
    #24 operator() third_party/grpc/src/core/lib/iomgr/ev_posix.cc:184:7 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Ubase.so+0xad888)
    #25 grpc_event_engine_init()::$_0::__invoke() third_party/grpc/src/core/lib/iomgr/ev_posix.cc:175:35 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Ubase.so+0xad888)
    #26 __invoke<void (*&)()> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/invoke.h:394:23 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgpr.so+0xaab9) (BuildId: fe43c5a995ecc22efa243d05ef607fb9)
    #27 invoke<void (*&)()> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/invoke.h:539:12 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgpr.so+0xaab9)
    #28 void absl::base_internal::CallOnceImpl<void (*&)()>(std::__tsan::atomic<unsigned int>*, absl::base_internal::SchedulingMode, void (*&)()) third_party/absl/base/call_once.h:184:5 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgpr.so+0xaab9)
    #29 call_once<void (*&)()> third_party/absl/base/call_once.h:216:5 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgpr.so+0xaa0b) (BuildId: fe43c5a995ecc22efa243d05ef607fb9)
    #30 gpr_once_init third_party/grpc/src/core/lib/gpr/sync_abseil.cc:107:3 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgpr.so+0xaa0b)
    #31 grpc_event_engine_init() third_party/grpc/src/core/lib/iomgr/ev_posix.cc:175:3 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Ubase.so+0xacab1) (BuildId: 3cb6b043bd10222927301e33a3760646)
    #32 iomgr_platform_init() third_party/grpc/src/core/lib/iomgr/iomgr_posix.cc:43:3 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Ubase.so+0xaf1d0) (BuildId: 3cb6b043bd10222927301e33a3760646)
    #33 grpc_iomgr_platform_init() third_party/grpc/src/core/lib/iomgr/iomgr_internal.cc:35:35 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibexec_Uctx.so+0x688b) (BuildId: cafb035b95f6c781e93706a71dce95f3)
    #34 grpc_iomgr_init() third_party/grpc/src/core/lib/iomgr/iomgr.cc:66:3 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Ubase.so+0xae777) (BuildId: 3cb6b043bd10222927301e33a3760646)
    #35 grpc_init third_party/grpc/src/core/lib/surface/init.cc:144:5 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc.so+0x4518) (BuildId: 778b43e2e8f8c132a7acf9b3da117fff)
    #36 k3::(anonymous namespace)::GrpcInitHackEnvironment::SetUp() storage/k3/grpc/grpc_server_test.cc:144:27 (c0da98875dfb74bd2cfcbd76306f382f2d05e1326f778b3a4cefdef1a42a75e7_020000b94930+0x212d21) (BuildId: e468d94e410a7e13e19b26c4f7277674)
    #37 SetUpEnvironment third_party/googletest/googletest/src/gtest.cc:5763:55 (libST-a5842b3303c3_third_Uparty_Sgoogletest_Slibgtest.so+0x6f58b) (BuildId: ec51e8d56b1390e9939f663d4700c7da)
    #38 for_each<std::__tsan::__wrap_iter<testing::Environment *const *>, void (*)(testing::Environment *)> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__algorithm/for_each.h:26:5 (libST-a5842b3303c3_third_Uparty_Sgoogletest_Slibgtest.so+0x6f58b)
    #39 ForEach<std::__tsan::vector<testing::Environment *, std::__tsan::allocator<testing::Environment *> >, void (*)(testing::Environment *)> third_party/googletest/googletest/src/gtest-internal-inl.h:288:3 (libST-a5842b3303c3_third_Uparty_Sgoogletest_Slibgtest.so+0x6f58b)
    grpc#40 testing::internal::UnitTestImpl::RunAllTests() third_party/googletest/googletest/src/gtest.cc:5876:9 (libST-a5842b3303c3_third_Uparty_Sgoogletest_Slibgtest.so+0x6f58b)
    grpc#41 HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> third_party/googletest/googletest/src/gtest.cc (libST-a5842b3303c3_third_Uparty_Sgoogletest_Slibgtest.so+0x6ee41) (BuildId: ec51e8d56b1390e9939f663d4700c7da)
    grpc#42 testing::UnitTest::Run() third_party/googletest/googletest/src/gtest.cc:5464:10 (libST-a5842b3303c3_third_Uparty_Sgoogletest_Slibgtest.so+0x6ee41)
    grpc#43 RUN_ALL_TESTS third_party/googletest/googletest/include/gtest/gtest.h:2329:73 (c0da98875dfb74bd2cfcbd76306f382f2d05e1326f778b3a4cefdef1a42a75e7_020000b94930+0x2aa83a) (BuildId: e468d94e410a7e13e19b26c4f7277674)
    grpc#44 main testing/base/internal/gunit_main.cc:86:10 (c0da98875dfb74bd2cfcbd76306f382f2d05e1326f778b3a4cefdef1a42a75e7_020000b94930+0x2aa83a)

  Thread T40 'EventManager_Default' (tid=7447, running) created by main thread at:
    #0 pthread_create third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1048:3 (c0da98875dfb74bd2cfcbd76306f382f2d05e1326f778b3a4cefdef1a42a75e7_020000b94930+0x1887b9) (BuildId: e468d94e410a7e13e19b26c4f7277674)
    #1 Thread::CreatePthread(pthread_attr_t&) thread/thread.cc:485:13 (libST-a5842b3303c3_thread_Slibthread.so+0x24b1d) (BuildId: a1762ef5c5732b3ad8a3b3d798292e27)
    #2 Thread::Start() thread/thread.cc:667:3 (libST-a5842b3303c3_thread_Slibthread.so+0x2574b) (BuildId: a1762ef5c5732b3ad8a3b3d798292e27)
    #3 StartThread net/eventmanager/em1/eventmanager.cc:397:14 (libST-a5842b3303c3_net_Seventmanager_Sem1_Slibem1.so+0x1bbed) (BuildId: b3d97518d96e160722658c4aa0795013)
    #4 eventmanager::EventManager::EventManager(eventmanager::EventManager::Options const&) net/eventmanager/em1/eventmanager.cc:1274:23 (libST-a5842b3303c3_net_Seventmanager_Sem1_Slibem1.so+0x1bbed)
    #5 InitializeDefaultEventManager() net/eventmanager/em1/eventmanager.cc:3042:39 (libST-a5842b3303c3_net_Seventmanager_Sem1_Slibem1.so+0x25c06) (BuildId: b3d97518d96e160722658c4aa0795013)
    #6 __invoke<void (*&)()> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/invoke.h:394:23 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgpr.so+0xaab9) (BuildId: fe43c5a995ecc22efa243d05ef607fb9)
    #7 invoke<void (*&)()> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/invoke.h:539:12 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgpr.so+0xaab9)
    #8 void absl::base_internal::CallOnceImpl<void (*&)()>(std::__tsan::atomic<unsigned int>*, absl::base_internal::SchedulingMode, void (*&)()) third_party/absl/base/call_once.h:184:5 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgpr.so+0xaab9)
    #9 GoogleOnceInternalInit base/once.cc:23:5 (libST-a5842b3303c3_base_Slibonce.so+0xeee) (BuildId: 6c436371e89d7962c2b544d27465d6a0)
    #10 GoogleOnceInternalInitSchedCoopAndKernel(std::__tsan::atomic<unsigned int>*, void (*)()) base/once.cc:15:3 (libST-a5842b3303c3_base_Slibonce.so+0xeee)
    #11 GoogleOnceInit base/once.h:72:5 (libST-a5842b3303c3_net_Seventmanager_Sem1_Slibem1.so+0x25ad9) (BuildId: b3d97518d96e160722658c4aa0795013)
    #12 eventmanager::EventManager::DefaultEventManager() net/eventmanager/em1/eventmanager.cc:3059:3 (libST-a5842b3303c3_net_Seventmanager_Sem1_Slibem1.so+0x25ad9)
    #13 Default net/eventmanager/em1/eventmanager.h:780:43 (libST-a5842b3303c3_net_Seventmanager_Slibeventmanager_Udefault.so+0x14c5) (BuildId: 2e1dde05e8bce5319856279d896c4405)
    #14 eventmanager::Default() net/eventmanager/eventmanager_default.cc:22:10 (libST-a5842b3303c3_net_Seventmanager_Slibeventmanager_Udefault.so+0x14c5)
    #15 grpc::(anonymous namespace)::InitGlobalEventManager() net/grpc/internal/src/core/ext/event_engine/grpc_event_manager.cc:38:23 (libST-a5842b3303c3_net_Sgrpc_Sinternal_Ssrc_Score_Sext_Sevent_Uengine_Slibgrpc_Uevent_Umanager.so+0x1c28) (BuildId: 8bd3c1826fd11ffa128e7839a85c05a0)
    #16 __invoke<void (*&)()> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/invoke.h:394:23 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgpr.so+0xaab9) (BuildId: fe43c5a995ecc22efa243d05ef607fb9)
    #17 invoke<void (*&)()> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/invoke.h:539:12 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgpr.so+0xaab9)
    #18 void absl::base_internal::CallOnceImpl<void (*&)()>(std::__tsan::atomic<unsigned int>*, absl::base_internal::SchedulingMode, void (*&)()) third_party/absl/base/call_once.h:184:5 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgpr.so+0xaab9)
    #19 call_once<void (*&)()> third_party/absl/base/call_once.h:216:5 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgpr.so+0xaa0b) (BuildId: fe43c5a995ecc22efa243d05ef607fb9)
    #20 gpr_once_init third_party/grpc/src/core/lib/gpr/sync_abseil.cc:107:3 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgpr.so+0xaa0b)
    #21 grpc::GetGrpcEventManager() net/grpc/internal/src/core/ext/event_engine/grpc_event_manager.cc:51:3 (libST-a5842b3303c3_net_Sgrpc_Sinternal_Ssrc_Score_Sext_Sevent_Uengine_Slibgrpc_Uevent_Umanager.so+0x18c1) (BuildId: 8bd3c1826fd11ffa128e7839a85c05a0)
    #22 check_engine_available(bool) net/grpc/internal/src/core/ext/poller/event_manager_poller/ev_event_manager_linux.cc:1271:21 (libST-a5842b3303c3_net_Sgrpc_Sinternal_Ssrc_Score_Sext_Spoller_Sevent_Umanager_Upoller_Slibev_Uevent_Umanager_Ulinux.so+0x4fcb) (BuildId: 620adb83259aed32440614774678b3a1)
    #23 try_engine third_party/grpc/src/core/lib/iomgr/ev_posix.cc:141:9 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Ubase.so+0xad888) (BuildId: 3cb6b043bd10222927301e33a3760646)
    #24 operator() third_party/grpc/src/core/lib/iomgr/ev_posix.cc:184:7 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Ubase.so+0xad888)
    #25 grpc_event_engine_init()::$_0::__invoke() third_party/grpc/src/core/lib/iomgr/ev_posix.cc:175:35 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Ubase.so+0xad888)
    #26 __invoke<void (*&)()> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/invoke.h:394:23 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgpr.so+0xaab9) (BuildId: fe43c5a995ecc22efa243d05ef607fb9)
    #27 invoke<void (*&)()> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__functional/invoke.h:539:12 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgpr.so+0xaab9)
    #28 void absl::base_internal::CallOnceImpl<void (*&)()>(std::__tsan::atomic<unsigned int>*, absl::base_internal::SchedulingMode, void (*&)()) third_party/absl/base/call_once.h:184:5 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgpr.so+0xaab9)
    #29 call_once<void (*&)()> third_party/absl/base/call_once.h:216:5 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgpr.so+0xaa0b) (BuildId: fe43c5a995ecc22efa243d05ef607fb9)
    #30 gpr_once_init third_party/grpc/src/core/lib/gpr/sync_abseil.cc:107:3 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgpr.so+0xaa0b)
    #31 grpc_event_engine_init() third_party/grpc/src/core/lib/iomgr/ev_posix.cc:175:3 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Ubase.so+0xacab1) (BuildId: 3cb6b043bd10222927301e33a3760646)
    #32 iomgr_platform_init() third_party/grpc/src/core/lib/iomgr/iomgr_posix.cc:43:3 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Ubase.so+0xaf1d0) (BuildId: 3cb6b043bd10222927301e33a3760646)
    #33 grpc_iomgr_platform_init() third_party/grpc/src/core/lib/iomgr/iomgr_internal.cc:35:35 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibexec_Uctx.so+0x688b) (BuildId: cafb035b95f6c781e93706a71dce95f3)
    #34 grpc_iomgr_init() third_party/grpc/src/core/lib/iomgr/iomgr.cc:66:3 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc_Ubase.so+0xae777) (BuildId: 3cb6b043bd10222927301e33a3760646)
    #35 grpc_init third_party/grpc/src/core/lib/surface/init.cc:144:5 (libST-a5842b3303c3_third_Uparty_Sgrpc_Sgoogle_Slibgrpc.so+0x4518) (BuildId: 778b43e2e8f8c132a7acf9b3da117fff)
    #36 k3::(anonymous namespace)::GrpcInitHackEnvironment::SetUp() storage/k3/grpc/grpc_server_test.cc:144:27 (c0da98875dfb74bd2cfcbd76306f382f2d05e1326f778b3a4cefdef1a42a75e7_020000b94930+0x212d21) (BuildId: e468d94e410a7e13e19b26c4f7277674)
    #37 SetUpEnvironment third_party/googletest/googletest/src/gtest.cc:5763:55 (libST-a5842b3303c3_third_Uparty_Sgoogletest_Slibgtest.so+0x6f58b) (BuildId: ec51e8d56b1390e9939f663d4700c7da)
    #38 for_each<std::__tsan::__wrap_iter<testing::Environment *const *>, void (*)(testing::Environment *)> third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/__algorithm/for_each.h:26:5 (libST-a5842b3303c3_third_Uparty_Sgoogletest_Slibgtest.so+0x6f58b)
    #39 ForEach<std::__tsan::vector<testing::Environment *, std::__tsan::allocator<testing::Environment *> >, void (*)(testing::Environment *)> third_party/googletest/googletest/src/gtest-internal-inl.h:288:3 (libST-a5842b3303c3_third_Uparty_Sgoogletest_Slibgtest.so+0x6f58b)
    grpc#40 testing::internal::UnitTestImpl::RunAllTests() third_party/googletest/googletest/src/gtest.cc:5876:9 (libST-a5842b3303c3_third_Uparty_Sgoogletest_Slibgtest.so+0x6f58b)
    grpc#41 HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> third_party/googletest/googletest/src/gtest.cc (libST-a5842b3303c3_third_Uparty_Sgoogletest_Slibgtest.so+0x6ee41) (BuildId: ec51e8d56b1390e9939f663d4700c7da)
    grpc#42 testing::UnitTest::Run() third_party/googletest/googletest/src/gtest.cc:5464:10 (libST-a5842b3303c3_third_Uparty_Sgoogletest_Slibgtest.so+0x6ee41)
    grpc#43 RUN_ALL_TESTS third_party/googletest/googletest/include/gtest/gtest.h:2329:73 (c0da98875dfb74bd2cfcbd76306f382f2d05e1326f778b3a4cefdef1a42a75e7_020000b94930+0x2aa83a) (BuildId: e468d94e410a7e13e19b26c4f7277674)
    grpc#44 main testing/base/internal/gunit_main.cc:86:10 (c0da98875dfb74bd2cfcbd76306f382f2d05e1326f778b3a4cefdef1a42a75e7_020000b94930+0x2aa83a)

SUMMARY: ThreadSanitizer: data race third_party/grpc/src/core/ext/filters/client_channel/client_channel.cc:2780:5 in grpc_core::ClientChannel::FilterBasedLoadBalancedCall::~FilterBasedLoadBalancedCall()
```
cb-robot pushed a commit that referenced this pull request Sep 26, 2023
Long story here:

CallbackAlternativeCQ operates a thread pool which processes a
completion queue and then directly invokes the completion function in
the same thread. This thread pool is initialized on first Ref() and
unallocated on last Unref().

When running an in-process synchronous server (as we do for tests, using
this
https://github.com/google/tensorstore/blob/master/tensorstore/internal/grpc/grpc_mock.h),
called by an async() interface caller, if the async() callback happens
to drop the last reference to the grpc Channel, then the channel
shutdown will attempt to run in one of the the CallbackAlternativeCQ
threads.

This will cause a deadlock/race condition, as `CallbackAlternativeCQ` is
not designed to shutdown itself. When this deadlock happens,
`pthread_join(pthread_id_)` will return `EDEADLK` and the thread will
keep running. However `EDEADLK` is silently ignored by Join() so
`CallbackAlternativeCQ::Unref` will continue to delete the underlying
grpc_completion_queue, leading to a `SIGSEGV` later in the process.


https://github.com/grpc/grpc/blob/97ba9871324cb68b93f22fd1860934392cd476ee/src/cpp/common/completion_queue_cc.cc#L115

This adds an assert that pthread_join succeeded, which is useful as it
avoids a later SIGSEBV. Alternatively, the thread implementation could
gpr_log the errorcode before asserting.


Example backtrace of crash:

frame #0: 0x0000000194f1e868 libsystem_kernel.dylib`__pthread_kill + 8
frame #1: 0x0000000194f55cec libsystem_pthread.dylib`pthread_kill + 288
    frame #2: 0x0000000194e8e2c8 libsystem_c.dylib`abort + 180
    frame #3: 0x0000000194e8d620 libsystem_c.dylib`__assert_rtn + 272
frame #4: 0x0000000100a64f50 grpc_kvstore_test`grpc_core::(anonymous
namespace)::ThreadInternalsPosix::Join() + 188
frame #5: 0x00000001009c5dd0 grpc_kvstore_test`grpc_core::Thread::Join()
+ 56
frame #6: 0x0000000100154474 grpc_kvstore_test`grpc::(anonymous
namespace)::CallbackAlternativeCQ::Unref() + 216
frame #7: 0x0000000100154390
grpc_kvstore_test`grpc::CompletionQueue::ReleaseCallbackAlternativeCQ(grpc::CompletionQueue*)
+ 120
frame #8: 0x000000010014130c grpc_kvstore_test`grpc::Channel::~Channel()
+ 220
frame #9: 0x00000001001413c8 grpc_kvstore_test`grpc::Channel::~Channel()
+ 28
frame #10: 0x000000010014d678
grpc_kvstore_test`std::__1::default_delete<grpc::Channel>::operator()(grpc::Channel*)
const + 44
frame #11: 0x000000010014d358
grpc_kvstore_test`std::__1::__shared_ptr_pointer<grpc::Channel*,
std::__1::shared_ptr<grpc::Channel>::__shared_ptr_default_delete<grpc::Channel,
grpc::Channel>, std::__1::allocator<grpc::Channel> >::__on_zero_shared()
+ 72
frame #12: 0x000000010002ab5c
grpc_kvstore_test`std::__1::__shared_count::__release_shared() + 60
frame #13: 0x000000010002ab00
grpc_kvstore_test`std::__1::__shared_weak_count::__release_shared() + 28
frame #14: 0x000000010002aad0
grpc_kvstore_test`std::__1::shared_ptr<grpc::ServerCredentials>::~shared_ptr()
+ 56
frame #15: 0x00000001000053ec
grpc_kvstore_test`std::__1::shared_ptr<tensorstore_grpc::kvstore::grpc_gen::KvStoreService::Stub>::~shared_ptr()
+ 28
frame #16: 0x000000010014653c
grpc_kvstore_test`grpc::ClientContext::~ClientContext() + 356
frame #17: 0x0000000100146570
grpc_kvstore_test`grpc::ClientContext::~ClientContext() + 28
frame #18: 0x00000001000ab000 grpc_kvstore_test`tensorstore::(anonymous
namespace)::ReadTask::~ReadTask() + 68
frame #19: 0x00000001000aae90 grpc_kvstore_test`tensorstore::(anonymous
namespace)::ReadTask::~ReadTask() + 28
frame #20: 0x00000001000aae18
grpc_kvstore_test`tensorstore::internal::intrusive_ptr_decrement(tensorstore::internal::AtomicReferenceCount<tensorstore::(anonymous
namespace)::ReadTask> const*) + 68
frame #21: 0x00000001000aadc8 grpc_kvstore_test`void
tensorstore::internal::DefaultIntrusivePtrTraits::decrement<tensorstore::(anonymous
namespace)::ReadTask*>(tensorstore::(anonymous namespace)::ReadTask*) +
24
frame #22: 0x00000001000aad9c
grpc_kvstore_test`tensorstore::internal::IntrusivePtr<tensorstore::(anonymous
namespace)::ReadTask,
tensorstore::internal::DefaultIntrusivePtrTraits>::~IntrusivePtr() + 52
frame #23: 0x00000001000a5994
grpc_kvstore_test`tensorstore::internal::IntrusivePtr<tensorstore::(anonymous
namespace)::ReadTask,
tensorstore::internal::DefaultIntrusivePtrTraits>::~IntrusivePtr() + 28
frame #24: 0x00000001000aac24 grpc_kvstore_test`tensorstore::(anonymous
namespace)::ReadTask::Start(tensorstore_grpc::kvstore::grpc_gen::KvStoreService::StubInterface*,
absl::Time)::'lambda'(grpc::Status)::~() + 40
frame #25: 0x00000001000a6280 grpc_kvstore_test`tensorstore::(anonymous
namespace)::ReadTask::Start(tensorstore_grpc::kvstore::grpc_gen::KvStoreService::StubInterface*,
absl::Time)::'lambda'(grpc::Status)::~() + 28
frame #26: 0x00000001000a84ac
grpc_kvstore_test`std::__1::__compressed_pair_elem<tensorstore::(anonymous
namespace)::ReadTask::Start(tensorstore_grpc::kvstore::grpc_gen::KvStoreService::StubInterface*,
absl::Time)::'lambda'(grpc::Status), 0,
false>::~__compressed_pair_elem() + 28
frame #27: 0x00000001000a86c0
grpc_kvstore_test`std::__1::__compressed_pair<tensorstore::(anonymous
namespace)::ReadTask::Start(tensorstore_grpc::kvstore::grpc_gen::KvStoreService::StubInterface*,
absl::Time)::'lambda'(grpc::Status),
std::__1::allocator<tensorstore::(anonymous
namespace)::ReadTask::Start(tensorstore_grpc::kvstore::grpc_gen::KvStoreService::StubInterface*,
absl::Time)::'lambda'(grpc::Status)> >::~__compressed_pair() + 28
frame #28: 0x00000001000a8694
grpc_kvstore_test`std::__1::__compressed_pair<tensorstore::(anonymous
namespace)::ReadTask::Start(tensorstore_grpc::kvstore::grpc_gen::KvStoreService::StubInterface*,
absl::Time)::'lambda'(grpc::Status),
std::__1::allocator<tensorstore::(anonymous
namespace)::ReadTask::Start(tensorstore_grpc::kvstore::grpc_gen::KvStoreService::StubInterface*,
absl::Time)::'lambda'(grpc::Status)> >::~__compressed_pair() + 28
frame #29: 0x00000001000a990c
grpc_kvstore_test`std::__1::__function::__alloc_func<tensorstore::(anonymous
namespace)::ReadTask::Start(tensorstore_grpc::kvstore::grpc_gen::KvStoreService::StubInterface*,
absl::Time)::'lambda'(grpc::Status),
std::__1::allocator<tensorstore::(anonymous
namespace)::ReadTask::Start(tensorstore_grpc::kvstore::grpc_gen::KvStoreService::StubInterface*,
absl::Time)::'lambda'(grpc::Status)>, void (grpc::Status)>::destroy() +
24
frame #30: 0x00000001000a7ea0
grpc_kvstore_test`std::__1::__function::__func<tensorstore::(anonymous
namespace)::ReadTask::Start(tensorstore_grpc::kvstore::grpc_gen::KvStoreService::StubInterface*,
absl::Time)::'lambda'(grpc::Status),
std::__1::allocator<tensorstore::(anonymous
namespace)::ReadTask::Start(tensorstore_grpc::kvstore::grpc_gen::KvStoreService::StubInterface*,
absl::Time)::'lambda'(grpc::Status)>, void (grpc::Status)>::destroy() +
28
frame #31: 0x00000001000aabbc
grpc_kvstore_test`std::__1::__function::__value_func<void
(grpc::Status)>::~__value_func() + 68
frame #32: 0x00000001000aab68
grpc_kvstore_test`std::__1::__function::__value_func<void
(grpc::Status)>::~__value_func() + 28
frame #33: 0x00000001000aab3c grpc_kvstore_test`std::__1::function<void
(grpc::Status)>::~function() + 28
frame #34: 0x00000001000a6254 grpc_kvstore_test`std::__1::function<void
(grpc::Status)>::~function() + 28
frame #35: 0x0000000100108ae0
grpc_kvstore_test`grpc::internal::CallbackWithStatusTag::Run(bool) + 368
frame #36: 0x0000000100108964
grpc_kvstore_test`grpc::internal::CallbackWithStatusTag::StaticRun(grpc_completion_queue_functor*,
int) + 44
frame #37: 0x0000000100154cb0 grpc_kvstore_test`grpc::(anonymous
namespace)::CallbackAlternativeCQ::ThreadLoop(void*) + 356
frame #38: 0x0000000100a650b8 grpc_kvstore_test`grpc_core::(anonymous
namespace)::ThreadInternalsPosix::ThreadInternalsPosix(char const*, void
(*)(void*), void*, bool*, grpc_core::Thread::Options
const&)::'lambda'(void*)::operator()(void*) const + 240
frame #39: 0x0000000100a64fbc grpc_kvstore_test`grpc_core::(anonymous
namespace)::ThreadInternalsPosix::ThreadInternalsPosix(char const*, void
(*)(void*), void*, bool*, grpc_core::Thread::Options
const&)::'lambda'(void*)::__invoke(void*) + 28
frame grpc#40: 0x0000000194f5606c libsystem_pthread.dylib`_pthread_start +
148
cb-robot pushed a commit that referenced this pull request Sep 26, 2023
Found memory access error in frame_fuzzer_test. Located the root cause
in ExecCtx::Get(), where ExecCtx needs to be initialized before using
HPackParser:ParseInput().


Error logs:
MemorySanitizer:DEADLYSIGNAL
==2812845==ERROR: MemorySanitizer: SEGV on unknown address
0x000000000030 (pc 0x55869275574e bp 0x7fffd7d9fb50 sp 0x7fffd7d9fb20
T2812845)
==2812845==The signal is caused by a READ memory access.
==2812845==Hint: address points to the zero page.
#0 0x55869275574e in starting_cpu
[third_party/grpc/src/core/lib/iomgr/exec_ctx.h:129](https://cs.corp.google.com/piper///depot/google3/third_party/grpc/src/core/lib/iomgr/exec_ctx.h?l=129&ws=ladynana/2900&snapshot=42):9
#1 0x55869275574e in
grpc_core::PerCpu<grpc_core::GlobalStatsCollector::Data>::this_cpu()
[third_party/grpc/src/core/lib/gprpp/per_cpu.h:38](https://cs.corp.google.com/piper///depot/google3/third_party/grpc/src/core/lib/gprpp/per_cpu.h?l=38&ws=ladynana/2900&snapshot=42):48
#2 0x558692753cda in IncrementHttp2MetadataSize
[third_party/grpc/src/core/lib/debug/stats_data.h:265](https://cs.corp.google.com/piper///depot/google3/third_party/grpc/src/core/lib/debug/stats_data.h?l=265&ws=ladynana/2900&snapshot=42):11
#3 0x558692753cda in
grpc_core::HPackParser::ParseInput(grpc_core::HPackParser::Input, bool)
[third_party/grpc/src/core/ext/transport/chttp2/transport/hpack_parser.cc:933](https://cs.corp.google.com/piper///depot/google3/third_party/grpc/src/core/ext/transport/chttp2/transport/hpack_parser.cc?l=933&ws=ladynana/2900&snapshot=42):20


<!--

If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.

If your pull request is for a specific language, please add the
appropriate
lang label.

-->
cb-robot pushed a commit that referenced this pull request Apr 23, 2025
Backport grpc#37459 to 1.63
Internal bug: b/357864682

A lock ordering inversion was noticed with the following stacks -
```
[mutex.cc : 1418] RAW: Potential Mutex deadlock:
        @ 0x564f4ce62fe5 absl::lts_20240116::DebugOnlyDeadlockCheck()
        @ 0x564f4ce632dc absl::lts_20240116::Mutex::Lock()
        @ 0x564f4be5886c absl::lts_20240116::MutexLock::MutexLock()
        @ 0x564f4be968c5 grpc::internal::OpenTelemetryPluginImpl::RemoveCallback()
        @ 0x564f4cd097b8 grpc_core::RegisteredMetricCallback::~RegisteredMetricCallback()
        @ 0x564f4c1f1216 std::default_delete<>::operator()()
        @ 0x564f4c1f157f std::__uniq_ptr_impl<>::reset()
        @ 0x564f4c1ee967 std::unique_ptr<>::reset()
        @ 0x564f4c352f44 grpc_core::GrpcXdsClient::Orphaned()
        @ 0x564f4c25dad1 grpc_core::DualRefCounted<>::Unref()
        @ 0x564f4c4653ed grpc_core::RefCountedPtr<>::reset()
        @ 0x564f4c463c73 grpc_core::XdsClusterDropStats::~XdsClusterDropStats()
        @ 0x564f4c463d02 grpc_core::XdsClusterDropStats::~XdsClusterDropStats()
        @ 0x564f4c25efa9 grpc_core::UnrefDelete::operator()<>()
        @ 0x564f4c25d5f0 grpc_core::RefCounted<>::Unref()
        @ 0x564f4c25c2d9 grpc_core::RefCountedPtr<>::~RefCountedPtr()
        @ 0x564f4c25b1d8 grpc_core::(anonymous namespace)::XdsClusterImplLb::Picker::~Picker()
        @ 0x564f4c25b240 grpc_core::(anonymous namespace)::XdsClusterImplLb::Picker::~Picker()
        @ 0x564f4c12c71a grpc_core::UnrefDelete::operator()<>()
        @ 0x564f4c1292ac grpc_core::DualRefCounted<>::WeakUnref()
        @ 0x564f4c124fb8 grpc_core::DualRefCounted<>::Unref()
        @ 0x564f4c11f029 grpc_core::RefCountedPtr<>::~RefCountedPtr()
        @ 0x564f4c14e958 grpc_core::(anonymous namespace)::OutlierDetectionLb::Picker::~Picker()
        @ 0x564f4c14e980 grpc_core::(anonymous namespace)::OutlierDetectionLb::Picker::~Picker()
        @ 0x564f4c12c71a grpc_core::UnrefDelete::operator()<>()
        @ 0x564f4c1292ac grpc_core::DualRefCounted<>::WeakUnref()
        @ 0x564f4c124fb8 grpc_core::DualRefCounted<>::Unref()
        @ 0x564f4c11f029 grpc_core::RefCountedPtr<>::~RefCountedPtr()
        @ 0x564f4c26bafc std::pair<>::~pair()
        @ 0x564f4c26bb28 __gnu_cxx::new_allocator<>::destroy<>()
        @ 0x564f4c26b88f std::allocator_traits<>::destroy<>()
        @ 0x564f4c26b297 std::_Rb_tree<>::_M_destroy_node()
        @ 0x564f4c26abfb std::_Rb_tree<>::_M_drop_node()
        @ 0x564f4c26a926 std::_Rb_tree<>::_M_erase()
        @ 0x564f4c26a6f0 std::_Rb_tree<>::~_Rb_tree()
        @ 0x564f4c26a62a std::map<>::~map()
        @ 0x564f4c2691a4 grpc_core::(anonymous namespace)::XdsClusterManagerLb::ClusterPicker::~ClusterPicker()
        @ 0x564f4c2691cc grpc_core::(anonymous namespace)::XdsClusterManagerLb::ClusterPicker::~ClusterPicker()
        @ 0x564f4c12c71a grpc_core::UnrefDelete::operator()<>()
        @ 0x564f4c1292ac grpc_core::DualRefCounted<>::WeakUnref()

[mutex.cc : 1428] RAW: Acquiring absl::Mutex 0x564f4f22ad40 while holding  0x7f939834bb70; a cycle in the historical lock ordering graph has been observed
[mutex.cc : 1432] RAW: Cycle:
[mutex.cc : 1446] RAW: mutex@0x564f4f22ad40 stack:
        @ 0x564f4ce62fe5 absl::lts_20240116::DebugOnlyDeadlockCheck()
        @ 0x564f4ce632dc absl::lts_20240116::Mutex::Lock()
        @ 0x564f4be5886c absl::lts_20240116::MutexLock::MutexLock()
        @ 0x564f4be96124 grpc::internal::OpenTelemetryPluginImpl::AddCallback()
        @ 0x564f4cd096f0 grpc_core::RegisteredMetricCallback::RegisteredMetricCallback()
        @ 0x564f4c1f111b std::make_unique<>()
        @ 0x564f4c3564b0 grpc_core::GlobalStatsPluginRegistry::StatsPluginGroup::RegisterCallback<>()
        @ 0x564f4c352dea grpc_core::GrpcXdsClient::GrpcXdsClient()
        @ 0x564f4c355bc6 grpc_core::MakeRefCounted<>()
        @ 0x564f4c3525f2 grpc_core::GrpcXdsClient::GetOrCreate()
        @ 0x564f4c28f8f8 grpc_core::(anonymous namespace)::XdsResolver::StartLocked()
        @ 0x564f4c2f5f82 grpc_core::(anonymous namespace)::GoogleCloud2ProdResolver::StartXdsResolver()
        @ 0x564f4c2f515d grpc_core::(anonymous namespace)::GoogleCloud2ProdResolver::ZoneQueryDone()
        @ 0x564f4c2f496b grpc_core::(anonymous namespace)::GoogleCloud2ProdResolver::StartLocked()::{lambda()#1}::operator()()::{lambda()#1}::operator()()
        @ 0x564f4c2f80f6 std::__invoke_impl<>()
        @ 0x564f4c2f7b9d _ZSt10__invoke_rIvRZZN9grpc_core12_GLOBAL__N_124GoogleCloud2ProdResolver11StartLockedEvENUlNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEN4absl12lts_202401168StatusOrIS8_EEE_clES8_SC_EUlvE_J...
        @ 0x564f4c2f748c std::_Function_handler<>::_M_invoke()
        @ 0x564f4b8ad682 std::function<>::operator()()
        @ 0x564f4cd1c6bf grpc_core::WorkSerializer::LegacyWorkSerializer::Run()
        @ 0x564f4cd1dae4 grpc_core::WorkSerializer::Run()
        @ 0x564f4c2f4b0b grpc_core::(anonymous namespace)::GoogleCloud2ProdResolver::StartLocked()::{lambda()#1}::operator()()
        @ 0x564f4c2f8dc7 absl::lts_20240116::base_internal::Callable::Invoke<>()
        @ 0x564f4c2f8cb8 absl::lts_20240116::base_internal::invoke<>()
        @ 0x564f4c2f8b16 absl::lts_20240116::internal_any_invocable::InvokeR<>()
        @ 0x564f4c2f8a0c absl::lts_20240116::internal_any_invocable::LocalInvoker<>()
        @ 0x564f4c2fb88d absl::lts_20240116::internal_any_invocable::Impl<>::operator()()
        @ 0x564f4c2fb1f3 grpc_core::GcpMetadataQuery::OnDone()
        @ 0x564f4cd75a72 exec_ctx_run()
        @ 0x564f4cd75ba9 grpc_core::ExecCtx::Flush()
        @ 0x564f4cc8ee1d end_worker()
        @ 0x564f4cc8f304 pollset_work()
        @ 0x564f4cc5dcaf pollset_work()
        @ 0x564f4cc69220 grpc_pollset_work()
        @ 0x564f4cbe7733 cq_pluck()
        @ 0x564f4cbe7ad5 grpc_completion_queue_pluck
        @ 0x564f4bc61d96 grpc::CompletionQueue::Pluck()
        @ 0x564f4bfdb055 grpc::ClientReader<>::ClientReader<>()
        @ 0x564f4bfd6035 grpc::internal::ClientReaderFactory<>::Create<>()
        @ 0x564f4bfc322b google::storage::v2::Storage::Stub::ReadObjectRaw()
        @ 0x564f4bf9934b google::storage::v2::Storage::Stub::ReadObject()

[mutex.cc : 1446] RAW: mutex@0x7f939834bb70 stack:
        @ 0x564f4ce62fe5 absl::lts_20240116::DebugOnlyDeadlockCheck()
        @ 0x564f4ce632dc absl::lts_20240116::Mutex::Lock()
        @ 0x564f4be5886c absl::lts_20240116::MutexLock::MutexLock()
        @ 0x564f4c1ce9eb grpc_core::(anonymous namespace)::RlsLb::RlsLb()::{lambda()#1}::operator()()
        @ 0x564f4c1e794c absl::lts_20240116::base_internal::Callable::Invoke<>()
        @ 0x564f4c1e72c1 absl::lts_20240116::base_internal::invoke<>()
        @ 0x564f4c1e6af1 absl::lts_20240116::internal_any_invocable::InvokeR<>()
        @ 0x564f4c1e5d6c absl::lts_20240116::internal_any_invocable::LocalInvoker<>()
        @ 0x564f4be9d0c8 absl::lts_20240116::internal_any_invocable::Impl<>::operator()()
        @ 0x564f4be9b4ff grpc_core::RegisteredMetricCallback::Run()
        @ 0x564f4bea07ae grpc::internal::OpenTelemetryPluginImpl::CallbackGaugeState<>::CallbackGaugeCallback()
        @ 0x564f4bf844de opentelemetry::v1::sdk::metrics::ObservableRegistry::Observe()
        @ 0x564f4bf56529 opentelemetry::v1::sdk::metrics::Meter::Collect()
        @ 0x564f4bf8c1d5 opentelemetry::v1::sdk::metrics::MetricCollector::Collect()::{lambda()#1}::operator()()
        @ 0x564f4bf8c5ac opentelemetry::v1::nostd::function_ref<>::BindTo<>()::{lambda()#1}::operator()()
        @ 0x564f4bf8c5e8 opentelemetry::v1::nostd::function_ref<>::BindTo<>()::{lambda()#1}::_FUN()
        @ 0x564f4bf7604d opentelemetry::v1::nostd::function_ref<>::operator()()
        @ 0x564f4bf74ad9 opentelemetry::v1::sdk::metrics::MeterContext::ForEachMeter()
        @ 0x564f4bf8c457 opentelemetry::v1::sdk::metrics::MetricCollector::Collect()
        @ 0x564f4bf4a7fe opentelemetry::v1::sdk::metrics::MetricReader::Collect()
        @ 0x564f4bed5e24 opentelemetry::v1::exporter::metrics::PrometheusCollector::Collect()
        @ 0x564f4bef004f prometheus::detail::CollectMetrics()
        @ 0x564f4beec26d prometheus::detail::MetricsHandler::handleGet()
        @ 0x564f4bf1cd8b CivetServer::requestHandler()
        @ 0x564f4bf35e7b handle_request
        @ 0x564f4bf29534 handle_request_stat_log
        @ 0x564f4bf39b3f process_new_connection
        @ 0x564f4bf3a448 worker_thread_run
        @ 0x564f4bf3a57f worker_thread
        @ 0x7f93e9137ea7 start_thread

[mutex.cc : 1454] RAW: dying due to potential deadlock
Aborted
```

From the stack, it looks like we are ending up holding a lock to the
`RlsLB` policy while removing a callback from the gRPC OpenTelemetry
plugin, which is a lock ordering inversion. The correct order is
`OpenTelemetry` -> `gRPC OpenTelemetry plugin` -> `gRPC Component like
RLS/xDSClient`.

A common pattern we employ for metrics is for the callbacks to be
unregistered when the corresponding component object is
orphaned/destroyed (unreffing). Also, note that removing callbacks
requires a lock in `gRPC OpenTelemetry plugin`. To avoid deadlocks, we
remove the callback inside `RlsLb` from outside the critical region, but
`RlsLb` owns refs to child policies which in turn hold refs to
`XdsClient`. The lock ordering inversion occurred due to unreffing child
policies within the critical region.

This PR is an alternative fix to this problem. Original fix in
grpc#37425. Verified that it fixes the bug.
cb-robot pushed a commit that referenced this pull request Apr 23, 2025
Backport grpc#37459 to v1.65
Internal bug: b/357864682

A lock ordering inversion was noticed with the following stacks -
```
[mutex.cc : 1418] RAW: Potential Mutex deadlock:
        @ 0x564f4ce62fe5 absl::lts_20240116::DebugOnlyDeadlockCheck()
        @ 0x564f4ce632dc absl::lts_20240116::Mutex::Lock()
        @ 0x564f4be5886c absl::lts_20240116::MutexLock::MutexLock()
        @ 0x564f4be968c5 grpc::internal::OpenTelemetryPluginImpl::RemoveCallback()
        @ 0x564f4cd097b8 grpc_core::RegisteredMetricCallback::~RegisteredMetricCallback()
        @ 0x564f4c1f1216 std::default_delete<>::operator()()
        @ 0x564f4c1f157f std::__uniq_ptr_impl<>::reset()
        @ 0x564f4c1ee967 std::unique_ptr<>::reset()
        @ 0x564f4c352f44 grpc_core::GrpcXdsClient::Orphaned()
        @ 0x564f4c25dad1 grpc_core::DualRefCounted<>::Unref()
        @ 0x564f4c4653ed grpc_core::RefCountedPtr<>::reset()
        @ 0x564f4c463c73 grpc_core::XdsClusterDropStats::~XdsClusterDropStats()
        @ 0x564f4c463d02 grpc_core::XdsClusterDropStats::~XdsClusterDropStats()
        @ 0x564f4c25efa9 grpc_core::UnrefDelete::operator()<>()
        @ 0x564f4c25d5f0 grpc_core::RefCounted<>::Unref()
        @ 0x564f4c25c2d9 grpc_core::RefCountedPtr<>::~RefCountedPtr()
        @ 0x564f4c25b1d8 grpc_core::(anonymous namespace)::XdsClusterImplLb::Picker::~Picker()
        @ 0x564f4c25b240 grpc_core::(anonymous namespace)::XdsClusterImplLb::Picker::~Picker()
        @ 0x564f4c12c71a grpc_core::UnrefDelete::operator()<>()
        @ 0x564f4c1292ac grpc_core::DualRefCounted<>::WeakUnref()
        @ 0x564f4c124fb8 grpc_core::DualRefCounted<>::Unref()
        @ 0x564f4c11f029 grpc_core::RefCountedPtr<>::~RefCountedPtr()
        @ 0x564f4c14e958 grpc_core::(anonymous namespace)::OutlierDetectionLb::Picker::~Picker()
        @ 0x564f4c14e980 grpc_core::(anonymous namespace)::OutlierDetectionLb::Picker::~Picker()
        @ 0x564f4c12c71a grpc_core::UnrefDelete::operator()<>()
        @ 0x564f4c1292ac grpc_core::DualRefCounted<>::WeakUnref()
        @ 0x564f4c124fb8 grpc_core::DualRefCounted<>::Unref()
        @ 0x564f4c11f029 grpc_core::RefCountedPtr<>::~RefCountedPtr()
        @ 0x564f4c26bafc std::pair<>::~pair()
        @ 0x564f4c26bb28 __gnu_cxx::new_allocator<>::destroy<>()
        @ 0x564f4c26b88f std::allocator_traits<>::destroy<>()
        @ 0x564f4c26b297 std::_Rb_tree<>::_M_destroy_node()
        @ 0x564f4c26abfb std::_Rb_tree<>::_M_drop_node()
        @ 0x564f4c26a926 std::_Rb_tree<>::_M_erase()
        @ 0x564f4c26a6f0 std::_Rb_tree<>::~_Rb_tree()
        @ 0x564f4c26a62a std::map<>::~map()
        @ 0x564f4c2691a4 grpc_core::(anonymous namespace)::XdsClusterManagerLb::ClusterPicker::~ClusterPicker()
        @ 0x564f4c2691cc grpc_core::(anonymous namespace)::XdsClusterManagerLb::ClusterPicker::~ClusterPicker()
        @ 0x564f4c12c71a grpc_core::UnrefDelete::operator()<>()
        @ 0x564f4c1292ac grpc_core::DualRefCounted<>::WeakUnref()

[mutex.cc : 1428] RAW: Acquiring absl::Mutex 0x564f4f22ad40 while holding  0x7f939834bb70; a cycle in the historical lock ordering graph has been observed
[mutex.cc : 1432] RAW: Cycle:
[mutex.cc : 1446] RAW: mutex@0x564f4f22ad40 stack:
        @ 0x564f4ce62fe5 absl::lts_20240116::DebugOnlyDeadlockCheck()
        @ 0x564f4ce632dc absl::lts_20240116::Mutex::Lock()
        @ 0x564f4be5886c absl::lts_20240116::MutexLock::MutexLock()
        @ 0x564f4be96124 grpc::internal::OpenTelemetryPluginImpl::AddCallback()
        @ 0x564f4cd096f0 grpc_core::RegisteredMetricCallback::RegisteredMetricCallback()
        @ 0x564f4c1f111b std::make_unique<>()
        @ 0x564f4c3564b0 grpc_core::GlobalStatsPluginRegistry::StatsPluginGroup::RegisterCallback<>()
        @ 0x564f4c352dea grpc_core::GrpcXdsClient::GrpcXdsClient()
        @ 0x564f4c355bc6 grpc_core::MakeRefCounted<>()
        @ 0x564f4c3525f2 grpc_core::GrpcXdsClient::GetOrCreate()
        @ 0x564f4c28f8f8 grpc_core::(anonymous namespace)::XdsResolver::StartLocked()
        @ 0x564f4c2f5f82 grpc_core::(anonymous namespace)::GoogleCloud2ProdResolver::StartXdsResolver()
        @ 0x564f4c2f515d grpc_core::(anonymous namespace)::GoogleCloud2ProdResolver::ZoneQueryDone()
        @ 0x564f4c2f496b grpc_core::(anonymous namespace)::GoogleCloud2ProdResolver::StartLocked()::{lambda()#1}::operator()()::{lambda()#1}::operator()()
        @ 0x564f4c2f80f6 std::__invoke_impl<>()
        @ 0x564f4c2f7b9d _ZSt10__invoke_rIvRZZN9grpc_core12_GLOBAL__N_124GoogleCloud2ProdResolver11StartLockedEvENUlNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEN4absl12lts_202401168StatusOrIS8_EEE_clES8_SC_EUlvE_J...
        @ 0x564f4c2f748c std::_Function_handler<>::_M_invoke()
        @ 0x564f4b8ad682 std::function<>::operator()()
        @ 0x564f4cd1c6bf grpc_core::WorkSerializer::LegacyWorkSerializer::Run()
        @ 0x564f4cd1dae4 grpc_core::WorkSerializer::Run()
        @ 0x564f4c2f4b0b grpc_core::(anonymous namespace)::GoogleCloud2ProdResolver::StartLocked()::{lambda()#1}::operator()()
        @ 0x564f4c2f8dc7 absl::lts_20240116::base_internal::Callable::Invoke<>()
        @ 0x564f4c2f8cb8 absl::lts_20240116::base_internal::invoke<>()
        @ 0x564f4c2f8b16 absl::lts_20240116::internal_any_invocable::InvokeR<>()
        @ 0x564f4c2f8a0c absl::lts_20240116::internal_any_invocable::LocalInvoker<>()
        @ 0x564f4c2fb88d absl::lts_20240116::internal_any_invocable::Impl<>::operator()()
        @ 0x564f4c2fb1f3 grpc_core::GcpMetadataQuery::OnDone()
        @ 0x564f4cd75a72 exec_ctx_run()
        @ 0x564f4cd75ba9 grpc_core::ExecCtx::Flush()
        @ 0x564f4cc8ee1d end_worker()
        @ 0x564f4cc8f304 pollset_work()
        @ 0x564f4cc5dcaf pollset_work()
        @ 0x564f4cc69220 grpc_pollset_work()
        @ 0x564f4cbe7733 cq_pluck()
        @ 0x564f4cbe7ad5 grpc_completion_queue_pluck
        @ 0x564f4bc61d96 grpc::CompletionQueue::Pluck()
        @ 0x564f4bfdb055 grpc::ClientReader<>::ClientReader<>()
        @ 0x564f4bfd6035 grpc::internal::ClientReaderFactory<>::Create<>()
        @ 0x564f4bfc322b google::storage::v2::Storage::Stub::ReadObjectRaw()
        @ 0x564f4bf9934b google::storage::v2::Storage::Stub::ReadObject()

[mutex.cc : 1446] RAW: mutex@0x7f939834bb70 stack:
        @ 0x564f4ce62fe5 absl::lts_20240116::DebugOnlyDeadlockCheck()
        @ 0x564f4ce632dc absl::lts_20240116::Mutex::Lock()
        @ 0x564f4be5886c absl::lts_20240116::MutexLock::MutexLock()
        @ 0x564f4c1ce9eb grpc_core::(anonymous namespace)::RlsLb::RlsLb()::{lambda()#1}::operator()()
        @ 0x564f4c1e794c absl::lts_20240116::base_internal::Callable::Invoke<>()
        @ 0x564f4c1e72c1 absl::lts_20240116::base_internal::invoke<>()
        @ 0x564f4c1e6af1 absl::lts_20240116::internal_any_invocable::InvokeR<>()
        @ 0x564f4c1e5d6c absl::lts_20240116::internal_any_invocable::LocalInvoker<>()
        @ 0x564f4be9d0c8 absl::lts_20240116::internal_any_invocable::Impl<>::operator()()
        @ 0x564f4be9b4ff grpc_core::RegisteredMetricCallback::Run()
        @ 0x564f4bea07ae grpc::internal::OpenTelemetryPluginImpl::CallbackGaugeState<>::CallbackGaugeCallback()
        @ 0x564f4bf844de opentelemetry::v1::sdk::metrics::ObservableRegistry::Observe()
        @ 0x564f4bf56529 opentelemetry::v1::sdk::metrics::Meter::Collect()
        @ 0x564f4bf8c1d5 opentelemetry::v1::sdk::metrics::MetricCollector::Collect()::{lambda()#1}::operator()()
        @ 0x564f4bf8c5ac opentelemetry::v1::nostd::function_ref<>::BindTo<>()::{lambda()#1}::operator()()
        @ 0x564f4bf8c5e8 opentelemetry::v1::nostd::function_ref<>::BindTo<>()::{lambda()#1}::_FUN()
        @ 0x564f4bf7604d opentelemetry::v1::nostd::function_ref<>::operator()()
        @ 0x564f4bf74ad9 opentelemetry::v1::sdk::metrics::MeterContext::ForEachMeter()
        @ 0x564f4bf8c457 opentelemetry::v1::sdk::metrics::MetricCollector::Collect()
        @ 0x564f4bf4a7fe opentelemetry::v1::sdk::metrics::MetricReader::Collect()
        @ 0x564f4bed5e24 opentelemetry::v1::exporter::metrics::PrometheusCollector::Collect()
        @ 0x564f4bef004f prometheus::detail::CollectMetrics()
        @ 0x564f4beec26d prometheus::detail::MetricsHandler::handleGet()
        @ 0x564f4bf1cd8b CivetServer::requestHandler()
        @ 0x564f4bf35e7b handle_request
        @ 0x564f4bf29534 handle_request_stat_log
        @ 0x564f4bf39b3f process_new_connection
        @ 0x564f4bf3a448 worker_thread_run
        @ 0x564f4bf3a57f worker_thread
        @ 0x7f93e9137ea7 start_thread

[mutex.cc : 1454] RAW: dying due to potential deadlock
Aborted
```

From the stack, it looks like we are ending up holding a lock to the
`RlsLB` policy while removing a callback from the gRPC OpenTelemetry
plugin, which is a lock ordering inversion. The correct order is
`OpenTelemetry` -> `gRPC OpenTelemetry plugin` -> `gRPC Component like
RLS/xDSClient`.

A common pattern we employ for metrics is for the callbacks to be
unregistered when the corresponding component object is
orphaned/destroyed (unreffing). Also, note that removing callbacks
requires a lock in `gRPC OpenTelemetry plugin`. To avoid deadlocks, we
remove the callback inside `RlsLb` from outside the critical region, but
`RlsLb` owns refs to child policies which in turn hold refs to
`XdsClient`. The lock ordering inversion occurred due to unreffing child
policies within the critical region.

This PR is an alternative fix to this problem. Original fix in
grpc#37425. Verified that it fixes the bug.

Closes grpc#37459

COPYBARA_INTEGRATE_REVIEW=grpc#37459 from
yashykt:FixDeadlocks ec7fbcf
PiperOrigin-RevId: 663360427




<!--

If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.

If your pull request is for a specific language, please add the
appropriate
lang label.

-->
cb-robot pushed a commit that referenced this pull request Apr 23, 2025
Backport grpc#37459 to 1.64.x
Internal bug: b/357864682

A lock ordering inversion was noticed with the following stacks -
```
[mutex.cc : 1418] RAW: Potential Mutex deadlock:
        @ 0x564f4ce62fe5 absl::lts_20240116::DebugOnlyDeadlockCheck()
        @ 0x564f4ce632dc absl::lts_20240116::Mutex::Lock()
        @ 0x564f4be5886c absl::lts_20240116::MutexLock::MutexLock()
        @ 0x564f4be968c5 grpc::internal::OpenTelemetryPluginImpl::RemoveCallback()
        @ 0x564f4cd097b8 grpc_core::RegisteredMetricCallback::~RegisteredMetricCallback()
        @ 0x564f4c1f1216 std::default_delete<>::operator()()
        @ 0x564f4c1f157f std::__uniq_ptr_impl<>::reset()
        @ 0x564f4c1ee967 std::unique_ptr<>::reset()
        @ 0x564f4c352f44 grpc_core::GrpcXdsClient::Orphaned()
        @ 0x564f4c25dad1 grpc_core::DualRefCounted<>::Unref()
        @ 0x564f4c4653ed grpc_core::RefCountedPtr<>::reset()
        @ 0x564f4c463c73 grpc_core::XdsClusterDropStats::~XdsClusterDropStats()
        @ 0x564f4c463d02 grpc_core::XdsClusterDropStats::~XdsClusterDropStats()
        @ 0x564f4c25efa9 grpc_core::UnrefDelete::operator()<>()
        @ 0x564f4c25d5f0 grpc_core::RefCounted<>::Unref()
        @ 0x564f4c25c2d9 grpc_core::RefCountedPtr<>::~RefCountedPtr()
        @ 0x564f4c25b1d8 grpc_core::(anonymous namespace)::XdsClusterImplLb::Picker::~Picker()
        @ 0x564f4c25b240 grpc_core::(anonymous namespace)::XdsClusterImplLb::Picker::~Picker()
        @ 0x564f4c12c71a grpc_core::UnrefDelete::operator()<>()
        @ 0x564f4c1292ac grpc_core::DualRefCounted<>::WeakUnref()
        @ 0x564f4c124fb8 grpc_core::DualRefCounted<>::Unref()
        @ 0x564f4c11f029 grpc_core::RefCountedPtr<>::~RefCountedPtr()
        @ 0x564f4c14e958 grpc_core::(anonymous namespace)::OutlierDetectionLb::Picker::~Picker()
        @ 0x564f4c14e980 grpc_core::(anonymous namespace)::OutlierDetectionLb::Picker::~Picker()
        @ 0x564f4c12c71a grpc_core::UnrefDelete::operator()<>()
        @ 0x564f4c1292ac grpc_core::DualRefCounted<>::WeakUnref()
        @ 0x564f4c124fb8 grpc_core::DualRefCounted<>::Unref()
        @ 0x564f4c11f029 grpc_core::RefCountedPtr<>::~RefCountedPtr()
        @ 0x564f4c26bafc std::pair<>::~pair()
        @ 0x564f4c26bb28 __gnu_cxx::new_allocator<>::destroy<>()
        @ 0x564f4c26b88f std::allocator_traits<>::destroy<>()
        @ 0x564f4c26b297 std::_Rb_tree<>::_M_destroy_node()
        @ 0x564f4c26abfb std::_Rb_tree<>::_M_drop_node()
        @ 0x564f4c26a926 std::_Rb_tree<>::_M_erase()
        @ 0x564f4c26a6f0 std::_Rb_tree<>::~_Rb_tree()
        @ 0x564f4c26a62a std::map<>::~map()
        @ 0x564f4c2691a4 grpc_core::(anonymous namespace)::XdsClusterManagerLb::ClusterPicker::~ClusterPicker()
        @ 0x564f4c2691cc grpc_core::(anonymous namespace)::XdsClusterManagerLb::ClusterPicker::~ClusterPicker()
        @ 0x564f4c12c71a grpc_core::UnrefDelete::operator()<>()
        @ 0x564f4c1292ac grpc_core::DualRefCounted<>::WeakUnref()

[mutex.cc : 1428] RAW: Acquiring absl::Mutex 0x564f4f22ad40 while holding  0x7f939834bb70; a cycle in the historical lock ordering graph has been observed
[mutex.cc : 1432] RAW: Cycle:
[mutex.cc : 1446] RAW: mutex@0x564f4f22ad40 stack:
        @ 0x564f4ce62fe5 absl::lts_20240116::DebugOnlyDeadlockCheck()
        @ 0x564f4ce632dc absl::lts_20240116::Mutex::Lock()
        @ 0x564f4be5886c absl::lts_20240116::MutexLock::MutexLock()
        @ 0x564f4be96124 grpc::internal::OpenTelemetryPluginImpl::AddCallback()
        @ 0x564f4cd096f0 grpc_core::RegisteredMetricCallback::RegisteredMetricCallback()
        @ 0x564f4c1f111b std::make_unique<>()
        @ 0x564f4c3564b0 grpc_core::GlobalStatsPluginRegistry::StatsPluginGroup::RegisterCallback<>()
        @ 0x564f4c352dea grpc_core::GrpcXdsClient::GrpcXdsClient()
        @ 0x564f4c355bc6 grpc_core::MakeRefCounted<>()
        @ 0x564f4c3525f2 grpc_core::GrpcXdsClient::GetOrCreate()
        @ 0x564f4c28f8f8 grpc_core::(anonymous namespace)::XdsResolver::StartLocked()
        @ 0x564f4c2f5f82 grpc_core::(anonymous namespace)::GoogleCloud2ProdResolver::StartXdsResolver()
        @ 0x564f4c2f515d grpc_core::(anonymous namespace)::GoogleCloud2ProdResolver::ZoneQueryDone()
        @ 0x564f4c2f496b grpc_core::(anonymous namespace)::GoogleCloud2ProdResolver::StartLocked()::{lambda()#1}::operator()()::{lambda()#1}::operator()()
        @ 0x564f4c2f80f6 std::__invoke_impl<>()
        @ 0x564f4c2f7b9d _ZSt10__invoke_rIvRZZN9grpc_core12_GLOBAL__N_124GoogleCloud2ProdResolver11StartLockedEvENUlNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEN4absl12lts_202401168StatusOrIS8_EEE_clES8_SC_EUlvE_J...
        @ 0x564f4c2f748c std::_Function_handler<>::_M_invoke()
        @ 0x564f4b8ad682 std::function<>::operator()()
        @ 0x564f4cd1c6bf grpc_core::WorkSerializer::LegacyWorkSerializer::Run()
        @ 0x564f4cd1dae4 grpc_core::WorkSerializer::Run()
        @ 0x564f4c2f4b0b grpc_core::(anonymous namespace)::GoogleCloud2ProdResolver::StartLocked()::{lambda()#1}::operator()()
        @ 0x564f4c2f8dc7 absl::lts_20240116::base_internal::Callable::Invoke<>()
        @ 0x564f4c2f8cb8 absl::lts_20240116::base_internal::invoke<>()
        @ 0x564f4c2f8b16 absl::lts_20240116::internal_any_invocable::InvokeR<>()
        @ 0x564f4c2f8a0c absl::lts_20240116::internal_any_invocable::LocalInvoker<>()
        @ 0x564f4c2fb88d absl::lts_20240116::internal_any_invocable::Impl<>::operator()()
        @ 0x564f4c2fb1f3 grpc_core::GcpMetadataQuery::OnDone()
        @ 0x564f4cd75a72 exec_ctx_run()
        @ 0x564f4cd75ba9 grpc_core::ExecCtx::Flush()
        @ 0x564f4cc8ee1d end_worker()
        @ 0x564f4cc8f304 pollset_work()
        @ 0x564f4cc5dcaf pollset_work()
        @ 0x564f4cc69220 grpc_pollset_work()
        @ 0x564f4cbe7733 cq_pluck()
        @ 0x564f4cbe7ad5 grpc_completion_queue_pluck
        @ 0x564f4bc61d96 grpc::CompletionQueue::Pluck()
        @ 0x564f4bfdb055 grpc::ClientReader<>::ClientReader<>()
        @ 0x564f4bfd6035 grpc::internal::ClientReaderFactory<>::Create<>()
        @ 0x564f4bfc322b google::storage::v2::Storage::Stub::ReadObjectRaw()
        @ 0x564f4bf9934b google::storage::v2::Storage::Stub::ReadObject()

[mutex.cc : 1446] RAW: mutex@0x7f939834bb70 stack:
        @ 0x564f4ce62fe5 absl::lts_20240116::DebugOnlyDeadlockCheck()
        @ 0x564f4ce632dc absl::lts_20240116::Mutex::Lock()
        @ 0x564f4be5886c absl::lts_20240116::MutexLock::MutexLock()
        @ 0x564f4c1ce9eb grpc_core::(anonymous namespace)::RlsLb::RlsLb()::{lambda()#1}::operator()()
        @ 0x564f4c1e794c absl::lts_20240116::base_internal::Callable::Invoke<>()
        @ 0x564f4c1e72c1 absl::lts_20240116::base_internal::invoke<>()
        @ 0x564f4c1e6af1 absl::lts_20240116::internal_any_invocable::InvokeR<>()
        @ 0x564f4c1e5d6c absl::lts_20240116::internal_any_invocable::LocalInvoker<>()
        @ 0x564f4be9d0c8 absl::lts_20240116::internal_any_invocable::Impl<>::operator()()
        @ 0x564f4be9b4ff grpc_core::RegisteredMetricCallback::Run()
        @ 0x564f4bea07ae grpc::internal::OpenTelemetryPluginImpl::CallbackGaugeState<>::CallbackGaugeCallback()
        @ 0x564f4bf844de opentelemetry::v1::sdk::metrics::ObservableRegistry::Observe()
        @ 0x564f4bf56529 opentelemetry::v1::sdk::metrics::Meter::Collect()
        @ 0x564f4bf8c1d5 opentelemetry::v1::sdk::metrics::MetricCollector::Collect()::{lambda()#1}::operator()()
        @ 0x564f4bf8c5ac opentelemetry::v1::nostd::function_ref<>::BindTo<>()::{lambda()#1}::operator()()
        @ 0x564f4bf8c5e8 opentelemetry::v1::nostd::function_ref<>::BindTo<>()::{lambda()#1}::_FUN()
        @ 0x564f4bf7604d opentelemetry::v1::nostd::function_ref<>::operator()()
        @ 0x564f4bf74ad9 opentelemetry::v1::sdk::metrics::MeterContext::ForEachMeter()
        @ 0x564f4bf8c457 opentelemetry::v1::sdk::metrics::MetricCollector::Collect()
        @ 0x564f4bf4a7fe opentelemetry::v1::sdk::metrics::MetricReader::Collect()
        @ 0x564f4bed5e24 opentelemetry::v1::exporter::metrics::PrometheusCollector::Collect()
        @ 0x564f4bef004f prometheus::detail::CollectMetrics()
        @ 0x564f4beec26d prometheus::detail::MetricsHandler::handleGet()
        @ 0x564f4bf1cd8b CivetServer::requestHandler()
        @ 0x564f4bf35e7b handle_request
        @ 0x564f4bf29534 handle_request_stat_log
        @ 0x564f4bf39b3f process_new_connection
        @ 0x564f4bf3a448 worker_thread_run
        @ 0x564f4bf3a57f worker_thread
        @ 0x7f93e9137ea7 start_thread

[mutex.cc : 1454] RAW: dying due to potential deadlock
Aborted
```

From the stack, it looks like we are ending up holding a lock to the
`RlsLB` policy while removing a callback from the gRPC OpenTelemetry
plugin, which is a lock ordering inversion. The correct order is
`OpenTelemetry` -> `gRPC OpenTelemetry plugin` -> `gRPC Component like
RLS/xDSClient`.

A common pattern we employ for metrics is for the callbacks to be
unregistered when the corresponding component object is
orphaned/destroyed (unreffing). Also, note that removing callbacks
requires a lock in `gRPC OpenTelemetry plugin`. To avoid deadlocks, we
remove the callback inside `RlsLb` from outside the critical region, but
`RlsLb` owns refs to child policies which in turn hold refs to
`XdsClient`. The lock ordering inversion occurred due to unreffing child
policies within the critical region.

This PR is an alternative fix to this problem. Original fix in
grpc#37425. Verified that it fixes the bug.
cb-robot pushed a commit that referenced this pull request Apr 23, 2025
…haned (grpc#37683)

Sample race - https://btx.cloud.google.com/invocations/0c4e65f2-3a38-4b4f-b67e-c53a4a4650ea/targets/%2F%2Ftest%2Fcore%2Fend2end:connectivity_test@poller%3Dpoll;config=2aed862ff4fd4384687d63aa95df415c7cb955355c2ab6dc6c6d7a9d123a76ec/log

```
WARNING: ThreadSanitizer: data race (pid=18)
  Write of size 8 at 0x72300000c318 by thread T29:
    #0 grpc_core::Chttp2ServerListener* std::__exchange(grpc_core::Chttp2ServerListener*&, grpc_core::Chttp2ServerListener*&) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/move.h:152:13 (libsrc_Score_Slibgrpc_Utransport_Uchttp2_Userver.so+0x68c85)
    #1 grpc_core::Chttp2ServerListener* std::exchange(grpc_core::Chttp2ServerListener*&, grpc_core::Chttp2ServerListener*&) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/utility:287:14 (libsrc_Score_Slibgrpc_Utransport_Uchttp2_Userver.so+0x68c05)
    #2 grpc_core::RefCountedPtr::reset(grpc_core::Chttp2ServerListener*) /proc/self/cwd/./src/core/lib/gprpp/ref_counted_ptr.h:126:20 (libsrc_Score_Slibgrpc_Utransport_Uchttp2_Userver.so+0x68b32)
    #3 grpc_core::RefCountedPtr::operator=(grpc_core::RefCountedPtr&&) /proc/self/cwd/./src/core/lib/gprpp/ref_counted_ptr.h:66:5 (libsrc_Score_Slibgrpc_Utransport_Uchttp2_Userver.so+0x54380)
    #4 grpc_core::Chttp2ServerListener::ActiveConnection::Start(grpc_core::RefCountedPtr, std::unique_ptr, grpc_core::ChannelArgs const&) /proc/self/cwd/src/core/ext/transport/chttp2/server/chttp2_server.cc:615:13 (libsrc_Score_Slibgrpc_Utransport_Uchttp2_Userver.so+0x48914)
    #5 grpc_core::Chttp2ServerListener::OnAccept(void*, grpc_endpoint*, grpc_pollset*, grpc_tcp_server_acceptor*) /proc/self/cwd/src/core/ext/transport/chttp2/server/chttp2_server.cc:881:21 (libsrc_Score_Slibgrpc_Utransport_Uchttp2_Userver.so+0x49ce2)
    #6 CreateEventEngineListener(grpc_tcp_server*, grpc_closure*, grpc_event_engine::experimental::EndpointConfig const&, grpc_tcp_server**)::$_2::operator()(std::unique_ptr>, grpc_event_engine::experimental::MemoryAllocator) const /proc/self/cwd/src/core/lib/iomgr/tcp_server_posix.cc:228:11 (liblibiomgr.so+0xef627)
    #7 decltype(std::declval()(std::declval>>(), std::declval())) absl::lts_20240116::base_internal::Callable::Invoke>, grpc_event_engine::experimental::MemoryAllocator>(CreateEventEngineListener(grpc_tcp_server*, grpc_closure*, grpc_event_engine::experimental::EndpointConfig const&, grpc_tcp_server**)::$_2&, std::unique_ptr>&&, grpc_event_engine::experimental::MemoryAllocator&&) /proc/self/cwd/external/com_google_absl/absl/base/internal/invoke.h:185:12 (liblibiomgr.so+0xef3c2)
    #8 decltype(Invoker>, grpc_event_engine::experimental::MemoryAllocator>::type::Invoke(std::declval(), std::declval>>(), std::declval())) absl::lts_20240116::base_internal::invoke>, grpc_event_engine::experimental::MemoryAllocator>(CreateEventEngineListener(grpc_tcp_server*, grpc_closure*, grpc_event_engine::experimental::EndpointConfig const&, grpc_tcp_server**)::$_2&, std::unique_ptr>&&, grpc_event_engine::experimental::MemoryAllocator&&) /proc/self/cwd/external/com_google_absl/absl/base/internal/invoke.h:212:10 (liblibiomgr.so+0xef325)
    #9 void absl::lts_20240116::internal_any_invocable::InvokeR>, grpc_event_engine::experimental::MemoryAllocator, void>(CreateEventEngineListener(grpc_tcp_server*, grpc_closure*, grpc_event_engine::experimental::EndpointConfig const&, grpc_tcp_server**)::$_2&, std::unique_ptr>&&, grpc_event_engine::experimental::MemoryAllocator&&) /proc/self/cwd/external/com_google_absl/absl/functional/internal/any_invocable.h:132:3 (liblibiomgr.so+0xef2b5)
    #10 void absl::lts_20240116::internal_any_invocable::LocalInvoker>, grpc_event_engine::experimental::MemoryAllocator>(absl::lts_20240116::internal_any_invocable::TypeErasedState*, absl::lts_20240116::internal_any_invocable::ForwardedParameter>>::type, absl::lts_20240116::internal_any_invocable::ForwardedParameter::type) /proc/self/cwd/external/com_google_absl/absl/functional/internal/any_invocable.h:310:10 (liblibiomgr.so+0xef1e2)
    #11 absl::lts_20240116::internal_any_invocable::Impl>, grpc_event_engine::experimental::MemoryAllocator)>::operator()(std::unique_ptr>, grpc_event_engine::experimental::MemoryAllocator) /proc/self/cwd/external/com_google_absl/absl/functional/internal/any_invocable.h:868:1 (libsrc_Score_Slibposix_Uevent_Uengine.so+0xa754f)
    #12 grpc_event_engine::experimental::ThreadyEventEngine::CreateListener(absl::lts_20240116::AnyInvocable>, grpc_event_engine::experimental::MemoryAllocator)>, absl::lts_20240116::AnyInvocable, grpc_event_engine::experimental::EndpointConfig const&, std::unique_ptr>)::$_0::operator()(std::unique_ptr>, grpc_event_engine::experimental::MemoryAllocator) const::'lambda'()::operator()() /proc/self/cwd/src/core/lib/event_engine/thready_event_engine/thready_event_engine.cc:61:15 (libsrc_Score_Slibthready_Uevent_Uengine.so+0x27cdb)
    #13 decltype(std::declval>, grpc_event_engine::experimental::MemoryAllocator)>, absl::lts_20240116::AnyInvocable, grpc_event_engine::experimental::EndpointConfig const&, std::unique_ptr>)::$_0::operator()(std::unique_ptr>, grpc_event_engine::experimental::MemoryAllocator) const::'lambda'()&>()()) absl::lts_20240116::base_internal::Callable::Invoke>, grpc_event_engine::experimental::MemoryAllocator)>, absl::lts_20240116::AnyInvocable, grpc_event_engine::experimental::EndpointConfig const&, std::unique_ptr>)::$_0::operator()(std::unique_ptr>, grpc_event_engine::experimental::MemoryAllocator) const::'lambda'()&>(grpc_event_engine::experimental::ThreadyEventEngine::CreateListener(absl::lts_20240116::AnyInvocable>, grpc_event_engine::experimental::MemoryAllocator)>, absl::lts_20240116::AnyInvocable, grpc_event_engine::experimental::EndpointConfig const&, std::unique_ptr>)::$_0::operator()(std::unique_ptr>, grpc_event_engine::experimental::MemoryAllocator) const::'lambda'()&) /proc/self/cwd/external/com_google_absl/absl/base/internal/invoke.h:185:12 (libsrc_Score_Slibthready_Uevent_Uengine.so+0x27c45)
    #14 decltype(Invoker>, grpc_event_engine::experimental::MemoryAllocator)>, absl::lts_20240116::AnyInvocable, grpc_event_engine::experimental::EndpointConfig const&, std::unique_ptr>)::$_0::operator()(std::unique_ptr>, grpc_event_engine::experimental::MemoryAllocator) const::'lambda'()&>::type::Invoke(std::declval>, grpc_event_engine::experimental::MemoryAllocator)>, absl::lts_20240116::AnyInvocable, grpc_event_engine::experimental::EndpointConfig const&, std::unique_ptr>)::$_0::operator()(std::unique_ptr>, grpc_event_engine::experimental::MemoryAllocator) const::'lambda'()&>())) absl::lts_20240116::base_internal::invoke>, grpc_event_engine::experimental::MemoryAllocator)>, absl::lts_20240116::AnyInvocable, grpc_event_engine::experimental::EndpointConfig const&, std::unique_ptr>)::$_0::operator()(std::unique_ptr>, grpc_event_engine::experimental::MemoryAllocator) const::'lambda'()&>(grpc_event_engine::experimental::ThreadyEventEngine::CreateListener(absl::lts_20240116::AnyInvocable>, grpc_event_engine::experimental::MemoryAllocator)>, absl::lts_20240116::AnyInvocable, grpc_event_engine::experimental::EndpointConfig const&, std::unique_ptr>)::$_0::operator()(std::unique_ptr>, grpc_event_engine::experimental::MemoryAllocator) const::'lambda'()&) /proc/self/cwd/external/com_google_absl/absl/base/internal/invoke.h:212:10 (libsrc_Score_Slibthready_Uevent_Uengine.so+0x27bf5)
    #15 void absl::lts_20240116::internal_any_invocable::InvokeR>, grpc_event_engine::experimental::MemoryAllocator)>, absl::lts_20240116::AnyInvocable, grpc_event_engine::experimental::EndpointConfig const&, std::unique_ptr>)::$_0::operator()(std::unique_ptr>, grpc_event_engine::experimental::MemoryAllocator) const::'lambda'()&, void>(grpc_event_engine::experimental::ThreadyEventEngine::CreateListener(absl::lts_20240116::AnyInvocable>, grpc_event_engine::experimental::MemoryAllocator)>, absl::lts_20240116::AnyInvocable, grpc_event_engine::experimental::EndpointConfig const&, std::unique_ptr>)::$_0::operator()(std::unique_ptr>, grpc_event_engine::experimental::MemoryAllocator) const::'lambda'()&) /proc/self/cwd/external/com_google_absl/absl/functional/internal/any_invocable.h:132:3 (libsrc_Score_Slibthready_Uevent_Uengine.so+0x27ba5)
    #16 void absl::lts_20240116::internal_any_invocable::RemoteInvoker>, grpc_event_engine::experimental::MemoryAllocator)>, absl::lts_20240116::AnyInvocable, grpc_event_engine::experimental::EndpointConfig const&, std::unique_ptr>)::$_0::operator()(std::unique_ptr>, grpc_event_engine::experimental::MemoryAllocator) const::'lambda'()&>(absl::lts_20240116::internal_any_invocable::TypeErasedState*) /proc/self/cwd/external/com_google_absl/absl/functional/internal/any_invocable.h:368:10 (libsrc_Score_Slibthready_Uevent_Uengine.so+0x279cd)
    #17 absl::lts_20240116::internal_any_invocable::Impl::operator()() /proc/self/cwd/external/com_google_absl/absl/functional/internal/any_invocable.h:868:1 (libtest_Score_Send2end_Slibconnectivity_Ulibrary.so+0x337ff)
    #18 grpc_core::Thread::Thread(char const*, absl::lts_20240116::AnyInvocable, bool*, grpc_core::Thread::Options const&)::'lambda'(void*)::operator()(void*) const /proc/self/cwd/./src/core/lib/gprpp/thd.h:108:15 (libsrc_Score_Slibthready_Uevent_Uengine.so+0x2e264)
    #19 grpc_core::Thread::Thread(char const*, absl::lts_20240116::AnyInvocable, bool*, grpc_core::Thread::Options const&)::'lambda'(void*)::__invoke(void*) /proc/self/cwd/./src/core/lib/gprpp/thd.h:105:13 (libsrc_Score_Slibthready_Uevent_Uengine.so+0x2e1e9)
    #20 grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&)::'lambda'(void*)::operator()(void*) const /proc/self/cwd/src/core/lib/gprpp/posix/thd.cc:148:11 (liblibgpr.so+0x1d830)
    #21 grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&)::'lambda'(void*)::__invoke(void*) /proc/self/cwd/src/core/lib/gprpp/posix/thd.cc:118:9 (liblibgpr.so+0x1d659)

  Previous read of size 8 at 0x72300000c318 by main thread:
    #0 grpc_core::RefCountedPtr::operator!=(std::nullptr_t) const /proc/self/cwd/./src/core/lib/gprpp/ref_counted_ptr.h:192:50 (libsrc_Score_Slibgrpc_Utransport_Uchttp2_Userver.so+0x52345)
    #1 grpc_core::Chttp2ServerListener::ActiveConnection::HandshakingState::~HandshakingState() /proc/self/cwd/src/core/ext/transport/chttp2/server/chttp2_server.cc:394:30 (libsrc_Score_Slibgrpc_Utransport_Uchttp2_Userver.so+0x463ed)
    #2 std::enable_if::value, grpc_core::Chttp2ServerListener::ActiveConnection::HandshakingState*>::type grpc_event_engine::experimental::MemoryAllocator::New, grpc_pollset*&, std::unique_ptr, grpc_core::ChannelArgs const&>(grpc_core::RefCountedPtr&&, grpc_pollset*&, std::unique_ptr&&, grpc_core::ChannelArgs const&)::Wrapper::~Wrapper() /proc/self/cwd/include/grpc/event_engine/memory_allocator.h:117:65 (libsrc_Score_Slibgrpc_Utransport_Uchttp2_Userver.so+0x67a1c)
    #3 std::enable_if::value, grpc_core::Chttp2ServerListener::ActiveConnection::HandshakingState*>::type grpc_event_engine::experimental::MemoryAllocator::New, grpc_pollset*&, std::unique_ptr, grpc_core::ChannelArgs const&>(grpc_core::RefCountedPtr&&, grpc_pollset*&, std::unique_ptr&&, grpc_core::ChannelArgs const&)::Wrapper::~Wrapper() /proc/self/cwd/include/grpc/event_engine/memory_allocator.h:117:27 (libsrc_Score_Slibgrpc_Utransport_Uchttp2_Userver.so+0x67a59)
    #4 void grpc_core::UnrefDelete::operator()(grpc_core::Chttp2ServerListener::ActiveConnection::HandshakingState*) const /proc/self/cwd/./src/core/lib/gprpp/ref_counted.h:224:5 (libsrc_Score_Slibgrpc_Utransport_Uchttp2_Userver.so+0x6449d)
    #5 grpc_core::InternallyRefCounted::Unref() /proc/self/cwd/./src/core/lib/gprpp/orphanable.h:132:7 (libsrc_Score_Slibgrpc_Utransport_Uchttp2_Userver.so+0x52581)
    #6 grpc_core::Chttp2ServerListener::ActiveConnection::HandshakingState::Orphan() /proc/self/cwd/src/core/ext/transport/chttp2/server/chttp2_server.cc:407:3 (libsrc_Score_Slibgrpc_Utransport_Uchttp2_Userver.so+0x466ac)
    #7 void grpc_core::OrphanableDelete::operator()(grpc_core::Chttp2ServerListener::ActiveConnection::HandshakingState*) /proc/self/cwd/./src/core/lib/gprpp/orphanable.h:60:8 (libsrc_Score_Slibgrpc_Utransport_Uchttp2_Userver.so+0x653b1)
    #8 std::unique_ptr::~unique_ptr() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/unique_ptr.h:292:4 (libsrc_Score_Slibgrpc_Utransport_Uchttp2_Userver.so+0x53ddf)
    #9 grpc_core::Chttp2ServerListener::ActiveConnection::Orphan() /proc/self/cwd/src/core/ext/transport/chttp2/server/chttp2_server.cc:581:1 (libsrc_Score_Slibgrpc_Utransport_Uchttp2_Userver.so+0x487db)
    #10 void grpc_core::OrphanableDelete::operator()(grpc_core::Chttp2ServerListener::ActiveConnection*) /proc/self/cwd/./src/core/lib/gprpp/orphanable.h:60:8 (libsrc_Score_Slibgrpc_Utransport_Uchttp2_Userver.so+0x66b01)
    #11 std::unique_ptr::~unique_ptr() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/unique_ptr.h:292:4 (libsrc_Score_Slibgrpc_Utransport_Uchttp2_Userver.so+0x53d1f)
    #12 std::pair>::~pair() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_pair.h:208:12 (libsrc_Score_Slibgrpc_Utransport_Uchttp2_Userver.so+0x597f9)
    #13 void __gnu_cxx::new_allocator>>>::destroy>>(std::pair>*) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/ext/new_allocator.h:152:10 (libsrc_Score_Slibgrpc_Utransport_Uchttp2_Userver.so+0x597c1)
    #14 void std::allocator_traits>>>>::destroy>>(std::allocator>>>&, std::pair>*) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/alloc_traits.h:496:8 (libsrc_Score_Slibgrpc_Utransport_Uchttp2_Userver.so+0x59725)
    #15 std::_Rb_tree>, std::_Select1st>>, std::less, std::allocator>>>::_M_destroy_node(std::_Rb_tree_node>>*) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_tree.h:642:2 (libsrc_Score_Slibgrpc_Utransport_Uchttp2_Userver.so+0x59674)
    #16 std::_Rb_tree>, std::_Select1st>>, std::less, std::allocator>>>::_M_drop_node(std::_Rb_tree_node>>*) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_tree.h:650:2 (libsrc_Score_Slibgrpc_Utransport_Uchttp2_Userver.so+0x595f9)
    #17 std::_Rb_tree>, std::_Select1st>>, std::less, std::allocator>>>::_M_erase(std::_Rb_tree_node>>*) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_tree.h:1920:4 (libsrc_Score_Slibgrpc_Utransport_Uchttp2_Userver.so+0x5945f)
    #18 std::_Rb_tree>, std::_Select1st>>, std::less, std::allocator>>>::~_Rb_tree() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_tree.h:1000:9 (libsrc_Score_Slibgrpc_Utransport_Uchttp2_Userver.so+0x593c5)
    #19 std::map, std::less, std::allocator>>>::~map() /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/stl_map.h:300:22 (libsrc_Score_Slibgrpc_Utransport_Uchttp2_Userver.so+0x51525)
    #20 grpc_core::Chttp2ServerListener::Orphan() /proc/self/cwd/src/core/ext/transport/chttp2/server/chttp2_server.cc:923:1 (libsrc_Score_Slibgrpc_Utransport_Uchttp2_Userver.so+0x4b5dd)
    #21 void grpc_core::OrphanableDelete::operator()(grpc_core::Server::ListenerInterface*) /proc/self/cwd/./src/core/lib/gprpp/orphanable.h:60:8 (libsrc_Score_Slibchaotic_Ugood_Userver.so+0x1d0981)
    #22 std::unique_ptr::reset(grpc_core::Server::ListenerInterface*) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/unique_ptr.h:402:4 (liblibserver.so+0x1d5c21)
    #23 grpc_core::Server::StopListening() /proc/self/cwd/src/core/server/server.cc:1211:23 (liblibserver.so+0x1b9c62)
    #24 grpc_core::Server::ShutdownAndNotify(grpc_completion_queue*, void*) /proc/self/cwd/src/core/server/server.cc:1195:3 (liblibserver.so+0x1b97da)
    #25 grpc_server_shutdown_and_notify /proc/self/cwd/src/core/server/server.cc:1829:37 (liblibserver.so+0x1bf212)
    #26 grpc_core::CoreEnd2endTest::ShutdownServerAndNotify(int) /proc/self/cwd/./test/core/end2end/end2end_tests.h:459:5 (libtest_Score_Send2end_Slibconnectivity_Ulibrary.so+0x33370)
    #27 grpc_core::(anonymous namespace)::CoreEnd2endTest_RetryHttp2Test_ConnectivityWatch::RunTest() /proc/self/cwd/test/core/end2end/tests/connectivity.cc:74:3 (libtest_Score_Send2end_Slibconnectivity_Ulibrary.so+0x2ee8d)
    #28 grpc_core::(anonymous namespace)::CoreEnd2endTest_RetryHttp2Test_ConnectivityWatch::TestBody() /proc/self/cwd/test/core/end2end/tests/connectivity.cc:32:1 (libtest_Score_Send2end_Slibconnectivity_Ulibrary.so+0x2dc96)
    #29 void testing::internal::HandleSehExceptionsInMethodIfSupported(testing::Test*, void (testing::Test::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2612:10 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0x16d2dc)
    #30 void testing::internal::HandleExceptionsInMethodIfSupported(testing::Test*, void (testing::Test::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2648:14 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0x14a51d)
    #31 testing::Test::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2687:5 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0x120458)
    #32 testing::TestInfo::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2836:11 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0x1215f3)
    #33 testing::TestSuite::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:3015:30 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0x12230c)
    #34 testing::internal::UnitTestImpl::RunAllTests() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:5921:44 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0x138142)
    #35 bool testing::internal::HandleSehExceptionsInMethodIfSupported(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2612:10 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0x17508f)
    #36 bool testing::internal::HandleExceptionsInMethodIfSupported(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2648:14 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0x14e5b3)
    #37 testing::UnitTest::Run() /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:5485:10 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0x13795b)
    #38 RUN_ALL_TESTS() /proc/self/cwd/external/com_google_googletest/googletest/include/gtest/gtest.h:2316:73 (libtest_Score_Send2end_Slibend2end_Utest_Umain.so+0x8457)
    #39 main /proc/self/cwd/test/core/end2end/end2end_test_main.cc:50:10 (libtest_Score_Send2end_Slibend2end_Utest_Umain.so+0x77b6)
```

We start the connection outside the critical region and that's where we supply the listener ref to the connection. There is a freak case where the connection can be orphaned due to the listener stopping to serve and the `Orphan()` would also be trying to access the listener ref resulting in a race.

Closes grpc#37683

COPYBARA_INTEGRATE_REVIEW=grpc#37683 from yashykt:FixChttp2ServerRace e3c4529
PiperOrigin-RevId: 681552145
cb-robot pushed a commit that referenced this pull request Apr 23, 2025
Remove tcp_tracer in chttp2_transport since it's unused and buggy.

Fix for TSAN failure (similar to grpc#38728)  -
https://btx.cloud.google.com/invocations/97eb07c8-cf0c-4c26-a0c4-e7417adb1f23/targets/%2F%2Ftest%2Fcpp%2Fext%2Fotel:otel_tracing_test@poller%3Dpoll;config=1a7b8f82b5a4a85323c9dfbac159b038ad684a9dcca8ab0ce8fe99aa87dd2da2/log

```
WARNING: ThreadSanitizer: data race (pid=15)
  Read of size 8 at 0x726c00001588 by thread T44:
    #0 GetContext /proc/self/cwd/./src/core/lib/resource_quota/arena.h:296:9 (liblibgrpc_Utransport_Uchttp2.so+0x30af6e)
    #1 (anonymous namespace)::TcpTracerIfSampled(grpc_chttp2_stream*) /proc/self/cwd/src/core/ext/transport/chttp2/transport/chttp2_transport.cc:243:17 (liblibgrpc_Utransport_Uchttp2.so+0x30af6e)
    #2 perform_stream_op_locked(void*, absl::lts_20240722::Status) /proc/self/cwd/src/core/ext/transport/chttp2/transport/chttp2_transport.cc:1655:19 (liblibgrpc_Utransport_Uchttp2.so+0x3014aa)
    #3 grpc_combiner_continue_exec_ctx() /proc/self/cwd/src/core/lib/iomgr/combiner.cc:216:5 (liblibexec_Uctx.so+0x10d7e)
    #4 grpc_core::ExecCtx::Flush() /proc/self/cwd/src/core/lib/iomgr/exec_ctx.cc:77:17 (liblibexec_Uctx.so+0x16e45)
    #5 queue_offload(grpc_core::Combiner*)::$_0::operator()() const /proc/self/cwd/src/core/lib/iomgr/combiner.cc:165:14 (liblibexec_Uctx.so+0x13092)
    #6 void std::__invoke_impl(std::__invoke_other, queue_offload(grpc_core::Combiner*)::$_0&) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/invoke.h:60:14 (liblibexec_Uctx.so+0x13005)
    #7 std::__invoke_result::type std::__invoke(queue_offload(grpc_core::Combiner*)::$_0&) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/invoke.h:95:14 (liblibexec_Uctx.so+0x12fb5)
    #8 std::invoke_result::type std::invoke(queue_offload(grpc_core::Combiner*)::$_0&) /usr/lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/functional:81:14 (liblibexec_Uctx.so+0x12f65)
    #9 void absl::lts_20240722::internal_any_invocable::InvokeR(queue_offload(grpc_core::Combiner*)::$_0&) /proc/self/cwd/external/com_google_absl/absl/functional/internal/any_invocable.h:132:3 (liblibexec_Uctx.so+0x12f05)
    #10 void absl::lts_20240722::internal_any_invocable::LocalInvoker(absl::lts_20240722::internal_any_invocable::TypeErasedState*) /proc/self/cwd/external/com_google_absl/absl/functional/internal/any_invocable.h:310:10 (liblibexec_Uctx.so+0x12e22)
    #11 absl::lts_20240722::internal_any_invocable::Impl::operator()() /proc/self/cwd/external/com_google_absl/absl/functional/internal/any_invocable.h:868:1 (libsrc_Score_Slibping_Ucallbacks.so+0x260bf)
    #12 grpc_event_engine::experimental::SelfDeletingClosure::Run() /proc/self/cwd/./src/core/lib/event_engine/common_closures.h:54:5 (libsrc_Score_Slibevent_Uengine_Uthread_Upool.so+0x501bd)
    #13 grpc_event_engine::experimental::WorkStealingThreadPool::ThreadState::Step() /proc/self/cwd/src/core/lib/event_engine/thread_pool/work_stealing_thread_pool.cc:524:14 (libsrc_Score_Slibevent_Uengine_Uthread_Upool.so+0x4a949)
    #14 grpc_event_engine::experimental::WorkStealingThreadPool::ThreadState::ThreadBody() /proc/self/cwd/src/core/lib/event_engine/thread_pool/work_stealing_thread_pool.cc:487:10 (libsrc_Score_Slibevent_Uengine_Uthread_Upool.so+0x4a013)
    #15 grpc_event_engine::experimental::WorkStealingThreadPool::WorkStealingThreadPoolImpl::StartThread()::$_0::operator()(void*) const /proc/self/cwd/src/core/lib/event_engine/thread_pool/work_stealing_thread_pool.cc:257:17 (libsrc_Score_Slibevent_Uengine_Uthread_Upool.so+0x4b489)
    #16 grpc_event_engine::experimental::WorkStealingThreadPool::WorkStealingThreadPoolImpl::StartThread()::$_0::__invoke(void*) /proc/self/cwd/src/core/lib/event_engine/thread_pool/work_stealing_thread_pool.cc:255:7 (libsrc_Score_Slibevent_Uengine_Uthread_Upool.so+0x4b429)
    #17 grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&)::'lambda'(void*)::operator()(void*) const /proc/self/cwd/src/core/util/posix/thd.cc:145:11 (liblibgpr.so+0x1c2e0)
    #18 grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&)::'lambda'(void*)::__invoke(void*) /proc/self/cwd/src/core/util/posix/thd.cc:115:9 (liblibgpr.so+0x1c109)

  Previous write of size 8 at 0x726c00001588 by thread T47:
    #0 void grpc_core::Arena::SetContext(grpc_core::CallTracerInterface*) /proc/self/cwd/./src/core/lib/resource_quota/arena.h:305:10 (liblibgrpc_Uclient_Uchannel.so+0x5ac294)
    #1 grpc_core::(anonymous namespace)::CreateCallAttemptTracer(grpc_core::Arena*, bool) /proc/self/cwd/src/core/client_channel/client_channel_filter.cc:2388:10 (liblibgrpc_Uclient_Uchannel.so+0x58403c)
    #2 grpc_core::ClientChannelFilter::LoadBalancedCall::LoadBalancedCall(grpc_core::ClientChannelFilter*, grpc_core::Arena*, absl::lts_20240722::AnyInvocable, bool) /proc/self/cwd/src/core/client_channel/client_channel_filter.cc:2402:11 (liblibgrpc_Uclient_Uchannel.so+0x583a98)
    #3 grpc_core::ClientChannelFilter::FilterBasedLoadBalancedCall::FilterBasedLoadBalancedCall(grpc_core::ClientChannelFilter*, grpc_call_element_args const&, grpc_polling_entity*, grpc_closure*, absl::lts_20240722::AnyInvocable, bool) /proc/self/cwd/src/core/client_channel/client_channel_filter.cc:2628:7 (liblibgrpc_Uclient_Uchannel.so+0x5864d4)
    #4 grpc_core::ClientChannelFilter::FilterBasedLoadBalancedCall* grpc_core::Arena::New, bool&>(grpc_core::ClientChannelFilter*&&, grpc_call_element_args const&, grpc_polling_entity*&, grpc_closure*&, absl::lts_20240722::AnyInvocable&&, bool&) /proc/self/cwd/./src/core/lib/resource_quota/arena.h:181:13 (liblibgrpc_Uclient_Uchannel.so+0x59459d)
    #5 grpc_core::ClientChannelFilter::CreateLoadBalancedCall(grpc_call_element_args const&, grpc_polling_entity*, grpc_closure*, absl::lts_20240722::AnyInvocable, bool) /proc/self/cwd/src/core/client_channel/client_channel_filter.cc:1142:19 (liblibgrpc_Uclient_Uchannel.so+0x578343)
    #6 grpc_core::RetryFilter::LegacyCallData::CreateLoadBalancedCall(absl::lts_20240722::AnyInvocable, bool) /proc/self/cwd/src/core/client_channel/retry_filter_legacy_call_data.cc:1635:36 (liblibgrpc_Uclient_Uchannel.so+0x5f3954)
    #7 grpc_core::RetryFilter::LegacyCallData::CallAttempt::CallAttempt(grpc_core::RetryFilter::LegacyCallData*, bool) /proc/self/cwd/src/core/client_channel/retry_filter_legacy_call_data.cc:129:21 (liblibgrpc_Uclient_Uchannel.so+0x5f2de1)
    #8 grpc_core::RefCountedPtr grpc_core::MakeRefCounted(grpc_core::RetryFilter::LegacyCallData*&&, bool&) /proc/self/cwd/./src/core/util/ref_counted_ptr.h:369:31 (liblibgrpc_Uclient_Uchannel.so+0x60ad6a)
    #9 grpc_core::RetryFilter::LegacyCallData::CreateCallAttempt(bool) /proc/self/cwd/src/core/client_channel/retry_filter_legacy_call_data.cc:1644:19 (liblibgrpc_Uclient_Uchannel.so+0x605a9b)
    #10 grpc_core::RetryFilter::LegacyCallData::StartTransparentRetry(void*, absl::lts_20240722::Status) /proc/self/cwd/src/core/client_channel/retry_filter_legacy_call_data.cc:1937:12 (liblibgrpc_Uclient_Uchannel.so+0x60600b)
    #11 exec_ctx_run(grpc_closure*) /proc/self/cwd/src/core/lib/iomgr/exec_ctx.cc:43:3 (liblibexec_Uctx.so+0x17536)
    #12 grpc_core::ExecCtx::Flush() /proc/self/cwd/src/core/lib/iomgr/exec_ctx.cc:74:9 (liblibexec_Uctx.so+0x16e29)
    #13 grpc_core::ExecCtx::~ExecCtx() /proc/self/cwd/./src/core/lib/iomgr/exec_ctx.h:137:5 (liblibgrpc++_Ubase.so+0x1647cc)
    #14 grpc_core::WorkSerializer::WorkSerializerImpl::Run() /proc/self/cwd/src/core/util/work_serializer.cc:221:1 (liblibwork_Userializer.so+0x15947)
    #15 non-virtual thunk to grpc_core::WorkSerializer::WorkSerializerImpl::Run() /proc/self/cwd/src/core/util/work_serializer.cc (liblibwork_Userializer.so+0x15b09)
    #16 grpc_event_engine::experimental::WorkStealingThreadPool::ThreadState::Step() /proc/self/cwd/src/core/lib/event_engine/thread_pool/work_stealing_thread_pool.cc:524:14 (libsrc_Score_Slibevent_Uengine_Uthread_Upool.so+0x4a949)
    #17 grpc_event_engine::experimental::WorkStealingThreadPool::ThreadState::ThreadBody() /proc/self/cwd/src/core/lib/event_engine/thread_pool/work_stealing_thread_pool.cc:487:10 (libsrc_Score_Slibevent_Uengine_Uthread_Upool.so+0x4a013)
    #18 grpc_event_engine::experimental::WorkStealingThreadPool::WorkStealingThreadPoolImpl::StartThread()::$_0::operator()(void*) const /proc/self/cwd/src/core/lib/event_engine/thread_pool/work_stealing_thread_pool.cc:257:17 (libsrc_Score_Slibevent_Uengine_Uthread_Upool.so+0x4b489)
    #19 grpc_event_engine::experimental::WorkStealingThreadPool::WorkStealingThreadPoolImpl::StartThread()::$_0::__invoke(void*) /proc/self/cwd/src/core/lib/event_engine/thread_pool/work_stealing_thread_pool.cc:255:7 (libsrc_Score_Slibevent_Uengine_Uthread_Upool.so+0x4b429)
    #20 grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&)::'lambda'(void*)::operator()(void*) const /proc/self/cwd/src/core/util/posix/thd.cc:145:11 (liblibgpr.so+0x1c2e0)
    #21 grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&)::'lambda'(void*)::__invoke(void*) /proc/self/cwd/src/core/util/posix/thd.cc:115:9 (liblibgpr.so+0x1c109)
```

Assumed sequence of events (based on stacktrace) -
* The client application creates a new RPC.
* A ClientChannelFilter::LoadBalancedCall is created for the first attempt.
* A CallAttemptTracer is created for this first attempt and a pointer to this is saved on the arena context.
* The first attempt fails without any bytes having been sent on the wire, making this RPC eligible for transparent retries.
* A new ClientChannelFilter::LoadBalancedCall created for this new attempt.
* A new CallAttemptTracer is created for this second (transparent) attempt and a pointer is saved on the arena context.
* Around the same time as a new CallAttemptTracer is created, the chttp2 transport is performing some operations for the first stream (for example, a cancel op).

Closes grpc#39015

COPYBARA_INTEGRATE_REVIEW=grpc#39015 from yashykt:FixTcpTracer 7e3746e
PiperOrigin-RevId: 739343300
cb-robot pushed a commit that referenced this pull request Apr 23, 2025
Backport grpc#37459 to 1.66
Internal bug: b/357864682

A lock ordering inversion was noticed with the following stacks -
```
[mutex.cc : 1418] RAW: Potential Mutex deadlock:
        @ 0x564f4ce62fe5 absl::lts_20240116::DebugOnlyDeadlockCheck()
        @ 0x564f4ce632dc absl::lts_20240116::Mutex::Lock()
        @ 0x564f4be5886c absl::lts_20240116::MutexLock::MutexLock()
        @ 0x564f4be968c5 grpc::internal::OpenTelemetryPluginImpl::RemoveCallback()
        @ 0x564f4cd097b8 grpc_core::RegisteredMetricCallback::~RegisteredMetricCallback()
        @ 0x564f4c1f1216 std::default_delete<>::operator()()
        @ 0x564f4c1f157f std::__uniq_ptr_impl<>::reset()
        @ 0x564f4c1ee967 std::unique_ptr<>::reset()
        @ 0x564f4c352f44 grpc_core::GrpcXdsClient::Orphaned()
        @ 0x564f4c25dad1 grpc_core::DualRefCounted<>::Unref()
        @ 0x564f4c4653ed grpc_core::RefCountedPtr<>::reset()
        @ 0x564f4c463c73 grpc_core::XdsClusterDropStats::~XdsClusterDropStats()
        @ 0x564f4c463d02 grpc_core::XdsClusterDropStats::~XdsClusterDropStats()
        @ 0x564f4c25efa9 grpc_core::UnrefDelete::operator()<>()
        @ 0x564f4c25d5f0 grpc_core::RefCounted<>::Unref()
        @ 0x564f4c25c2d9 grpc_core::RefCountedPtr<>::~RefCountedPtr()
        @ 0x564f4c25b1d8 grpc_core::(anonymous namespace)::XdsClusterImplLb::Picker::~Picker()
        @ 0x564f4c25b240 grpc_core::(anonymous namespace)::XdsClusterImplLb::Picker::~Picker()
        @ 0x564f4c12c71a grpc_core::UnrefDelete::operator()<>()
        @ 0x564f4c1292ac grpc_core::DualRefCounted<>::WeakUnref()
        @ 0x564f4c124fb8 grpc_core::DualRefCounted<>::Unref()
        @ 0x564f4c11f029 grpc_core::RefCountedPtr<>::~RefCountedPtr()
        @ 0x564f4c14e958 grpc_core::(anonymous namespace)::OutlierDetectionLb::Picker::~Picker()
        @ 0x564f4c14e980 grpc_core::(anonymous namespace)::OutlierDetectionLb::Picker::~Picker()
        @ 0x564f4c12c71a grpc_core::UnrefDelete::operator()<>()
        @ 0x564f4c1292ac grpc_core::DualRefCounted<>::WeakUnref()
        @ 0x564f4c124fb8 grpc_core::DualRefCounted<>::Unref()
        @ 0x564f4c11f029 grpc_core::RefCountedPtr<>::~RefCountedPtr()
        @ 0x564f4c26bafc std::pair<>::~pair()
        @ 0x564f4c26bb28 __gnu_cxx::new_allocator<>::destroy<>()
        @ 0x564f4c26b88f std::allocator_traits<>::destroy<>()
        @ 0x564f4c26b297 std::_Rb_tree<>::_M_destroy_node()
        @ 0x564f4c26abfb std::_Rb_tree<>::_M_drop_node()
        @ 0x564f4c26a926 std::_Rb_tree<>::_M_erase()
        @ 0x564f4c26a6f0 std::_Rb_tree<>::~_Rb_tree()
        @ 0x564f4c26a62a std::map<>::~map()
        @ 0x564f4c2691a4 grpc_core::(anonymous namespace)::XdsClusterManagerLb::ClusterPicker::~ClusterPicker()
        @ 0x564f4c2691cc grpc_core::(anonymous namespace)::XdsClusterManagerLb::ClusterPicker::~ClusterPicker()
        @ 0x564f4c12c71a grpc_core::UnrefDelete::operator()<>()
        @ 0x564f4c1292ac grpc_core::DualRefCounted<>::WeakUnref()

[mutex.cc : 1428] RAW: Acquiring absl::Mutex 0x564f4f22ad40 while holding  0x7f939834bb70; a cycle in the historical lock ordering graph has been observed
[mutex.cc : 1432] RAW: Cycle:
[mutex.cc : 1446] RAW: mutex@0x564f4f22ad40 stack:
        @ 0x564f4ce62fe5 absl::lts_20240116::DebugOnlyDeadlockCheck()
        @ 0x564f4ce632dc absl::lts_20240116::Mutex::Lock()
        @ 0x564f4be5886c absl::lts_20240116::MutexLock::MutexLock()
        @ 0x564f4be96124 grpc::internal::OpenTelemetryPluginImpl::AddCallback()
        @ 0x564f4cd096f0 grpc_core::RegisteredMetricCallback::RegisteredMetricCallback()
        @ 0x564f4c1f111b std::make_unique<>()
        @ 0x564f4c3564b0 grpc_core::GlobalStatsPluginRegistry::StatsPluginGroup::RegisterCallback<>()
        @ 0x564f4c352dea grpc_core::GrpcXdsClient::GrpcXdsClient()
        @ 0x564f4c355bc6 grpc_core::MakeRefCounted<>()
        @ 0x564f4c3525f2 grpc_core::GrpcXdsClient::GetOrCreate()
        @ 0x564f4c28f8f8 grpc_core::(anonymous namespace)::XdsResolver::StartLocked()
        @ 0x564f4c2f5f82 grpc_core::(anonymous namespace)::GoogleCloud2ProdResolver::StartXdsResolver()
        @ 0x564f4c2f515d grpc_core::(anonymous namespace)::GoogleCloud2ProdResolver::ZoneQueryDone()
        @ 0x564f4c2f496b grpc_core::(anonymous namespace)::GoogleCloud2ProdResolver::StartLocked()::{lambda()#1}::operator()()::{lambda()#1}::operator()()
        @ 0x564f4c2f80f6 std::__invoke_impl<>()
        @ 0x564f4c2f7b9d _ZSt10__invoke_rIvRZZN9grpc_core12_GLOBAL__N_124GoogleCloud2ProdResolver11StartLockedEvENUlNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEN4absl12lts_202401168StatusOrIS8_EEE_clES8_SC_EUlvE_J...
        @ 0x564f4c2f748c std::_Function_handler<>::_M_invoke()
        @ 0x564f4b8ad682 std::function<>::operator()()
        @ 0x564f4cd1c6bf grpc_core::WorkSerializer::LegacyWorkSerializer::Run()
        @ 0x564f4cd1dae4 grpc_core::WorkSerializer::Run()
        @ 0x564f4c2f4b0b grpc_core::(anonymous namespace)::GoogleCloud2ProdResolver::StartLocked()::{lambda()#1}::operator()()
        @ 0x564f4c2f8dc7 absl::lts_20240116::base_internal::Callable::Invoke<>()
        @ 0x564f4c2f8cb8 absl::lts_20240116::base_internal::invoke<>()
        @ 0x564f4c2f8b16 absl::lts_20240116::internal_any_invocable::InvokeR<>()
        @ 0x564f4c2f8a0c absl::lts_20240116::internal_any_invocable::LocalInvoker<>()
        @ 0x564f4c2fb88d absl::lts_20240116::internal_any_invocable::Impl<>::operator()()
        @ 0x564f4c2fb1f3 grpc_core::GcpMetadataQuery::OnDone()
        @ 0x564f4cd75a72 exec_ctx_run()
        @ 0x564f4cd75ba9 grpc_core::ExecCtx::Flush()
        @ 0x564f4cc8ee1d end_worker()
        @ 0x564f4cc8f304 pollset_work()
        @ 0x564f4cc5dcaf pollset_work()
        @ 0x564f4cc69220 grpc_pollset_work()
        @ 0x564f4cbe7733 cq_pluck()
        @ 0x564f4cbe7ad5 grpc_completion_queue_pluck
        @ 0x564f4bc61d96 grpc::CompletionQueue::Pluck()
        @ 0x564f4bfdb055 grpc::ClientReader<>::ClientReader<>()
        @ 0x564f4bfd6035 grpc::internal::ClientReaderFactory<>::Create<>()
        @ 0x564f4bfc322b google::storage::v2::Storage::Stub::ReadObjectRaw()
        @ 0x564f4bf9934b google::storage::v2::Storage::Stub::ReadObject()

[mutex.cc : 1446] RAW: mutex@0x7f939834bb70 stack:
        @ 0x564f4ce62fe5 absl::lts_20240116::DebugOnlyDeadlockCheck()
        @ 0x564f4ce632dc absl::lts_20240116::Mutex::Lock()
        @ 0x564f4be5886c absl::lts_20240116::MutexLock::MutexLock()
        @ 0x564f4c1ce9eb grpc_core::(anonymous namespace)::RlsLb::RlsLb()::{lambda()#1}::operator()()
        @ 0x564f4c1e794c absl::lts_20240116::base_internal::Callable::Invoke<>()
        @ 0x564f4c1e72c1 absl::lts_20240116::base_internal::invoke<>()
        @ 0x564f4c1e6af1 absl::lts_20240116::internal_any_invocable::InvokeR<>()
        @ 0x564f4c1e5d6c absl::lts_20240116::internal_any_invocable::LocalInvoker<>()
        @ 0x564f4be9d0c8 absl::lts_20240116::internal_any_invocable::Impl<>::operator()()
        @ 0x564f4be9b4ff grpc_core::RegisteredMetricCallback::Run()
        @ 0x564f4bea07ae grpc::internal::OpenTelemetryPluginImpl::CallbackGaugeState<>::CallbackGaugeCallback()
        @ 0x564f4bf844de opentelemetry::v1::sdk::metrics::ObservableRegistry::Observe()
        @ 0x564f4bf56529 opentelemetry::v1::sdk::metrics::Meter::Collect()
        @ 0x564f4bf8c1d5 opentelemetry::v1::sdk::metrics::MetricCollector::Collect()::{lambda()#1}::operator()()
        @ 0x564f4bf8c5ac opentelemetry::v1::nostd::function_ref<>::BindTo<>()::{lambda()#1}::operator()()
        @ 0x564f4bf8c5e8 opentelemetry::v1::nostd::function_ref<>::BindTo<>()::{lambda()#1}::_FUN()
        @ 0x564f4bf7604d opentelemetry::v1::nostd::function_ref<>::operator()()
        @ 0x564f4bf74ad9 opentelemetry::v1::sdk::metrics::MeterContext::ForEachMeter()
        @ 0x564f4bf8c457 opentelemetry::v1::sdk::metrics::MetricCollector::Collect()
        @ 0x564f4bf4a7fe opentelemetry::v1::sdk::metrics::MetricReader::Collect()
        @ 0x564f4bed5e24 opentelemetry::v1::exporter::metrics::PrometheusCollector::Collect()
        @ 0x564f4bef004f prometheus::detail::CollectMetrics()
        @ 0x564f4beec26d prometheus::detail::MetricsHandler::handleGet()
        @ 0x564f4bf1cd8b CivetServer::requestHandler()
        @ 0x564f4bf35e7b handle_request
        @ 0x564f4bf29534 handle_request_stat_log
        @ 0x564f4bf39b3f process_new_connection
        @ 0x564f4bf3a448 worker_thread_run
        @ 0x564f4bf3a57f worker_thread
        @ 0x7f93e9137ea7 start_thread

[mutex.cc : 1454] RAW: dying due to potential deadlock
Aborted
```

From the stack, it looks like we are ending up holding a lock to the
`RlsLB` policy while removing a callback from the gRPC OpenTelemetry
plugin, which is a lock ordering inversion. The correct order is
`OpenTelemetry` -> `gRPC OpenTelemetry plugin` -> `gRPC Component like
RLS/xDSClient`.

A common pattern we employ for metrics is for the callbacks to be
unregistered when the corresponding component object is
orphaned/destroyed (unreffing). Also, note that removing callbacks
requires a lock in `gRPC OpenTelemetry plugin`. To avoid deadlocks, we
remove the callback inside `RlsLb` from outside the critical region, but
`RlsLb` owns refs to child policies which in turn hold refs to
`XdsClient`. The lock ordering inversion occurred due to unreffing child
policies within the critical region.

This PR is an alternative fix to this problem. Original fix in
grpc#37425. Verified that it fixes the bug.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant