Skip to content

Conversation

@duxiao1212
Copy link
Contributor

@duxiao1212 duxiao1212 commented Oct 28, 2025

Summary:
X-link: #26461

Fix flaky stress test buck2 test //github/presto-trunk/presto-native-execution/presto_cpp/main/operators/tests:presto_operators_test -- 'BroadcastTest.endToEndWithMultipleWriteNodes/zstd' --stress-runs 100

BroadcastTestParam::TearDown() inherited from OperatorTestBase https://www.internalfb.com/code/fbsource/[37f968940832]/fbcode/velox/exec/tests/utils/OperatorTestBase.cpp?lines=153 ,

Added executor_.reset() to fix the flakness.

Task::addExchangeClientPool() creates leaf pool. BroadcastExchangeSource's async work on executor_ captures shared_ptr with buffers allocated from this pool. We need release the leaf pool before resetMemory to avoid arbitrary memory check failure.

The reference chain is: exchangeClient.0.0 [shared_ptr] → node.0 [shared_ptr] → query..2

Without the fix, the destructor of the leaf pool was handled asynchronously in another thread, so that exchangeClient.0.0 pool may be destroyed AFTER SharedArbitrator::shutdown(). executor_.reset() blocks and waits for all queued async work to complete, which includes the ExchangeSource cleanup that eventually destroys the exchangeClient.0.0 leaf pool. This ensures the pool hierarchy is fully cleaned up before OperatorTestBase::TearDown() runs SharedArbitrator::shutdown().

Reviewed By: xiaoxmeng

Differential Revision: D85685549

== NO RELEASE NOTE ==

Differential Revision: D85685549

@duxiao1212 duxiao1212 requested review from a team as code owners October 28, 2025 22:42
@prestodb-ci prestodb-ci added the from:Meta PR from Meta label Oct 28, 2025
@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Oct 28, 2025

Reviewer's guide (collapsed on small PRs)

Reviewer's Guide

Injects a delay and lifecycle barrier in BroadcastTest teardown to resolve a race in async cleanup and tidies up argument formatting in executeBroadcastRead.

File-Level Changes

Change Details Files
Add asynchronous cleanup delay in BroadcastTest teardown
  • Override TearDown to call waitForAllTasksToBeDeleted()
  • Insert std::this_thread::sleep_for(100ms) for async destructor completion
  • Reset pool_ and rootPool_, and call resetMemory() in teardown
presto-native-execution/presto_cpp/main/operators/tests/BroadcastTest.cpp
Refactor executeBroadcastRead argument indentation
  • Align asRowType, tempDirectoryPath, and broadcastFilePaths by indenting two levels
presto-native-execution/presto_cpp/main/operators/tests/BroadcastTest.cpp

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes and they look great!


Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

duxiao1212 added a commit to duxiao1212/presto that referenced this pull request Oct 28, 2025
…restodb#26461)

Summary:

Fix flaky stress test buck2 test //github/presto-trunk/presto-native-execution/presto_cpp/main/operators/tests:presto_operators_test -- 'BroadcastTest.endToEndWithMultipleWriteNodes/zstd' --stress-runs 100

BroadcastTestParam::TearDown() inherited from OperatorTestBase https://www.internalfb.com/code/fbsource/[37f968940832]/fbcode/velox/exec/tests/utils/OperatorTestBase.cpp?lines=153 ,

We have an existing exchange source lifecycle protection for LocalExchangeSource https://fburl.com/code/aoz55crb.  But our BroadcastTest..endToEndWithMultipleWriteNodes uses BroadcastExchangeSource which lacks this protection.

Since it's just a flaky test, the most straightforward way is just inject a latency to handle a race condition in async cleanup.

Differential Revision: D85685549
@duxiao1212 duxiao1212 changed the title [sv] Fix flaky test BroadcastTest.endToEndWithMultipleWriteNodes fix: Fix flaky test BroadcastTest.endToEndWithMultipleWriteNodes Oct 28, 2025
duxiao1212 added a commit to duxiao1212/presto that referenced this pull request Oct 28, 2025
…stodb#26461)

Summary:

Fix flaky stress test buck2 test //github/presto-trunk/presto-native-execution/presto_cpp/main/operators/tests:presto_operators_test -- 'BroadcastTest.endToEndWithMultipleWriteNodes/zstd' --stress-runs 100

BroadcastTestParam::TearDown() inherited from OperatorTestBase https://www.internalfb.com/code/fbsource/[37f968940832]/fbcode/velox/exec/tests/utils/OperatorTestBase.cpp?lines=153 ,

We have an existing exchange source lifecycle protection for LocalExchangeSource https://fburl.com/code/aoz55crb.  But our BroadcastTest..endToEndWithMultipleWriteNodes uses BroadcastExchangeSource which lacks this protection.

Since it's just a flaky test, the most straightforward way is just inject a latency to handle a race condition in async cleanup.

Differential Revision: D85685549
duxiao1212 added a commit to duxiao1212/presto that referenced this pull request Oct 29, 2025
…stodb#26461)

Summary:

Fix flaky stress test buck2 test //github/presto-trunk/presto-native-execution/presto_cpp/main/operators/tests:presto_operators_test -- 'BroadcastTest.endToEndWithMultipleWriteNodes/zstd' --stress-runs 100

BroadcastTestParam::TearDown() inherited from OperatorTestBase https://www.internalfb.com/code/fbsource/[37f968940832]/fbcode/velox/exec/tests/utils/OperatorTestBase.cpp?lines=153 ,

We have an existing exchange source lifecycle protection for LocalExchangeSource https://fburl.com/code/aoz55crb.  But our BroadcastTest..endToEndWithMultipleWriteNodes uses BroadcastExchangeSource which lacks this protection.

Since it's just a flaky test, the most straightforward way is just inject a latency to handle a race condition in async cleanup.

Differential Revision: D85685549
duxiao1212 added a commit to duxiao1212/presto that referenced this pull request Oct 29, 2025
…stodb#26461)

Summary:

Fix flaky stress test buck2 test //github/presto-trunk/presto-native-execution/presto_cpp/main/operators/tests:presto_operators_test -- 'BroadcastTest.endToEndWithMultipleWriteNodes/zstd' --stress-runs 100

BroadcastTestParam::TearDown() inherited from OperatorTestBase https://www.internalfb.com/code/fbsource/[37f968940832]/fbcode/velox/exec/tests/utils/OperatorTestBase.cpp?lines=153 ,

We have an existing exchange source lifecycle protection for LocalExchangeSource https://fburl.com/code/aoz55crb.  But our BroadcastTest..endToEndWithMultipleWriteNodes uses BroadcastExchangeSource which lacks this protection.

Since it's just a flaky test, the most straightforward way is just inject a latency to handle a race condition in async cleanup.

Differential Revision: D85685549
duxiao1212 added a commit to duxiao1212/presto that referenced this pull request Oct 29, 2025
…stodb#26461)

Summary:

Fix flaky stress test buck2 test //github/presto-trunk/presto-native-execution/presto_cpp/main/operators/tests:presto_operators_test -- 'BroadcastTest.endToEndWithMultipleWriteNodes/zstd' --stress-runs 100

BroadcastTestParam::TearDown() inherited from OperatorTestBase https://www.internalfb.com/code/fbsource/[37f968940832]/fbcode/velox/exec/tests/utils/OperatorTestBase.cpp?lines=153 ,

We have an existing exchange source lifecycle protection for LocalExchangeSource https://fburl.com/code/aoz55crb.  But our BroadcastTest..endToEndWithMultipleWriteNodes uses BroadcastExchangeSource which lacks this protection.

Since it's just a flaky test, the most straightforward way is just inject a latency to handle a race condition in async cleanup.

Differential Revision: D85685549
duxiao1212 added a commit to duxiao1212/presto that referenced this pull request Oct 29, 2025
…stodb#26461)

Summary:

Fix flaky stress test buck2 test //github/presto-trunk/presto-native-execution/presto_cpp/main/operators/tests:presto_operators_test -- 'BroadcastTest.endToEndWithMultipleWriteNodes/zstd' --stress-runs 100

BroadcastTestParam::TearDown() inherited from OperatorTestBase https://www.internalfb.com/code/fbsource/[37f968940832]/fbcode/velox/exec/tests/utils/OperatorTestBase.cpp?lines=153 ,

We have an existing exchange source lifecycle protection for LocalExchangeSource https://fburl.com/code/aoz55crb.  But our BroadcastTest..endToEndWithMultipleWriteNodes uses BroadcastExchangeSource which lacks this protection.

Since it's just a flaky test, the most straightforward way is just inject a latency to handle a race condition in async cleanup.

Differential Revision: D85685549
@duxiao1212 duxiao1212 force-pushed the export-D85685549 branch 2 times, most recently from 88bf151 to 706aee6 Compare October 31, 2025 05:13
duxiao1212 added a commit to duxiao1212/presto that referenced this pull request Oct 31, 2025
…stodb#26461)

Summary:

Fix flaky stress test buck2 test //github/presto-trunk/presto-native-execution/presto_cpp/main/operators/tests:presto_operators_test -- 'BroadcastTest.endToEndWithMultipleWriteNodes/zstd' --stress-runs 100

BroadcastTestParam::TearDown() inherited from OperatorTestBase https://www.internalfb.com/code/fbsource/[37f968940832]/fbcode/velox/exec/tests/utils/OperatorTestBase.cpp?lines=153 ,

We have an existing exchange source lifecycle protection for LocalExchangeSource https://fburl.com/code/aoz55crb.  But our BroadcastTest..endToEndWithMultipleWriteNodes uses BroadcastExchangeSource which lacks this protection.

Task::addExchangeClientPool() creates leaf pool. BroadcastExchangeSource's async work on executor_ captures shared_ptr<ExchangeSource> with buffers allocated from this pool. We need release the leaf pool before resetMemory to avoid arbitrary memory check failure.

Differential Revision: D85685549
xiaoxmeng
xiaoxmeng previously approved these changes Oct 31, 2025
…stodb#26461)

Summary:

Fix flaky stress test buck2 test //github/presto-trunk/presto-native-execution/presto_cpp/main/operators/tests:presto_operators_test -- 'BroadcastTest.endToEndWithMultipleWriteNodes/zstd' --stress-runs 100

BroadcastTestParam::TearDown() inherited from OperatorTestBase https://www.internalfb.com/code/fbsource/[37f968940832]/fbcode/velox/exec/tests/utils/OperatorTestBase.cpp?lines=153 ,

We have an existing exchange source lifecycle protection for LocalExchangeSource https://fburl.com/code/aoz55crb.  But our BroadcastTest..endToEndWithMultipleWriteNodes uses BroadcastExchangeSource which lacks this protection.

Task::addExchangeClientPool() creates leaf pool. BroadcastExchangeSource's async work on executor_ captures shared_ptr<ExchangeSource> with buffers allocated from this pool. We need release the leaf pool before resetMemory to avoid arbitrary memory check failure.

Reviewed By: xiaoxmeng

Differential Revision: D85685549
@tdcmeehan tdcmeehan changed the title fix: Fix flaky test BroadcastTest.endToEndWithMultipleWriteNodes fix(ci): Fix flaky test BroadcastTest.endToEndWithMultipleWriteNodes Oct 31, 2025
duxiao1212 added a commit to duxiao1212/velox that referenced this pull request Oct 31, 2025
Summary:
X-link: prestodb/presto#26461

Fix flaky stress test buck2 test //github/presto-trunk/presto-native-execution/presto_cpp/main/operators/tests:presto_operators_test -- 'BroadcastTest.endToEndWithMultipleWriteNodes/zstd' --stress-runs 100

BroadcastTestParam::TearDown() inherited from OperatorTestBase https://www.internalfb.com/code/fbsource/[37f968940832]/fbcode/velox/exec/tests/utils/OperatorTestBase.cpp?lines=153 ,

Added  executor_.reset() to fix the flakness.

Task::addExchangeClientPool() creates leaf pool. BroadcastExchangeSource's async work on executor_ captures shared_ptr<ExchangeSource> with buffers allocated from this pool. We need release the leaf pool before resetMemory to avoid arbitrary memory check failure.

The reference chain is: exchangeClient.0.0 [shared_ptr] → node.0 [shared_ptr] → query..2

Without the fix, the destructor of the leaf pool was handled asynchronously in another thread, so that exchangeClient.0.0 pool may be destroyed AFTER SharedArbitrator::shutdown(). executor_.reset() blocks and waits for all queued async work to complete, which includes the ExchangeSource cleanup that eventually destroys the exchangeClient.0.0 leaf pool. This ensures the pool hierarchy is fully cleaned up before OperatorTestBase::TearDown() runs SharedArbitrator::shutdown().

Reviewed By: xiaoxmeng

Differential Revision: D85685549
duxiao1212 added a commit to duxiao1212/velox that referenced this pull request Oct 31, 2025
…bator#15344)

Summary:

X-link: prestodb/presto#26461

Reset executor before test teardown to ensure async work completes and releases 
memory pool references before SharedArbitrator shutdown checks.

The executor thread pool processes async tasks could hold references to leaf memory pools.
Without explicitly resetting the executor, these tasks complete after TearDown(), keeping the pool hierarchy alive during shutdown assertions, causing " Unexpected alive participants" errors.

Reviewed By: xiaoxmeng

Differential Revision: D85685549
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants