Skip to content

[release-7.4] Revert condition for releasing memory from tlog's versionLocation data structure #12298

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 14, 2025

Conversation

spraza
Copy link
Collaborator

@spraza spraza commented Aug 13, 2025

Description

We introduced this change in #11815. The idea was that for tlog spill by reference, we should be using popped version, not persistent version, for the purposes of releasing memory from the logData->versionLocation data structure. This causes OOMs in remote tlogs under certain high traffic workloads after a multi-hour primary-remote disconnect i.e. after the disconnect is healed, remote tlogs' RSS footprint grows in an unbounded way, presumably because some storage server wasn't able to pop fast enough, causing memory build up in tlog.

This got us thinking whether we really need this change in the first place and whether it's safe to revert. After discussion with a few people, the understanding is that this should not cause a correctness issue since tlog restart can rebuild the state from disk. Joshua/simulation also have not shown this issue, we made the fix pre-emptively. Worst case potential cost is disk build up and that's because the versionLocation in-memory state is used for disk queue popping, and lack of the versions in memory could mean we don't release disk space fast enough. Having said that, we think even if it were to happen, this is transient and disk space would eventually be released because of how disk queue popping works.

Overall, we think this change is safe to revert.

Testing

  • Tested in a cluster that OOMs no longer happen.
  • 100K Joshua: 20250813-225250-praza-743-tag-v216-01060c4e-43f3c5618750bb80 compressed=True data_size=40160171 duration=5570120 ended=100000 fail=1 fail_fast=10 max_runs=100000 pass=99999 priority=100 remaining=0 runtime=1:19:41 sanity=False started=100000 stopped=20250814-001231 submitted=20250813-225250 timeout=5400 username=praza-743-tag-v216-01060c4ee1fb5386e045bb9c0b086f07315ac3ef

For the 1 Joshua failure, it's noSim which does not setup a simulated cluster, and I can not reproduce:

$ ~/cnd_build_output/bin/fdbserver -r test -f /root/src/foundationdb/tests/noSim/KeyValueStoreRocksDBTest.toml --buggify on --seed 2701483725 --logsize 1GiB
startingConfiguration: start
useDB: false
Run test:UnitTests start
setting up test (UnitTests)...
Test received trigger for setup...
running test (UnitTests)...
Found 5 tests
Testing noSim/fdbserver/KeyValueStoreRocksDB/RocksDBReopen
Testing noSim/fdbserver/KeyValueStoreRocksDB/IngestSSTFileVisibility
Testing noSim/fdbserver/KeyValueStoreRocksDB/CheckpointRestoreKeyValues
Testing noSim/fdbserver/KeyValueStoreRocksDB/RocksDBBasic
Testing noSim/fdbserver/KeyValueStoreRocksDB/CheckpointRestoreColumnFamily
UnitTests complete
UnitTests complete
checking test (UnitTests)...
fetching metrics (UnitTests)...
Metric (0, 0): UnitTests.Test Cases Available, 5.000000, 5
Metric (0, 1): UnitTests.Test Cases Executed, 5.000000, 5
Metric (0, 2): UnitTests.Test Cases Failed, 0.000000, 0
Metric (0, 3): UnitTests.Total wall clock time (s), 0.345422, 0.345
Metric (0, 4): UnitTests.Total flow time (s), 0.345016, 0.345
Test complete
1 test clients passed; 0 test clients failed
Run test:UnitTests Done.

1 tests passed; 0 tests failed.

I also looked at the stack trace mentioned in Joshua logs, does not seem related to this change:

$ llvm-addr2line -e ~/cnd_build_output/bin/fdbserver -p -C -f -i 0x7f9a6ac3bbf0 0x622d241 0x48a1b9b 0x622cfb9 0x622cdcd 0x622cd29 0x7f9a6ac8719a
?? at ??:0
operator() at /root/src/foundationdb/flow/IThreadPool.cpp:85
 (inlined by) complete<ThreadPool::ActionWrapper> at /opt/boost_1_86_0_clang/include/boost/asio/detail/handler_work.hpp:470
 (inlined by) boost::asio::detail::completion_handler<ThreadPool::ActionWrapper, boost::asio::io_context::basic_executor_type<std::__1::allocator<void>, 0ul>>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) at /opt/boost_1_86_0_clang/include/boost/asio/detail/completion_handler.hpp:74
rethrow_pending_exception at /opt/boost_1_86_0_clang/include/boost/asio/detail/thread_info_base.hpp:231
 (inlined by) boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) at /opt/boost_1_86_0_clang/include/boost/asio/detail/impl/scheduler.ipp:494
boost::asio::detail::scheduler::run_one(boost::system::error_code&) at /opt/boost_1_86_0_clang/include/boost/asio/detail/impl/scheduler.ipp:0
run_one at /opt/boost_1_86_0_clang/include/boost/asio/impl/io_context.ipp:0
 (inlined by) ThreadPool::Thread::run() at /root/src/foundationdb/flow/IThreadPool.cpp:54
ThreadPool::start(void*) at /root/src/foundationdb/flow/IThreadPool.cpp:66
?? at ??:0

Code-Reviewer Section

The general pull request guidelines can be found here.

Please check each of the following things and check all boxes before accepting a PR.

  • The PR has a description, explaining both the problem and the solution.
  • The description mentions which forms of testing were done and the testing seems reasonable.
  • Every function/class/actor that was touched is reasonably well documented.

For Release-Branches

If this PR is made against a release-branch, please also check the following:

  • This change/bugfix is a cherry-pick from the next younger branch (younger release-branch or main if this is the youngest branch)
  • There is a good reason why this PR needs to go into a release branch and this reason is documented (either in the description above or in a linked GitHub issue)

@spraza spraza marked this pull request as draft August 13, 2025 22:56
@spraza spraza changed the title Revert condition for releasing memory from tlog's versionLocation data structure [release-7.4] Revert condition for releasing memory from tlog's versionLocation data structure Aug 13, 2025
@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x

  • Commit ID: 01060c4
  • Duration 0:38:54
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang-arm on Linux CentOS 7

  • Commit ID: 01060c4
  • Duration 0:52:17
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-macos on macOS Ventura 13.x

  • Commit ID: 01060c4
  • Duration 1:02:23
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-clang on Linux RHEL 9

  • Commit ID: 01060c4
  • Duration 1:11:08
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr-cluster-tests on Linux RHEL 9

  • Commit ID: 01060c4
  • Duration 1:13:51
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)
  • Cluster Test Logs zip file of the test logs (available for 30 days)

@foundationdb-ci
Copy link
Contributor

Result of foundationdb-pr on Linux RHEL 9

  • Commit ID: 01060c4
  • Duration 1:16:00
  • Result: ✅ SUCCEEDED
  • Error: N/A
  • Build Log terminal output (available for 30 days)
  • Build Workspace zip file of the working directory (available for 30 days)

@spraza spraza requested a review from sbodagala August 14, 2025 05:15
@spraza spraza marked this pull request as ready for review August 14, 2025 05:15
@spraza spraza merged commit 55cb585 into apple:release-7.4 Aug 14, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants