
perf: Remove unnecessary caches #5439

Merged
bthomee merged 41 commits into develop from bthomee/disable-cache on Feb 6, 2026

Conversation

@bthomee
Collaborator

@bthomee bthomee commented May 20, 2025

High Level Overview of Change

This change removes the cache in DatabaseNodeImp and simplifies the caching logic in SHAMapStoreImp.

Context of Change

The codebase contains many caches, and this PR is one of several refactors whose goal is to improve performance, such as by reducing memory use, to support future growth in accounts, trust lines, and transactions. In this case, as NuDB and RocksDB already use caches internally, the additional caches in our code add little value and may even be unnecessary. A number of performance analyses demonstrated improved performance, and more extensive analyses will be performed before this change is released.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (non-breaking change that only restructures code)
  • Performance (increase or change in throughput and/or latency)
  • Tests (you added tests for code that already exists, or your new feature included in this PR)
  • Documentation update
  • Chore (no impact to binary, e.g. .gitignore, formatting, dropping support for older tooling)
  • Release

@bthomee bthomee added the "QE test required" (RippleX QE Team must look at this PR) label May 20, 2025
@bthomee bthomee marked this pull request as ready for review May 20, 2025 14:15
@bthomee bthomee requested review from a team and vlntb May 20, 2025 14:15
@codecov

codecov bot commented May 20, 2025

Codecov Report

❌ Patch coverage is 29.62963% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.9%. Comparing base (25d7c2c) to head (32d1414).
⚠️ Report is 1 commit behind head on develop.

Files with missing lines                  | Patch % | Lines
src/libxrpl/nodestore/DatabaseNodeImp.cpp | 20.8%   | 19 Missing ⚠️
Additional details and impacted files


@@           Coverage Diff           @@
##           develop   #5439   +/-   ##
=======================================
  Coverage     79.9%   79.9%           
=======================================
  Files          840     840           
  Lines        65549   65483   -66     
  Branches      7266    7250   -16     
=======================================
- Hits         52352   52318   -34     
+ Misses       13197   13165   -32     
Files with missing lines                              | Coverage Δ
include/xrpl/nodestore/Database.h                     | 69.2% <ø> (ø)
include/xrpl/nodestore/detail/DatabaseNodeImp.h       | 72.7% <ø> (-3.7%) ⬇️
...nclude/xrpl/nodestore/detail/DatabaseRotatingImp.h | 66.7% <ø> (ø)
src/libxrpl/nodestore/DatabaseRotatingImp.cpp         | 63.9% <ø> (+0.8%) ⬆️
src/xrpld/app/main/Application.cpp                    | 70.9% <ø> (+0.1%) ⬆️
src/xrpld/app/misc/SHAMapStoreImp.cpp                 | 76.2% <100.0%> (+0.4%) ⬆️
src/xrpld/app/misc/SHAMapStoreImp.h                   | 95.7% <ø> (ø)
src/libxrpl/nodestore/DatabaseNodeImp.cpp             | 36.6% <20.8%> (+2.5%) ⬆️

... and 6 files with indirect coverage changes



@PeterChen13579
Contributor

Just looping in @kuznetsss before approving, I don’t believe we do any node-level DB caching in Clio, right? So no changes should be needed on our side?

@kuznetsss
Collaborator

As I understand it, this is rippled's internal database, so no changes on Clio's side are required.

@bthomee bthomee added this to the 2.6.0 (Q3 2025) milestone May 27, 2025
@bthomee bthomee removed this from the 2.6.0 (Q3 2025) milestone Aug 13, 2025
@sublimator
Contributor

I was looking into memory usage a few weeks ago too and noticed some oddities with the node store cache. Sharing my notes.


So regarding the NodeStore cache, like you, I'm of the opinion that it is actually largely pointless.

  • Peer messages use the SHAMap as source of truth and use a custom wire encoding
  • There's already a TreeNodeCache for SHAMap nodes that you may want to have laying around

Why was it ever there!? Well, it seems that back in the day, for some reason, the TreeNodeCache was actually keyed by an ID representing a node's position in the tree. This was also before NuDB; think LevelDB and SQLite. So the tree node cache often missed and the k/v store was slow, which is why I'm guessing it was a backup cache.

It's kind of crazy: we read from the node store, then create a DecodedBlob from the decompressed data, only to immediately make a copy of it, and then cache that copy "because reasons".

Nowadays? Shrug. It seems you can disable it by setting cache_age=0 and cache_size=0 in the [node_db] configuration.
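
For example, a stanza along those lines might look like this (a sketch; the type and path values here are illustrative, not a recommendation):

    [node_db]
    type=NuDB
    path=/var/lib/rippled/db/nudb
    cache_size=0
    cache_age=0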


Attached are my Claude notes from when I did the investigation.


NodeObject Cache Investigation

Summary of Findings

After investigating the NodeObject cache in rippled/xahaud, it appears to serve no clear purpose in the current architecture. Evidence suggests it might be a remnant from an earlier design that's now just adding overhead to node fetch operations. But without access to original design discussions, we can't be completely certain.

Historical Investigation

A Theory: TreeNodeCache Was Originally Keyed by NodeID

Looking at the July 2013 codebase (commit c645901), there's interesting evidence that might explain why the NodeObject cache exists. The code suggests there may have been a fundamental issue with how the TreeNodeCache was implemented.

What the 2013 Code Shows:

  1. The mTNByID Map (ripple_SHAMap.h:225):

    boost::unordered_map<SHAMapNode, SHAMapTreeNode::pointer> mTNByID;

    The map name itself - "Tree Nodes By ID" - suggests it was keyed by NodeID, not hash.

  2. The HashedObjectStore Cache (ripple_HashedObjectStore.h):

    class HashedObjectStore
    {
    public:
        HashedObjectStore (int cacheSize, int cacheAge);
        
        HashedObject::pointer retrieve (uint256 const& hash)
        {
            if (mLevelDB)
                return retrieveLevelDB (hash);
            return retrieveSQLite (hash);
        }
    
    private:
        TaggedCache<uint256, HashedObject, UptimeTimerAdapter> mCache;
        KeyCache <uint256, UptimeTimerAdapter> mNegativeCache;

    The HashedObjectStore had its own cache keyed by hash (uint256), separate from the TreeNodeCache.

  3. How getNode() Worked (ripple_SHAMap.cpp:204):

    SHAMapTreeNode::pointer node = checkCacheNode(id);  // Lookup by NodeID
    if (node) {
    #ifdef DEBUG
        if (node->getNodeHash() != hash)  // Then verify hash matches
            throw std::runtime_error("invalid node");
    #endif

    It seems to look up by NodeID first, then check if the hash matches - which could be problematic.

  4. The Fallback Path (ripple_SHAMap.cpp:820):

    SHAMapTreeNode::pointer SHAMap::fetchNodeExternalNT(const SHAMapNode& id, uint256 const& hash)
    {
        // When mTNByID misses...
        HashedObject::pointer obj(theApp->getHashedObjectStore().retrieve(hash));

    When the ID-based lookup failed, it fell back to a hash-based lookup in the HashedObjectStore.

  5. How HashedObjectStore Retrieved Objects (ripple_HashedObjectStore.cpp):

    HashedObject::pointer HashedObjectStore::retrieveLevelDB (uint256 const& hash)
    {
        HashedObject::pointer obj = mCache.fetch (hash);
        
        if (obj || mNegativeCache.isPresent (hash) || !theApp || !theApp->getHashNodeLDB ())
            return obj;
        
        // ... fetch from LevelDB if not in cache
        obj = LLRetrieve (hash, theApp->getHashNodeLDB ());
        
        if (!obj)
        {
            mNegativeCache.add (hash);
            return obj;
        }
        
        mCache.canonicalize (hash, obj);
        return obj;
    }

    The HashedObjectStore maintained its own cache using the hash as the key, which would succeed even when mTNByID missed.

This could have created an interesting problem, sketched in code after this list:

  • The same content (same hash) might exist at different positions (different IDs) in different trees
  • An ID-based cache would miss when looking for that content at a different position
  • A hash-based cache would find it
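
To make the failure mode concrete, here is a minimal self-contained sketch (hypothetical positions and hashes, not the 2013 types):

    #include <cassert>
    #include <map>
    #include <string>

    int main()
    {
        // A NodeID encodes a position in a tree; a hash encodes content.
        std::map<std::string, std::string> byId;    // like the old mTNByID
        std::map<std::string, std::string> byHash;  // like the HashedObjectStore cache

        // Cache a node seen at position "tree1/branch0" with content hash "abc123".
        byId["tree1/branch0"] = "node-data";
        byHash["abc123"] = "node-data";

        // The same content appears at a different position in another tree:
        // the ID-keyed cache misses, while the hash-keyed cache hits.
        assert(byId.find("tree2/branch7") == byId.end());  // miss
        assert(byHash.find("abc123") != byHash.end());     // hit
    }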

Could NodeObject Cache Have Been a Workaround?

If the above theory is correct, the NodeObject cache (keyed by hash) might have been added to catch what the TreeNodeCache (keyed by ID) missed. This would explain why there were two caches doing seemingly similar things.

It's worth noting that in 2013, the backend databases (SQLite and LevelDB) would have been considerably slower than modern options like NuDB or RocksDB. Given these performance characteristics, having a secondary cache to avoid repeated database lookups when the TreeNodeCache missed would have made more sense.

What the 2014 Documentation Says

By July 2014, the team seemed aware of these issues. From the SHAMap README (commit ea44497):

"When we navigate the tree (say, like SHAMap::walkTo()) we currently ask each node for information that we could determine locally... The next refactor should remove all calls to SHAMapTreeNode::GetID()... Then we can change the SHAMap::mTNByID member to be mTNByHash."

This suggests they wanted to move away from ID-based lookups entirely.

Current State

Today's code uses direct pointer traversal:

std::shared_ptr<SHAMapTreeNode> node = parent->getChild(branch);
if (node || !backed_)
    return node;

It seems like the architecture has evolved significantly, but the NodeObject cache remains - possibly as a remnant of the old workaround.

Current Caching Layers

Looking at the current code, there appear to be three caching layers:

1. TreeNodeCache

  • Caches fully parsed tree nodes
  • Seems useful for historical ledger queries
  • Uses the modern hash-based lookup

2. NodeObject Cache

  • Caches raw serialized bytes WITHOUT the 9-byte prefix (the prefix is stripped during the copy)
  • Unclear what benefit it provides today
  • Forces an extra copy operation on every fetch to remove the prefix

3. In-Memory SHAMap

  • Direct pointer tree structure for current ledgers
  • Handles most operations without any cache lookups

Observed Inefficiencies

The Copy Problem

Every node fetch from disk appears to do an unnecessary copy:

// In DecodedBlob::createObject()
Blob data(m_objectData, m_objectData + m_dataBytes);  // COPY!

This happens because:

  • The decompressed data includes a 9-byte prefix
  • NodeObject doesn't want the prefix
  • So it copies everything except the first 9 bytes
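
In sketch form (illustrative names, not the actual DecodedBlob members):

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    using Blob = std::vector<std::uint8_t>;

    // The stored record is [9-byte prefix][payload]; NodeObject wants only
    // the payload, so every fetch copies it out of the decompressed buffer.
    Blob extractPayload(std::uint8_t const* record, std::size_t recordBytes)
    {
        constexpr std::size_t prefixBytes = 9;
        auto const* objectData = record + prefixBytes;
        std::size_t dataBytes = recordBytes - prefixBytes;
        return Blob(objectData, objectData + dataBytes);  // the copy
    }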

Where NodeObject Cache Doesn't Seem to Help

  1. Current Ledger Operations: Use direct pointers, no cache needed
  2. Historical Queries: TreeNodeCache already has the parsed nodes
  3. P2P Syncing: Uses its own wire format
  4. Startup: Cache is empty, so it's just overhead
  5. RPC Queries: TreeNodeCache handles these better

Questions and Uncertainties

  • Was the NodeObject cache really a workaround for the ID-based TreeNodeCache? The evidence suggests it, but we can't be certain.
  • Are there any edge cases where the NodeObject cache provides value that we're missing?
  • Why wasn't it removed when the architecture changed? Fear of breaking something? Or does it serve some purpose we haven't identified?

Potential Impact of Removal

If the NodeObject cache truly serves no purpose, removing it could:

  • Eliminate the unnecessary copy on every disk read
  • Reduce memory usage
  • Simplify the code
  • Possibly improve performance

But without comprehensive testing, it's hard to be certain there aren't hidden dependencies.

Disabling the Cache

Interestingly, the NodeObject cache is already optional in the codebase. The cache initialization logic in DatabaseNodeImp.h shows how it can be disabled:

// Lines 46-76 from DatabaseNodeImp.h
std::optional<int> cacheSize, cacheAge;

if (config.exists("cache_size"))
{
    cacheSize = get<int>(config, "cache_size");
    if (cacheSize.value() < 0)
    {
        Throw<std::runtime_error>(
            "Specified negative value for cache_size");
    }
}

if (config.exists("cache_age"))
{
    cacheAge = get<int>(config, "cache_age");
    if (cacheAge.value() < 0)
    {
        Throw<std::runtime_error>(
            "Specified negative value for cache_age");
    }
}

if (cacheSize != 0 || cacheAge != 0)
{
    cache_ = std::make_shared<TaggedCache<uint256, NodeObject>>(
        "DatabaseNodeImp",
        cacheSize.value_or(0),
        std::chrono::minutes(cacheAge.value_or(0)),
        stopwatch(),
        j);
}

The key line is if (cacheSize != 0 || cacheAge != 0) - the cache is only created if at least one of these values is non-zero. Setting both to 0 leaves cache_ as nullptr.

Throughout DatabaseNodeImp.cpp, the code consistently checks if cache_ exists before using it:

// Line 55: storing objects
if (cache_ && !skipCache)

// Line 71: fetching objects  
if (cache_)

// Line 98: conditional fetch
cache_ ? cache_->fetch(hash) : nullptr

This means the cache can be disabled through configuration by setting both cache_age and cache_size to 0. When disabled:

  • All cache operations are skipped
  • The code falls back to direct backend access
  • No performance penalty from the unnecessary copy operation

This provides an easy way to test whether the cache provides any real benefit - just disable it and measure the impact.

Additional Problems: P2P Cache Pollution

Another major issue with the NodeObject cache is that peer requests pollute it with one-off data. When peers request nodes via TMGetObjectByHash, the code fetches from NodeStore, which adds the data to the cache even though it's quite possibly never going to be used again, especially if it's for a historical ledger.

This means if you have the cache configured, it will continuously grow with useless data every time peers request nodes from you. The cache fills with garbage that has near-zero probability of reuse, evicting any potentially useful entries. It's essentially uncontrolled memory growth triggered by external peer requests. See .ai-docs/nodeobject-cache-p2p-fix.md for a proposed solution.

Recommendations for Further Investigation

  1. Test with cache_size=0 and cache_age=0 to disable the cache completely
  2. Run benchmarks comparing performance with and without the cache
  3. Monitor memory usage and query latency in both configurations
  4. Look for any unexpected behavior when cache is disabled
  5. Investigate P2P request patterns to quantify cache pollution

The fact that the cache is already optional suggests the developers recognized it might not always be necessary. Testing with it disabled would be a low-risk way to validate whether it's truly vestigial.

@bthomee
Collaborator Author

bthomee commented Aug 14, 2025

@sublimator I loved the historical context and the results from your deep dive - the amount of detail is very much appreciated!

As for your recommendations: our old performance-analysis platform didn't let me modify rippled.cfg, hence I created this PR, in which I completely removed the code so I could evaluate its memory and CPU usage. It did show memory improvements at a slight increase in CPU. However, as our platform turned out to have some shortcomings (for one, it measured system memory rather than process memory, while running on a cloud machine that can be shared with an arbitrary number of other services), we have been working on a new platform. The new platform is just about ready, so I'll rerun the experiment there to ensure there are indeed performance improvements without any ill side effects.

@sublimator
Contributor

@bthomee

You're welcome, I assumed you could recontextualize/summarize it!
They were my Claude notes.

netOPs_ = &app_.getOPs();
ledgerMaster_ = &app_.getLedgerMaster();
fullBelowCache_ = &(*app_.getNodeFamily().getFullBelowCache());
treeNodeCache_ = &(*app_.getNodeFamily().getTreeNodeCache());
Contributor


@bthomee

It seems to be used by the SHAMap code itself. The tree node cache is more than a simple cache: it is used for canonicalizing nodes. I was deep diving into rippled's SHAMap recently, and then I recalled this PR.

   void SHAMap::canonicalize(SHAMapHash const& hash, std::shared_ptr<SHAMapTreeNode>& node) const
   {
       assert(backed_);
       assert(node->cowid() == 0);  // Only canonical nodes can be cached
       assert(node->getHash() == hash);
   
       f_.getTreeNodeCache(ledgerSeq_)
           ->canonicalize_replace_client(hash.as_uint256(), node);
   }

It's what is used to make sure there's only one instance of a given node when structurally sharing trees.
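
For readers unfamiliar with the semantics, a minimal sketch of what canonicalize_replace_client is assumed to do (not rippled's actual TaggedCache implementation):

    #include <map>
    #include <memory>

    template <typename Key, typename T>
    class CanonicalCache
    {
        std::map<Key, std::weak_ptr<T>> map_;

    public:
        // If an entry for `key` is already cached, the caller's pointer is
        // replaced by the cached instance, so every SHAMap that shares this
        // node ends up holding the same object.
        void
        canonicalize_replace_client(Key const& key, std::shared_ptr<T>& value)
        {
            auto& slot = map_[key];
            if (auto existing = slot.lock())
                value = existing;  // adopt the canonical instance
            else
                slot = value;      // caller's instance becomes canonical
        }
    };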

All you're doing here is removing the SHAMapStoreImp reference, but maybe that clear() is actually useful? I'm not sure when/why/how clearCaches() is being called, but it might actually be doing something beneficial to keep memory usage down.

Contributor


Oh, wait, sorry, it's calling freshenCache/freshenCaches.

Contributor


    template <class CacheInstance>
    bool
    freshenCache(CacheInstance& cache)
    {
        std::uint64_t check = 0;

        for (auto const& key : cache.getKeys())
        {
            dbRotating_->fetchNodeObject(
                key, 0, NodeStore::FetchType::synchronous, true); // <- duplicate = true
            if (!(++check % checkHealthInterval_) && healthWait() == stopping)
                return true;
        }

        return false;
    }

By my reading, 'freshenCache' is actually forcing any memory-resident nodes to be written with 'duplicate=true' to the new backend during rotation. It's simply using the cache keys to determine which map nodes are in memory, rather than "freshening" that particular cache.

See:
https://github.com/XRPLF/rippled/blob/cb52c9af001316a906665a10283f6f5dd271ebb3/src/xrpld/nodestore/detail/DatabaseRotatingImp.cpp#L132C10-L189
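
In sketch form, the rotation-time behavior reads roughly like this (stand-in map-based types; the real logic lives in DatabaseRotatingImp at the link above):

    #include <map>
    #include <memory>
    #include <string>

    using Key = std::string;
    using Object = std::string;
    using Store = std::map<Key, std::shared_ptr<Object>>;

    // duplicate=true re-writes archive hits into the writable (new) backend,
    // which is how freshenCache warms it during rotation.
    std::shared_ptr<Object>
    fetch(Store& writable, Store& archive, Key const& key, bool duplicate)
    {
        if (auto it = writable.find(key); it != writable.end())
            return it->second;
        if (auto it = archive.find(key); it != archive.end())
        {
            if (duplicate)
                writable[key] = it->second;  // copy into the new backend
            return it->second;
        }
        return nullptr;
    }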

Given you can't actually remove the tree node cache, as it's relied upon for canonicalizing nodes, the question becomes whether you want the new backend to have those keys.

Collaborator

@vlntb vlntb Sep 16, 2025


Great research, @sublimator. I followed your notes and agree excluding (i.e. shrinking/zero-sizing) treeNodeCache_ can hurt rotation performance. The freshenCache pass uses the cache’s keys to warm the new backend—if the cache is empty, we lose that warm-start and we’ll see more cold misses immediately after rotation.

We should keep freshening the cache using the tree node cache, but we don’t need to keep it as a class member variable (dropping the member pointer doesn’t save memory—just fetch at the call site and pass *ptr).

The other realisation from your comment is that we’re currently trying to make the memory footprint of rippled smaller, which may lead to increased internal latency / decreased throughput—similar to the behaviour we see around DB rotation.

Collaborator Author


Thanks for the great feedback and discussion - I restored the freshenCaches.

Contributor


Just saw these comments, thanks :)


Copilot AI left a comment


Pull request overview

Copilot reviewed 45 out of 48 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/libxrpl/nodestore/DatabaseNodeImp.cpp:90

  • results is declared const and then returned by value. If NRVO is not applied, this prevents moving and can force an extra vector copy on a potentially large batch. Consider dropping const (you can still avoid mutation by convention) so the return can be moved when needed.
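
A small self-contained illustration of that point (hypothetical functions, not the PR's code):

    #include <string>
    #include <vector>

    std::vector<std::string> makeBatchConst()
    {
        const std::vector<std::string> results{"a", "b"};
        return results;  // const disables the implicit move; if NRVO does not
                         // apply, the whole vector is copied on return
    }

    std::vector<std::string> makeBatch()
    {
        std::vector<std::string> results{"a", "b"};
        return results;  // implicitly moved when NRVO does not apply
    }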


Copilot AI review requested due to automatic review settings February 3, 2026 16:20
@bthomee bthomee requested a review from ximinez February 3, 2026 16:21

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.



Comment on lines 73 to 80

 auto results = backend_->fetchBatch(batch).first;
 for (size_t i = 0; i < results.size(); ++i)
 {
-    auto nObj = std::move(dbResults[i]);
-    size_t index = indexMap[cacheMisses[i]];
-    auto const& hash = hashes[index];
-
-    if (nObj)
-    {
-        // Ensure all threads get the same object
-        if (cache_)
-            cache_->canonicalize_replace_client(hash, nObj);
-    }
-    else
+    if (!results[i])
     {
         JLOG(j_.error()) << "fetchBatch - "
-            << "record not found in db or cache. hash = " << strHex(hash);
-        if (cache_)
-        {
-            auto notFound = NodeObject::createObject(hotDUMMY, {}, hash);
-            cache_->canonicalize_replace_client(hash, notFound);
-            if (notFound->getType() != hotDUMMY)
-                nObj = std::move(notFound);
-        }
+            << "record not found in db. hash = " << strHex(hashes[i]);
     }
 }

Copilot AI Feb 3, 2026


fetchBatch now returns backend_->fetchBatch(batch).first directly. This can return a vector whose size does not match hashes.size() (e.g., the NullBackend implementation returns an empty pair), which is a behavioral change from the prior code that always returned hashes.size() entries. Consider enforcing/normalizing the result size (and/or handling a non-ok status) so callers can safely assume positional correspondence with the input hashes.

Collaborator Author


Added an assert to confirm. However, implementations in MemoryFactory, NuDBFactory, and RocksDBFactory show that the same number of output objects are created as input hashes are provided.

Collaborator


Added an assert to confirm. However, implementations in MemoryFactory, NuDBFactory, and RocksDBFactory show that the same number of output objects are created as input hashes are provided.

I think @copilot is wrong here. The original code did a move assignment, which updates the size() of results to match whatever was returned. The initial size is essentially ignored.

You can see what I'm talking about with this unit test case snippet:

        std::vector<std::string> a{std::size_t{1024}};
        std::vector<std::string> b{std::size_t{2}};
        std::vector<std::string> c{std::size_t{2048}};
        BEAST_EXPECT(a.size() == 1024);
        BEAST_EXPECT(b.size() == 2);
        BEAST_EXPECT(c.size() == 2048);
        a = b;
        BEAST_EXPECT(a.size() == 2);
        BEAST_EXPECT(b.size() == 2);
        BEAST_EXPECT(c.size() == 2048);
        c = std::move(b);
        BEAST_EXPECT(a.size() == 2);
        BEAST_EXPECT(b.size() == 0);
        BEAST_EXPECT(c.size() == 2);

I'd go so far as to say that the assert is wrong, because as it said, NullBackend intentionally returns an empty vector.

Collaborator


I just looked back again at the original code, and noticed that it guaranteed that result had the same size as hashes by creating it first, then copying the individual items over by index after canonicalizing. (I also realized that there was no bounds checking, so that if the backend_ returned a larger array, it would try to write past the end of results.)

To preserve that behavior, I suggest adding results.resize(hashes.size()); after the vector is created.

Collaborator Author


Done.

Note that there's an implicit assumption that results[i] refers to the same element as hashes[i] (both in the new and old code). If the backend gives us back more or fewer results than input hashes, then perhaps it would be useful to raise a bigger stink about it: if the number of results is different, how can we trust they are even in the right order?

Collaborator


Note that there's an implicit assumption that results[i] refers to the same element as hashes[i] (both in the new and old code). If the backend gives us back more or fewer results than input hashes, then perhaps it would be useful to raise a bigger stink about it: if the number of results is different, how can we trust they are even in the right order?

That's a good question, in both the old and new code. Perhaps have an assertion before resizing that results.size() == hashes.size() || results.empty(). If it's empty, then none of the items were found. Or do you think it should be a bigger stink, like a throw or LogicError?

Collaborator Author


I added the assertion. I prefer to play it safe and find the error in the logs, rather than have all nodes on the network crash if we missed an edge case.
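
For reference, the resulting normalization reads roughly like this sketch (simplified; the normalize helper and stub NodeObject are mine, not the merged code):

    #include <cassert>
    #include <cstddef>
    #include <memory>
    #include <vector>

    struct NodeObject {};
    using Results = std::vector<std::shared_ptr<NodeObject>>;

    // Normalize a backend batch result so callers can index it positionally
    // against the input hashes. An empty vector (e.g. from NullBackend) means
    // nothing was found; any other size mismatch indicates a backend bug.
    void normalize(Results& results, std::size_t hashCount)
    {
        assert(results.size() == hashCount || results.empty());
        results.resize(hashCount);  // missing entries become nullptr
    }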

Contributor

Copilot AI commented Feb 3, 2026

@ximinez I've opened a new pull request, #6323, to work on those changes. Once the pull request is ready, I'll request review from you.

Collaborator

@ximinez ximinez left a comment


I left some comments on existing threads

Copilot AI review requested due to automatic review settings February 3, 2026 22:29

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.



Copilot AI review requested due to automatic review settings February 4, 2026 19:01

Copilot AI left a comment


Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.



Collaborator

@ximinez ximinez left a comment


One question, but either way this looks good.

// Also clear the FullBelowCache so its generation counter is bumped.
// This prevents stale "full below" markers from persisting across
// backend rotation/online deletion and interfering with SHAMap sync.
app_.getNodeFamily().getFullBelowCache()->clear();
Collaborator


Should you check that getFullBelowCache() doesn't return nullptr? I don't think the original implementation did.

Collaborator Author

@bthomee bthomee Feb 5, 2026


I see that the full below cache is created in the NodeFamily constructor, which in turn is created in the constructor of ApplicationImp, and that is called via make_Application in Main.cpp.

In no place do I see where the full below cache could become invalidated, since the only functions that can reach the cache are in the application, and the node family that holds the cache only releases it once the application is destroyed.

(If we should check for nullptr then there are many other places where it can go wrong too - see the freshenCaches function below.)

@bthomee bthomee added and removed the "QE test required" (RippleX QE Team must look at this PR) label Feb 5, 2026
@bthomee bthomee merged commit 677758b into develop Feb 6, 2026
27 checks passed
@bthomee bthomee deleted the bthomee/disable-cache branch February 6, 2026 14:42