Conversation
Codecov Report ❌ Patch coverage is
Additional details and impacted files:

```
@@           Coverage Diff           @@
##           develop   #5439   +/-   ##
========================================
  Coverage     79.9%   79.9%
========================================
  Files          840     840
  Lines        65549   65483    -66
  Branches      7266    7250    -16
========================================
- Hits         52352   52318    -34
+ Misses       13197   13165    -32
```
Just looping in @kuznetsss before approving: I don't believe we do any node-level DB caching in Clio, right? So no changes should be needed on our side?
As I understand it, this is rippled's internal database, so no changes are required on Clio's side.
I was looking into memory usage a few weeks ago too and noticed some oddities with the node store cache, so I'm sharing my notes. Regarding the NodeStore cache: like you, I'm of the opinion that it is actually largely pointless.
Why was it ever there!? Well, it seems like back in the day, for some reason the TreeNodeCache was actually keyed by an id representing a node's position in the tree. This was also before NuDB; think LevelDB and SQLite. So you had the tree node cache often missing, and a slow k/v store, so I'm guessing it was a backup cache. It's kind of crazy: we read from the node store, then create a DecodedBlob from decompressed data, only to immediately make a copy of that, and then cache that "because reasons". Nowadays? Shrug. It seems you can disable it via setting `cache_age=0` and `cache_size=0` in the `[node_db]` configuration.

Attached are my Claude notes from when I did the investigation:

## NodeObject Cache Investigation

### Summary of Findings

After investigating the NodeObject cache in rippled/xahaud, it appears to serve no clear purpose in the current architecture. Evidence suggests it might be a remnant from an earlier design that's now just adding overhead to node fetch operations. But without access to original design discussions, we can't be completely certain.

### Historical Investigation

#### A Theory: TreeNodeCache Was Originally Keyed by NodeID

Looking at the July 2013 codebase (commit c645901), there's interesting evidence that might explain why the NodeObject cache exists. The code suggests there may have been a fundamental issue with how the TreeNodeCache was implemented.

What the 2013 Code Shows:
This could have created an interesting problem:
#### Could NodeObject Cache Have Been a Workaround?

If the above theory is correct, the NodeObject cache (keyed by hash) might have been added to catch what the TreeNodeCache (keyed by ID) missed. This would explain why there were two caches doing seemingly similar things. It's worth noting that in 2013, the backend databases (SQLite and LevelDB) would have been considerably slower than modern options like NuDB or RocksDB. Given these performance characteristics, having a secondary cache to avoid repeated database lookups when the TreeNodeCache missed would have made more sense.

#### What the 2014 Documentation Says

By July 2014, the team seemed aware of these issues. From the SHAMap README (commit ea44497):
This suggests they wanted to move away from ID-based lookups entirely.

### Current State

Today's code uses direct pointer traversal:

```cpp
std::shared_ptr<SHAMapTreeNode> node = parent->getChild(branch);
if (node || !backed_)
    return node;
```

It seems like the architecture has evolved significantly, but the NodeObject cache remains - possibly as a remnant of the old workaround.

### Current Caching Layers

Looking at the current code, there appear to be three caching layers:

1. TreeNodeCache
2. NodeObject Cache
3. In-Memory SHAMap
### Observed Inefficiencies

#### The Copy Problem

Every node fetch from disk appears to do an unnecessary copy:

```cpp
// In DecodedBlob::createObject()
Blob data(m_objectData, m_objectData + m_dataBytes); // COPY!
```

This happens because:
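To make the copy concrete, here is a minimal, hypothetical sketch (simplified types and invented function names, not the actual rippled code) contrasting the copying pattern above with a move-based alternative:

```cpp
#include <cstdint>
#include <utility>
#include <vector>

using Blob = std::vector<std::uint8_t>;

// Copying path, as in the snippet above: one extra allocation and
// memcpy per fetch, proportional to the object size.
Blob
makeObjectDataByCopy(std::uint8_t const* data, std::size_t size)
{
    return Blob(data, data + size);
}

// Hypothetical zero-copy alternative: if the caller already owns the
// decompressed buffer, ownership can be transferred instead of duplicated.
Blob
makeObjectDataByMove(Blob&& decompressed)
{
    return std::move(decompressed);
}
```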
#### Where NodeObject Cache Doesn't Seem to Help
### Questions and Uncertainties
### Potential Impact of Removal

If the NodeObject cache truly serves no purpose, removing it could:
But without comprehensive testing, it's hard to be certain there aren't hidden dependencies.

### Disabling the Cache

Interestingly, the NodeObject cache is already optional in the codebase. The cache initialization logic in `DatabaseNodeImp.h` shows this:

```cpp
// Lines 46-76 from DatabaseNodeImp.h
std::optional<int> cacheSize, cacheAge;
if (config.exists("cache_size"))
{
    cacheSize = get<int>(config, "cache_size");
    if (cacheSize.value() < 0)
    {
        Throw<std::runtime_error>(
            "Specified negative value for cache_size");
    }
}
if (config.exists("cache_age"))
{
    cacheAge = get<int>(config, "cache_age");
    if (cacheAge.value() < 0)
    {
        Throw<std::runtime_error>(
            "Specified negative value for cache_age");
    }
}
if (cacheSize != 0 || cacheAge != 0)
{
    cache_ = std::make_shared<TaggedCache<uint256, NodeObject>>(
        "DatabaseNodeImp",
        cacheSize.value_or(0),
        std::chrono::minutes(cacheAge.value_or(0)),
        stopwatch(),
        j);
}
```

The key line is the final `if (cacheSize != 0 || cacheAge != 0)` guard: the cache is only constructed when at least one of the two knobs is nonzero. Throughout the implementation, every use of the cache is null-checked:

```cpp
// Line 55: storing objects
if (cache_ && !skipCache)

// Line 71: fetching objects
if (cache_)

// Line 98: conditional fetch
cache_ ? cache_->fetch(hash) : nullptr
```

This means the cache can be disabled through configuration by setting both `cache_size` and `cache_age` to 0.
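As a concrete sketch, disabling it would look like this in `rippled.cfg`; the `cache_size` and `cache_age` keys come straight from the notes above, while the `type` and `path` values are illustrative placeholders only:

```
[node_db]
type=NuDB
path=/var/lib/rippled/db/nudb
cache_size=0
cache_age=0
```

With both knobs at zero, the construction guard above never fires, `cache_` stays null, and every cache access short-circuits.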
This provides an easy way to test whether the cache provides any real benefit - just disable it and measure the impact.

### Additional Problems: P2P Cache Pollution

Another major issue with the NodeObject cache is that peer requests pollute it with one-off data. When peers request nodes via the peer protocol, those fetches go through the same cache. This means that if you have the cache configured, it will continuously grow with useless data every time peers request nodes from you. The cache fills with garbage that has near-zero probability of reuse, evicting any potentially useful entries. It's essentially uncontrolled memory growth triggered by external peer requests.

### Recommendations for Further Investigation
The fact that the cache is already optional suggests the developers recognized it might not always be necessary. Testing with it disabled would be a low-risk way to validate whether it's truly vestigial.
@sublimator I loved the historical context and the results from your deep dive - the amount of detail is very much appreciated! As for your recommendations, our old performance analysis platform didn't let me modify the
You're welcome, I assumed you could recontextualize/summarize it!
```cpp
netOPs_ = &app_.getOPs();
ledgerMaster_ = &app_.getLedgerMaster();
fullBelowCache_ = &(*app_.getNodeFamily().getFullBelowCache());
treeNodeCache_ = &(*app_.getNodeFamily().getTreeNodeCache());
```
It seems to be used by the SHAMap code itself. The tree node cache is more than a simple cache: it is used for canonicalizing nodes. I was just deep-diving into rippled's SHAMap recently, and then I recalled this PR.

```cpp
void
SHAMap::canonicalize(
    SHAMapHash const& hash,
    std::shared_ptr<SHAMapTreeNode>& node) const
{
    assert(backed_);
    assert(node->cowid() == 0);  // Only canonical nodes can be cached
    assert(node->getHash() == hash);

    f_.getTreeNodeCache(ledgerSeq_)
        ->canonicalize_replace_client(hash.as_uint256(), node);
}
```

It's what is used to make sure there's only one instance of a given node when structurally sharing trees.
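For intuition, here is a minimal sketch of what hash-keyed canonicalization achieves; this is a simplified stand-in, not the real `TaggedCache`, and the class name is invented:

```cpp
#include <map>
#include <memory>
#include <mutex>

// Simplified stand-in for TaggedCache's canonicalize_replace_client:
// keyed by hash, it either swaps the caller's pointer for the existing
// canonical instance or records the caller's instance as canonical.
template <class Key, class T>
class CanonicalStoreSketch
{
    std::mutex mutex_;
    std::map<Key, std::weak_ptr<T>> map_;

public:
    void
    canonicalize_replace_client(Key const& key, std::shared_ptr<T>& value)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        auto& slot = map_[key];
        if (auto existing = slot.lock())
            value = existing;  // hit: caller now shares the canonical node
        else
            slot = value;      // miss: caller's node becomes canonical
    }
};
```

With something like this in place, two threads that decode the same node from disk independently both end up holding the same object, which is what makes structural sharing between tree versions safe.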
All you're doing here is removing the `SHAMapStoreImp` reference, but maybe that `clear()` is actually useful? I'm not sure when/why/how `clearCaches()` is being called, but it might actually be doing something beneficial to keep memory usage down there.
Oh wait, sorry: it's calling `freshenCache`/`freshenCaches`.
```cpp
template <class CacheInstance>
bool
freshenCache(CacheInstance& cache)
{
    std::uint64_t check = 0;
    for (auto const& key : cache.getKeys())
    {
        dbRotating_->fetchNodeObject(
            key, 0, NodeStore::FetchType::synchronous, true);  // <- duplicate = true
        if (!(++check % checkHealthInterval_) && healthWait() == stopping)
            return true;
    }
    return false;
}
```

By my reading, `freshenCache` is actually forcing any memory-resident nodes to be written (with `duplicate = true`) to the new backend during rotation. It's simply using the cache keys to determine which map nodes are in memory, rather than to "freshen" that particular cache.
Given you can't actually remove the tree node cache, as it's relied upon for canonicalizing nodes, the question becomes whether you want the new backend to have those keys.
Great research, @sublimator. I followed your notes and agree that excluding (i.e. shrinking/zero-sizing) `treeNodeCache_` can hurt rotation performance. The `freshenCache` pass uses the cache's keys to warm the new backend: if the cache is empty, we lose that warm start and we'll see more cold misses immediately after rotation.
We should keep freshening the new backend using the tree node cache, but we don't need to keep it as a class member variable (dropping the member pointer doesn't save memory; just fetch at the call site and pass `*ptr`).
The other realisation from your comment is that we’re currently trying to make the memory footprint of rippled smaller, which may lead to increased internal latency / decreased throughput—similar to the behaviour we see around DB rotation.
Thanks for the great feedback and discussion - I restored the `freshenCaches` call.
Just saw these comments, thanks :)
Pull request overview
Copilot reviewed 45 out of 48 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
src/libxrpl/nodestore/DatabaseNodeImp.cpp:90
`results` is declared `const` and then returned by value. If NRVO is not applied, this prevents moving and can force an extra vector copy on a potentially large batch. Consider dropping `const` (you can still avoid mutation by convention) so the return can be moved when needed.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
```cpp
auto results = backend_->fetchBatch(batch).first;
for (size_t i = 0; i < results.size(); ++i)
{
    auto nObj = std::move(dbResults[i]);
    size_t index = indexMap[cacheMisses[i]];
    auto const& hash = hashes[index];

    if (nObj)
    {
        // Ensure all threads get the same object
        if (cache_)
            cache_->canonicalize_replace_client(hash, nObj);
    }
    else
    if (!results[i])
    {
        JLOG(j_.error()) << "fetchBatch - "
            << "record not found in db or cache. hash = " << strHex(hash);
        if (cache_)
        {
            auto notFound = NodeObject::createObject(hotDUMMY, {}, hash);
            cache_->canonicalize_replace_client(hash, notFound);
            if (notFound->getType() != hotDUMMY)
                nObj = std::move(notFound);
        }
            << "record not found in db. hash = " << strHex(hashes[i]);
    }
```
`fetchBatch` now returns `backend_->fetchBatch(batch).first` directly. This can return a vector whose size does not match `hashes.size()` (e.g., the `NullBackend` implementation returns an empty pair), which is a behavioral change from the prior code that always returned `hashes.size()` entries. Consider enforcing/normalizing the result size (and/or handling a non-ok status) so callers can safely assume positional correspondence with the input hashes.
Added an assert to confirm. However, the implementations in `MemoryFactory`, `NuDBFactory`, and `RocksDBFactory` show that the same number of output objects is created as input hashes are provided.
> Added an assert to confirm. However, the implementations in `MemoryFactory`, `NuDBFactory`, and `RocksDBFactory` show that the same number of output objects is created as input hashes are provided.
I think @copilot is wrong here. The original code did a move assignment, which updates the `size()` of `results` to match whatever was returned. The initial size is essentially ignored.
You can see what I'm talking about with this unit test case snippet:
```cpp
std::vector<std::string> a{std::size_t{1024}};
std::vector<std::string> b{std::size_t{2}};
std::vector<std::string> c{std::size_t{2048}};
BEAST_EXPECT(a.size() == 1024);
BEAST_EXPECT(b.size() == 2);
BEAST_EXPECT(c.size() == 2048);

a = b;
BEAST_EXPECT(a.size() == 2);
BEAST_EXPECT(b.size() == 2);
BEAST_EXPECT(c.size() == 2048);

c = std::move(b);
BEAST_EXPECT(a.size() == 2);
BEAST_EXPECT(b.size() == 0);
BEAST_EXPECT(c.size() == 2);
```
I'd go so far as to say that the assert is wrong because, as it said, `NullBackend` intentionally returns an empty vector.
I just looked back at the original code and noticed that it guaranteed that `results` had the same size as `hashes` by creating it first, then copying the individual items over by index after canonicalizing. (I also realized that there was no bounds checking, so if `backend_` returned a larger array, it would try to write past the end of `results`.)

To preserve that behavior, I suggest adding `results.resize(hashes.size());` after the vector is created.
Done.
Note that there's an implicit assumption that `results[i]` refers to the same element as `hashes[i]` (in both the new and old code). If the backend gives us back more or fewer results than input hashes, then perhaps it would be useful to raise a bigger stink about it - if the number of results is different, then how can we trust they are even in the right order?
> Note that there's an implicit assumption that `results[i]` refers to the same element as `hashes[i]` (in both the new and old code). If the backend gives us back more or fewer results than input hashes, then perhaps it would be useful to raise a bigger stink about it - if the number of results is different, then how can we trust they are even in the right order?
That's a good question, in both the old and new code. Perhaps add an assertion before resizing that `results.size() == hashes.size() || results.empty()`. If it's empty, then none of the items were found. Or do you think it should be a bigger stink, like a throw or `LogicError`?
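A sketch of that suggestion, with simplified types and an invented helper name (the real `fetchBatch` plumbing differs), might look like:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct NodeObject;  // opaque here; stands in for the real type

// Normalize a backend batch result so that results[i] always lines up
// with hashes[i]: either the backend returned one entry per hash, or
// it returned nothing at all (e.g. NullBackend).
std::vector<std::shared_ptr<NodeObject>>
normalizeBatch(
    std::vector<std::shared_ptr<NodeObject>> results,
    std::size_t hashCount)
{
    assert(results.size() == hashCount || results.empty());
    results.resize(hashCount);  // missing entries become null pointers
    return results;
}
```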
I added the assertion. I prefer to play it safe and find the error in the logs, rather than have all nodes on the network crash if we missed an edge case.
ximinez left a comment
I left some comments on existing threads
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated no new comments.
ximinez left a comment
One question, but either way this looks good.
```cpp
// Also clear the FullBelowCache so its generation counter is bumped.
// This prevents stale "full below" markers from persisting across
// backend rotation/online deletion and interfering with SHAMap sync.
app_.getNodeFamily().getFullBelowCache()->clear();
```
Should you check that `getFullBelowCache()` doesn't return `nullptr`? I don't think the original implementation did.
I see that the full below cache is created in the `NodeFamily` constructor, which in turn is invoked from the constructor of `ApplicationImp`, which is called via `make_Application` in `Main.cpp`.

Nowhere do I see how the full below cache could become invalidated, since the only functions that can reach the cache are in the application, and the node family that holds the cache only releases it once the application is destroyed.

(If we should check for `nullptr`, then there are many other places where it can go wrong too - see the `freshenCaches` function below.)
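For context, here is a minimal sketch of the generation-counter idea referenced in the diff comment above; it is a simplified stand-in with invented names, not the real `FullBelowCache`:

```cpp
#include <atomic>
#include <cstdint>

// Simplified stand-in for FullBelowCache's generation behavior.
class FullBelowCacheSketch
{
    std::atomic<std::uint32_t> generation_{1};

public:
    std::uint32_t
    getGeneration() const
    {
        return generation_.load();
    }

    // A "full below" marker is only trusted if it was recorded under
    // the current generation.
    bool
    isCurrent(std::uint32_t markerGeneration) const
    {
        return markerGeneration == generation_.load();
    }

    void
    clear()
    {
        ++generation_;  // every previously issued marker becomes stale
    }
};
```

The point of the sketch is that `clear()` is O(1): rather than erasing entries, bumping the generation makes every previously recorded "full below" marker untrusted.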
## High Level Overview of Change

This change removes the cache in `DatabaseNodeImp` and simplifies the caching logic in `SHAMapStoreImp`.

## Context of Change
The codebase contains many caches, and this PR is one of several refactors aimed at improving performance, such as reducing memory use, to support future account, trust line, and transaction growth. In this case, as NuDB and RocksDB already use caches internally, additional caches in the code add little value and may even be unnecessary. A number of performance analyses demonstrated improved performance, and more extensive performance analyses will be performed before this change is released.
## Type of Change

- Chore (no impact to binary, e.g. `.gitignore`, formatting, dropping support for older tooling)