
(wip)raft: introduce term cache #141524

Draft: wants to merge 5 commits into master from introduceTermCacheV1InMemory

Conversation

@hakuuww (Contributor) commented Feb 14, 2025

Don't review. Prototype for the termCache.

More info here https://cockroachlabs.atlassian.net/wiki/x/MgB7AAE

References: #136296
Epic: None
Release note: None

This is a draft commit that adds a metric to record the latency of loading a term when there is a cache miss.

It also adds separate metrics for raftEntry cache accesses from loadTerm.

Running roachtests against this build will help us understand the access patterns of the raftEntry cache, and how much performance improvement we may get when we introduce a term cache.

This commit is only for reference purposes and shouldn't be merged.

References: cockroachdb#136296
Epic: None
Release note: None
blathers-crl (bot) commented Feb 14, 2025

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity (Member)

This change is Reviewable

@hakuuww hakuuww force-pushed the introduceTermCacheV1InMemory branch 2 times, most recently from bdf7b88 to 369e311 Compare February 19, 2025 14:58
Prototype for the termCache, and the append logic.

References: cockroachdb#136296
Epic: None
Release note: None
@hakuuww hakuuww force-pushed the introduceTermCacheV1InMemory branch from 369e311 to 3eef8f5 Compare February 19, 2025 15:30
Passed all unit tests in kv/kvserver!

References: cockroachdb#136296
Epic: None
Release note: None
@hakuuww hakuuww force-pushed the introduceTermCacheV1InMemory branch 2 times, most recently from 05c0ccb to c1ed467 Compare February 20, 2025 00:38
@hakuuww (Contributor, Author) commented Feb 20, 2025

Updates:

Tested the initial implementation and ran into issues: tests in kv/kvserver caught truncation problems. We had initially thought truncation was not necessary for now, so we implemented truncation logic for the termCache, and it now passes all unit tests in kv/kvserver.

Also merged with the other branch with metrics, and added specific metrics for termCache accesses and hits.

You can now observe termCache accesses in Grafana; there are no more raftentry cache accesses in steady state for loadTerm.

@pav-kv (Collaborator) left a comment


We also need to clear/initialize the cache somewhere in the applySnapshot stack, since the log is cleared there.


// Term returns the entry term for the given entry index.
// Returns an error if the index is not in the termCache.
func (tc *TermCache) Term(index uint64) (term uint64, err error) {
(Collaborator)

Probably sufficient to return a bool instead of error. Don't need much of a granularity here - just found or not found.

@hakuuww (Contributor, Author) Feb 20, 2025

Depends on whether we want the termCache to be a best-effort attempt. It has the potential to do more than it is doing right now.

In the case that the index we are asking for is, according to the TermCache, for sure not in the raftLog, we can return early; there is no need to ask raftentry/cache and storage for the term.
(Think of the scenario: the term cache has t1/11, t2/14, t4/20.
If we ask to match the term of entryID t3/15, there is no record of it being in the raftLog, so we should exit early. This is also why I wrote the extra match-term function.)

Asking raftentry/cache and storage in this case ends up being a worst-case access path: we won't find anything in the end, and it will take more time than we want.

This scenario can certainly happen when the raft layer calls matchTerm() to validate an entry while we have inconsistent logs, which is why we should implement the error here, for the early exit.

It's risky though: if we don't get it right, the error is hard to reproduce, since logs are usually consistent. It's going to require dedicated unit tests involving network partitions, etc.
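The early-exit idea above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the `entryID`/`TermCache` types and the `last` field are simplified assumptions. A `found && !match` result means the entry is definitively absent from the log, so lower levels need not be consulted.

```go
package main

import "fmt"

// Hypothetical simplified types; the real code lives in the raft package.
type entryID struct{ index, term uint64 }

type TermCache struct {
	cache []entryID // term-flip points, ascending by index
	last  uint64    // last log index covered by the cache
}

// Term returns the term at the given index and whether the cache covers it.
func (tc *TermCache) Term(index uint64) (uint64, bool) {
	if len(tc.cache) == 0 || index < tc.cache[0].index || index > tc.last {
		return 0, false
	}
	// Scan from the back: the last flip point at or below index decides the term.
	for i := len(tc.cache) - 1; i >= 0; i-- {
		if index >= tc.cache[i].index {
			return tc.cache[i].term, true
		}
	}
	return 0, false // unreachable given the bounds check above
}

// Match reports whether id is in the log, and whether the cache could decide.
// found && !match is the early-exit case: the entry cannot be in the log, so
// there is no need to fall through to the raft entry cache or storage.
func (tc *TermCache) Match(id entryID) (match, found bool) {
	term, ok := tc.Term(id.index)
	return ok && term == id.term, ok
}

func main() {
	// The scenario from the comment: term cache holds t1/11, t2/14, t4/20.
	tc := &TermCache{cache: []entryID{{11, 1}, {14, 2}, {20, 4}}, last: 25}
	fmt.Println(tc.Match(entryID{index: 15, term: 3})) // false true: t3/15 cannot be in the log
	fmt.Println(tc.Match(entryID{index: 15, term: 2})) // true true
}
```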

Comment on lines 274 to 265
for i := 0; i < len(tc.cache)-1; i++ {
if index >= tc.cache[i].index && index < tc.cache[i+1].index {
return tc.cache[i].term, nil
}
}
(Collaborator)

Can scan from the back, and halve the slice accesses:

for i := len(tc.cache)-2; i >= 0; i-- {
	if index >= tc.cache[i].index {
		return tc.cache[i].term
	}
}


// Match returns whether the entryID is in the TermCache.
// If it is in the termCache, then it is in the raftLog.
func (tc *TermCache) Match(argEntryId entryID) (bool, error) {
(Collaborator)

This method is unused. Remove?

(Contributor, Author)

It may be useful in the future?

(Collaborator)

Usually, if a function(ality) is not used immediately in the PR, and there is no clear vision of how it will be used in follow-up PRs, it should not be added.

For this Match method, do you expect any use? It seems to be already covered by the raftLog.matchTerm function, and having the Term() method is enough to support it.

Integrate with metrics from another branch cockroachdb#141159

References: cockroachdb#136296
Epic: None
Release note: None
@hakuuww hakuuww force-pushed the introduceTermCacheV1InMemory branch from c1ed467 to 40c99ef Compare February 20, 2025 21:12
Comment on lines +103 to +105
// TODO(hakuuww): actually, we are clearing more than hi here,
//
// which causes TermCache misses in steady state.
@pav-kv (Collaborator) Feb 20, 2025

I think we need to fix this. Truncations happen pretty much all the time, so we should not reduce coverage here.

2 options:

  • if a truncation index is in the middle of tc.cache[i], bump tc.cache[i].index to match the truncation index.
  • leave tc.cache[i] intact. There is no harm in remembering terms of entries below firstIndex.

(anthony)
By the way, just to be clear: truncateFrom is not the problem; it's clearTo that we are worrying about (the wording we use is a bit confusing).

truncateFrom removes a suffix of the raftLog.
clearTo removes a prefix of the raftLog.

@hakuuww (Contributor, Author) Feb 20, 2025

From Pavel:

Say you have a log [firstIndex=10] t1 t1 t1 t2 t2 t2, which is then truncated to start at index 12, so the new log is [firstIndex=12] t1 t2 t2 t2.
The term cache before the change was: {index=10,term=1}, {index=13,term=2}.
I'm saying that, after the truncation, we shouldn't shrink the cache to be {index=13,term=2}, because it is then unable to tell us the term of entry 12 that remains in the log.
Options:
  1. make it {index=12,term=1}, {index=13,term=2}
  2. leave it as {index=10,term=1}, {index=13,term=2}
Both options are able to answer the term for entry 12. Option 2 is slightly better because it doesn't require rewriting the cache; so, looking forward, it will not require updating itself in storage.

It's ok if the term cache remembers "more" history than the log itself. We wouldn't be reading terms of entries below firstIndex-1.

Inside termCache, we can pretty much assume that the caller has already made the necessary bound checks. The caller won't be trying to access terms outside the log's first-1/last index.
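Option 1 from Pavel's example could be sketched roughly like this. This is an illustrative standalone function with a hypothetical simplified `entryID` type, not the PR's implementation; option 2 would simply leave the cache untouched.

```go
package main

import "fmt"

// Hypothetical simplified type for illustration.
type entryID struct{ index, term uint64 }

// clearTo handles removal of the log prefix below newFirst (option 1): it
// drops cache entries that no longer describe any live entry, but keeps the
// last entry at or below newFirst, bumped to newFirst, because it still gives
// the term of the first remaining log entry.
func clearTo(cache []entryID, newFirst uint64) []entryID {
	// Find the first entry with index >= newFirst.
	i := 0
	for i < len(cache) && cache[i].index < newFirst {
		i++
	}
	if i < len(cache) && cache[i].index == newFirst {
		return cache[i:] // a term flip lands exactly on newFirst
	}
	if i == 0 {
		return cache // nothing below newFirst to clear
	}
	// cache[i-1] still covers newFirst; keep it with a bumped index.
	// (Note: this mutates the shared backing array; fine for a sketch.)
	kept := cache[i-1:]
	kept[0].index = newFirst
	return kept
}

func main() {
	// Pavel's example: log [10..15] = t1 t1 t1 t2 t2 t2, truncated to start at 12.
	cache := []entryID{{10, 1}, {13, 2}}
	fmt.Println(clearTo(cache, 12)) // [{12 1} {13 2}]: entry 12's term is still answerable
}
```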


@hakuuww (Contributor, Author) Feb 24, 2025

Question: why don't we just not call clearTo at all, if it is ok to leave truncated entries in the termCache?

@hakuuww (Contributor, Author) commented Feb 23, 2025

The prototype works as intended and greatly reduces entry cache accesses according to metrics, but there are additional details that we need to figure out.

Main problem:
For the initial prototype, we are truncating and clearing more indices than we need to.

Originally, I treated it as an invariant that every entry in the TermCache must represent a term flip point.

To always preserve this invariant, when we clear entries up to (not including) a high index hi, we delete the underlying term-flip entry that symbolizes "flipped to the term at hi".

(When we truncate entries starting from and including lo, all entries below lo are still preserved.)
(When we clear entries up to hi, entries at or above hi are preserved, and entries below hi are removed.)

This means that on a subsequent term lookup, the termCache no longer believes it has entries in the same term as hi, which leads to lookups in the lower-level cache (raft entry cache) or storage.

clearTo operations happen all the time, because the storage engine is always persisting/applying entries and telling the raftLog to stop storing them. This is why we observe termCache misses in the metrics.

We can eliminate those termCache misses if we relax the properties of our termCache: "cache[0] is not necessarily a term flip. All cache[i] with i >= 1 are term flips, though."

Two more reasons why we want this:

  1. Follower snapshot handling.
     When the follower receives a snapshot, it wipes the whole termCache. The lastIndex of the snapshot is likely not a term flip, but we still want to add that lastIndex to the termCache at termCache[0], indicating that the log is in term termCache[0].term from termCache[0].index up to min(cache[i+1].index-1 /* if present */, lastIndex).

  2. Always have something in the TermCache, even on a cold restart.
     When the process restarts, the termCache is empty. Instead of waiting for the next entry append, we just store the "prev" index/term of the whole raftLog, which we have access to in the hard state on a cold restart; this is good for initializing the cache.

Those two scenarios in a diagram: use log.prev and append it to termCache[0].
[image]

Actually, we are already not preserving the initial property, since we append whatever entry we first call append() on, which is not necessarily a term-flip point.

But we still need to update the cache's logic upon snapshot application and cold restart.
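The snapshot/cold-restart initialization described above could look roughly like this. A minimal sketch with hypothetical simplified types and a hypothetical `resetTo` name, not the PR's actual code; per the discussion, callers are assumed to have done the log bound checks before querying.

```go
package main

import "fmt"

// Hypothetical simplified types for illustration.
type entryID struct{ index, term uint64 }

type TermCache struct {
	cache []entryID // cache[0] need not be a term flip; all cache[i>=1] are flips
}

// resetTo reinitializes the cache to a single entry: the snapshot's last
// entryID, or the log's "prev" entryID recovered from durable state on a
// cold restart. That entry is likely not a term flip, which is exactly why
// the relaxed property "cache[0] is not necessarily a flip" is needed.
func (tc *TermCache) resetTo(prev entryID) {
	tc.cache = append(tc.cache[:0], prev)
}

// term answers a lookup from the cache alone; ok == false means a miss.
// Upper-bound checks are assumed to be done by the caller.
func (tc *TermCache) term(index uint64) (term uint64, ok bool) {
	for i := len(tc.cache) - 1; i >= 0; i-- {
		if index >= tc.cache[i].index {
			return tc.cache[i].term, true
		}
	}
	return 0, false
}

func main() {
	tc := &TermCache{}
	// After applying a snapshot whose last entry is t5/100:
	tc.resetTo(entryID{index: 100, term: 5})
	fmt.Println(tc.term(100)) // 5 true: no miss right after the snapshot
}
```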

@hakuuww (Contributor, Author) commented Feb 23, 2025

TODO(hakuuww):

  • Implement the two changes in the above diagram, related to initializing the termCache with a first entry upon snapshot or restart.
    Do it after applying a snapshot; need to look into how to initialize the first entry to the firstIndex of the truncated state.

  • Read more about the low-level implementation of Go slices. We are doing some copying that is not necessary. (Ideally a circular array can eliminate the copying since we are using a fixed-size slice, but that may be overkill for a slice this short.)

  • When we append entries to the termCache, are we always doing it in increasing (entryID) order? I think we are; it's good to leave a comment reminding future readers that this is true. Intuitively, at the raft level, entries are always appended to the raftLog into unstable in order, and when they are taken from unstable into MsgStorageAppend, the original order is preserved.
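Given that in-order append assumption, the append path can be sketched as below. This is an illustration with hypothetical simplified types, not the PR's code: a new cache entry is needed only at a term flip, and a conflicting index (a new leader overwriting a log suffix) first drops the invalidated entries.

```go
package main

import "fmt"

// Hypothetical simplified types for illustration.
type entryID struct{ index, term uint64 }

type TermCache struct{ cache []entryID }

// append records a newly appended log entry. Callers append in increasing
// entryID order (raft appends to unstable in order, and MsgStorageAppend
// preserves that order), so only term flips add cache entries; an index at
// or below the cached tail means a suffix overwrite, so stale entries are
// dropped first.
func (tc *TermCache) append(id entryID) {
	for len(tc.cache) > 0 && tc.cache[len(tc.cache)-1].index >= id.index {
		tc.cache = tc.cache[:len(tc.cache)-1] // drop entries invalidated by the overwrite
	}
	if n := len(tc.cache); n == 0 || tc.cache[n-1].term != id.term {
		tc.cache = append(tc.cache, id) // record the term flip (or the very first entry)
	}
}

func main() {
	tc := &TermCache{}
	for _, id := range []entryID{{10, 1}, {11, 1}, {12, 2}, {13, 2}} {
		tc.append(id)
	}
	fmt.Println(tc.cache) // [{10 1} {12 2}]: only flip points are stored
	tc.append(entryID{index: 12, term: 3}) // a new leader overwrites the suffix from 12
	fmt.Println(tc.cache) // [{10 1} {12 3}]
}
```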

@hakuuww (Contributor, Author) commented Feb 24, 2025

Question: why don't we just not do clearTo at all, if it is ok to leave truncated entries in the termCache?

…d cold restart

References: cockroachdb#136296
Epic: None
Release note: None