
(wip)raft: introduce term cache #141524

Draft: wants to merge 5 commits into master from introduceTermCacheV1InMemory

Conversation

@hakuuww (Contributor) commented Feb 14, 2025

Don't review. Prototype for the termCache.

More info here https://cockroachlabs.atlassian.net/wiki/x/MgB7AAE

References: #136296
Epic: None
Release note: None

This is a draft commit that adds a metric to record the latency of loading a term when there is a cache miss.

It also adds separate metrics for raftEntry cache accesses from loadTerm.

Running roachtests against this build will help us understand the access patterns of the raftEntry cache, and how much performance improvement we may get when we introduce a term cache.

This commit is only for reference purposes and shouldn't be merged.

References: cockroachdb#136296
Epic: None
Release note: None
blathers-crl (bot) commented Feb 14, 2025

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity (Member)

This change is Reviewable

@hakuuww hakuuww force-pushed the introduceTermCacheV1InMemory branch 2 times, most recently from bdf7b88 to 369e311 Compare February 19, 2025 14:58
Prototype for the termCache, and the append logic.

References: cockroachdb#136296
Epic: None
Release note: None
@hakuuww hakuuww force-pushed the introduceTermCacheV1InMemory branch from 369e311 to 3eef8f5 Compare February 19, 2025 15:30
Passed all unit tests in kv/kvserver!

References: cockroachdb#136296
Epic: None
Release note: None
@hakuuww hakuuww force-pushed the introduceTermCacheV1InMemory branch 2 times, most recently from 05c0ccb to c1ed467 Compare February 20, 2025 00:38
@hakuuww (Contributor, Author) commented Feb 20, 2025

Updates:

Tested the initial implementation and ran into issues: tests in kv/kvserver caught truncation problems. We had initially thought truncation was not necessary for now, so we implemented truncation logic for the termCache, and it now passes all unit tests in kv/kvserver.

Also merged with the other branch with metrics, and added specific metrics for termCache accesses and hits.

You can now observe termCache accesses in Grafana; there are no more raftentry cache accesses in steady state for loadTerm.

@pav-kv (Collaborator) left a comment


We also need to clear/initialize the cache somewhere in the applySnapshot stack, since the log is cleared there.


// Term returns the entry term for the given entry index.
// Returns an error if the index is not in the termCache.
func (tc *TermCache) Term(index uint64) (term uint64, err error) {
(Collaborator)

Probably sufficient to return a bool instead of error. Don't need much of a granularity here - just found or not found.

@hakuuww (Contributor, Author) Feb 20, 2025

Depends on whether we want the termCache to be a best-effort attempt. It has the potential to do more than it is doing right now.

In the case that the index we are asking for is, according to the TermCache, for sure not in the raftLog, we can return early; there is no need to ask raftentry/cache and storage for the term.
(Think of the scenario: the term cache has t1/11, t2/14, t4/20.
If we ask to match the term of entryID t3/15, there is no record of it being in the raftLog, so we should exit early. This is also why I wrote the extra match-term function.)

Asking raftentry/cache and storage in this case ends up being a worst-case access path: we won't find anything in the end, and it will take more time than we want.

This scenario can certainly happen when the raft layer calls matchTerm() to validate an entry while we have inconsistent logs, which is why we should implement the error here, for the early exit.

It's risky though: if we don't get it right, the error is hard to reproduce, since logs are usually consistent. It's going to require dedicated unit tests involving network partitions, etc.
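The early-exit idea above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the `entryID`/`TermCache` types and the `last` field are simplified assumptions. A `found && !match` result means the entry is definitively absent from the log, so lower levels need not be consulted.

```go
package main

import "fmt"

// Hypothetical simplified types; the real code lives in the raft package.
type entryID struct{ index, term uint64 }

type TermCache struct {
	cache []entryID // term-flip points, ascending by index
	last  uint64    // last log index covered by the cache
}

// Term returns the term at the given index and whether the cache covers it.
func (tc *TermCache) Term(index uint64) (uint64, bool) {
	if len(tc.cache) == 0 || index < tc.cache[0].index || index > tc.last {
		return 0, false
	}
	// Scan from the back: the last flip point at or below index decides the term.
	for i := len(tc.cache) - 1; i >= 0; i-- {
		if index >= tc.cache[i].index {
			return tc.cache[i].term, true
		}
	}
	return 0, false // unreachable given the bounds check above
}

// Match reports whether id is in the log, and whether the cache could decide.
// found && !match is the early-exit case: the entry cannot be in the log, so
// there is no need to fall through to the raft entry cache or storage.
func (tc *TermCache) Match(id entryID) (match, found bool) {
	term, ok := tc.Term(id.index)
	return ok && term == id.term, ok
}

func main() {
	// The scenario from the comment: term cache holds t1/11, t2/14, t4/20.
	tc := &TermCache{cache: []entryID{{11, 1}, {14, 2}, {20, 4}}, last: 25}
	fmt.Println(tc.Match(entryID{index: 15, term: 3})) // false true: t3/15 cannot be in the log
	fmt.Println(tc.Match(entryID{index: 15, term: 2})) // true true
}
```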

Comment on lines 274 to 265
for i := 0; i < len(tc.cache)-1; i++ {
if index >= tc.cache[i].index && index < tc.cache[i+1].index {
return tc.cache[i].term, nil
}
}
(Collaborator)

Can scan from the back, and halve the slice accesses:

for i := len(tc.cache)-2; i >= 0; i-- {
	if index >= tc.cache[i].index {
		return tc.cache[i].term
	}
}


// Match returns whether the entryID is in the TermCache.
// If it is in the termCache, then it is in the raftLog.
func (tc *TermCache) Match(argEntryId entryID) (bool, error) {
(Collaborator)

This method is unused. Remove?

(Contributor, Author)

It may be useful in the future?

(Collaborator)

Usually, if a function(ality) is not used immediately in the PR, and there is no clear vision of how it will be used in follow-up PRs, it should not be added.

For this Match method, do you expect any use? It seems to be already covered by the raftLog.matchTerm function, and having the Term() method is enough to support it.

Integrate with metrics from another branch cockroachdb#141159

References: cockroachdb#136296
Epic: None
Release note: None
@hakuuww hakuuww force-pushed the introduceTermCacheV1InMemory branch from c1ed467 to 40c99ef Compare February 20, 2025 21:12
Comment on lines +103 to +105
// TODO(hakuuww): actually, we are clearing more than hi here,
//
// which causes TermCache misses in steady state.
@pav-kv (Collaborator) Feb 20, 2025

I think we need to fix this. Truncations happen pretty much all the time, so we should not reduce coverage here.

2 options:

  • if a truncation index is in the middle of tc.cache[i], bump tc.cache[i].index to match the truncation index.
  • leave tc.cache[i] intact. There is no harm in remembering terms of entries below firstIndex.

(anthony)
By the way, just to be clear: truncateFrom is not the problem; it's clearTo that we are worrying about (the wording we use is a bit confusing).

truncateFrom removes a suffix of the raftLog.
clearTo removes a prefix of the raftLog.

@hakuuww (Contributor, Author) Feb 20, 2025

From Pavel:

Say you have a log [firstIndex=10] t1 t1 t1 t2 t2 t2, which is then truncated to start at index 12, so the new log is [firstIndex=12] t1 t2 t2 t2.
The term cache before the change was: {index=10,term=1}, {index=13,term=2}.
I'm saying that, after the truncation, we shouldn't shrink the cache to be {index=13,term=2}, because it is then unable to tell us the term of entry 12 that remains in the log.
Options:
  1. make it {index=12,term=1}, {index=13,term=2}
  2. leave it as {index=10,term=1}, {index=13,term=2}
Both options are able to answer the term for entry 12. Option 2 is slightly better because it doesn't require rewriting the cache; so, looking forward, it will not require updating itself in storage.

It's ok if the term cache remembers "more" history than the log itself. We wouldn't be reading terms of entries below firstIndex-1.

Inside termCache, we can pretty much assume that the caller has already made the necessary bound checks. The caller won't be trying to access terms outside the log's first-1/last index.
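Option 1 from Pavel's example could be sketched roughly like this. This is an illustrative standalone function with a hypothetical simplified `entryID` type, not the PR's implementation; option 2 would simply leave the cache untouched.

```go
package main

import "fmt"

// Hypothetical simplified type for illustration.
type entryID struct{ index, term uint64 }

// clearTo handles removal of the log prefix below newFirst (option 1): it
// drops cache entries that no longer describe any live entry, but keeps the
// last entry at or below newFirst, bumped to newFirst, because it still gives
// the term of the first remaining log entry.
func clearTo(cache []entryID, newFirst uint64) []entryID {
	// Find the first entry with index >= newFirst.
	i := 0
	for i < len(cache) && cache[i].index < newFirst {
		i++
	}
	if i < len(cache) && cache[i].index == newFirst {
		return cache[i:] // a term flip lands exactly on newFirst
	}
	if i == 0 {
		return cache // nothing below newFirst to clear
	}
	// cache[i-1] still covers newFirst; keep it with a bumped index.
	// (Note: this mutates the shared backing array; fine for a sketch.)
	kept := cache[i-1:]
	kept[0].index = newFirst
	return kept
}

func main() {
	// Pavel's example: log [10..15] = t1 t1 t1 t2 t2 t2, truncated to start at 12.
	cache := []entryID{{10, 1}, {13, 2}}
	fmt.Println(clearTo(cache, 12)) // [{12 1} {13 2}]: entry 12's term is still answerable
}
```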


@hakuuww (Contributor, Author) Feb 24, 2025

Question: why don't we just not call clearTo at all, if it is ok to leave truncated entries in the termCache?

@hakuuww (Contributor, Author) commented Feb 23, 2025

The prototype works as intended and greatly reduces entry cache accesses according to metrics, but there are additional details that we need to figure out.

Main problem:
For the initial prototype, we are truncating and clearing more indices than we need to.

Originally, I treated it as an invariant that every entry in the TermCache must represent a term flip point.

To always preserve this invariant, when we clear entries up to (not including) a high index hi, we delete the underlying term-flip entry that symbolizes "flipped to the term at hi".

(When we truncate entries starting from and including lo, all entries below lo are still preserved.)
(When we clear entries up to hi, entries at or above hi are preserved, and entries below hi are removed.)

This means that on a subsequent term lookup, the termCache no longer believes it has entries in the same term as hi, which leads to lookups in the lower-level cache (raft entry cache) or storage.

clearTo operations happen all the time, because the storage engine is always persisting/applying entries and telling the raftLog to stop storing them. This is why we observe termCache misses in the metrics.

We can eliminate those termCache misses if we relax the properties of our termCache: "cache[0] is not necessarily a term flip. All cache[i] with i >= 1 are term flips, though."

Two more reasons why we want this:

  1. Follower snapshot handling.
     When the follower receives a snapshot, it wipes the whole termCache. The lastIndex of the snapshot is likely not a term flip, but we still want to add that lastIndex to the termCache at termCache[0], indicating that the log is in term termCache[0].term from termCache[0].index up to min(cache[i+1].index-1 /* if present */, lastIndex).

  2. Always have something in the TermCache, even on a cold restart.
     When the process restarts, the termCache is empty. Instead of waiting for the next entry append, we just store the "prev" index/term of the whole raftLog, which we have access to in the hard state on a cold restart; this is good for initializing the cache.

Those two scenarios in a diagram: use log.prev and append it to termCache[0].
[image]

Actually, we are already not preserving the initial property, since we append whatever entry we first call append() on, which is not necessarily a term-flip point.

But we still need to update the cache's logic upon snapshot application and cold restart.
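The snapshot/cold-restart initialization described above could look roughly like this. A minimal sketch with hypothetical simplified types and a hypothetical `resetTo` name, not the PR's actual code; per the discussion, callers are assumed to have done the log bound checks before querying.

```go
package main

import "fmt"

// Hypothetical simplified types for illustration.
type entryID struct{ index, term uint64 }

type TermCache struct {
	cache []entryID // cache[0] need not be a term flip; all cache[i>=1] are flips
}

// resetTo reinitializes the cache to a single entry: the snapshot's last
// entryID, or the log's "prev" entryID recovered from durable state on a
// cold restart. That entry is likely not a term flip, which is exactly why
// the relaxed property "cache[0] is not necessarily a flip" is needed.
func (tc *TermCache) resetTo(prev entryID) {
	tc.cache = append(tc.cache[:0], prev)
}

// term answers a lookup from the cache alone; ok == false means a miss.
// Upper-bound checks are assumed to be done by the caller.
func (tc *TermCache) term(index uint64) (term uint64, ok bool) {
	for i := len(tc.cache) - 1; i >= 0; i-- {
		if index >= tc.cache[i].index {
			return tc.cache[i].term, true
		}
	}
	return 0, false
}

func main() {
	tc := &TermCache{}
	// After applying a snapshot whose last entry is t5/100:
	tc.resetTo(entryID{index: 100, term: 5})
	fmt.Println(tc.term(100)) // 5 true: no miss right after the snapshot
}
```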

@hakuuww (Contributor, Author) commented Feb 23, 2025

TODO(hakuuww):

  • Implement the two changes in the above diagram, related to initializing the termCache with a first entry upon snapshot or restart.
    Do it after applying a snapshot; need to look into how to initialize the first entry to the firstIndex of the truncated state.

  • Read more about the low-level implementation of Go slices. We are doing some copying that is not necessary. (Ideally a circular array can eliminate the copying since we are using a fixed-size slice, but that may be overkill for a slice this short.)

  • When we append entries to the termCache, are we always doing it in increasing (entryID) order? I think we are; it's good to leave a comment reminding future readers that this is true. Intuitively, at the raft level, entries are always appended to the raftLog into unstable in order, and when they are taken from unstable into MsgStorageAppend, the original order is preserved.
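Given that in-order append assumption, the append path can be sketched as below. This is an illustration with hypothetical simplified types, not the PR's code: a new cache entry is needed only at a term flip, and a conflicting index (a new leader overwriting a log suffix) first drops the invalidated entries.

```go
package main

import "fmt"

// Hypothetical simplified types for illustration.
type entryID struct{ index, term uint64 }

type TermCache struct{ cache []entryID }

// append records a newly appended log entry. Callers append in increasing
// entryID order (raft appends to unstable in order, and MsgStorageAppend
// preserves that order), so only term flips add cache entries; an index at
// or below the cached tail means a suffix overwrite, so stale entries are
// dropped first.
func (tc *TermCache) append(id entryID) {
	for len(tc.cache) > 0 && tc.cache[len(tc.cache)-1].index >= id.index {
		tc.cache = tc.cache[:len(tc.cache)-1] // drop entries invalidated by the overwrite
	}
	if n := len(tc.cache); n == 0 || tc.cache[n-1].term != id.term {
		tc.cache = append(tc.cache, id) // record the term flip (or the very first entry)
	}
}

func main() {
	tc := &TermCache{}
	for _, id := range []entryID{{10, 1}, {11, 1}, {12, 2}, {13, 2}} {
		tc.append(id)
	}
	fmt.Println(tc.cache) // [{10 1} {12 2}]: only flip points are stored
	tc.append(entryID{index: 12, term: 3}) // a new leader overwrites the suffix from 12
	fmt.Println(tc.cache) // [{10 1} {12 3}]
}
```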

@hakuuww (Contributor, Author) commented Feb 24, 2025

Question: why don't we just not do clearTo at all, if it is ok to leave truncated entries in the termCache?

…d cold restart

References: cockroachdb#136296
Epic: None
Release note: None