etcdserver: significantly reduces start-up time #11779
Codecov Report
@@ Coverage Diff @@
## master #11779 +/- ##
==========================================
- Coverage 66.65% 66.18% -0.47%
==========================================
Files 403 403
Lines 36881 36879 -2
==========================================
- Hits 24582 24408 -174
- Misses 10811 10969 +158
- Partials 1488 1502 +14
With a test data set of 40 million keys, this PR reduces the startup time from 5 minutes to 2.5 minutes. @jingyih PTAL. Thanks.
LGTM
cc @gyuho
This feels more like a bug... Let's backport this. As always, awesome work @tangcong! Thanks.
Can we also include this in CHANGELOG? :)
73c8d71
to
3e64261
Compare
Updated.
etcdserver/backend.go
Outdated
kv := mvcc.New(cfg.Logger, oldbe, &lease.FakeLessor{}, ci, mvcc.StoreConfig{CompactionBatchLimit: cfg.CompactionBatchLimit})
defer kv.Close()
if snapshot.Metadata.Index <= kv.ConsistentIndex() {
if snapshot.Metadata.Index <= ci.ConsistentIndex() {
We have to make sure the backend has the consistentIndexKeyName key; otherwise, boltdb will throw an exception. recoverSnapshotBackend is only called when the WAL exists, so I think it is safe. What do you think, @gyuho?
Hmm, this is called before we call mvcc.NewStore
...
Can we do something like
if beExist {
kvindex := srv.kv.ConsistentIndex()
Is there any case where the WAL file exists but the db does not?
> Can we do something like
> if beExist { kvindex := srv.kv.ConsistentIndex()
Done. It will make recoverSnapshotBackend more robust, thanks.
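For illustration, the guard discussed in this thread can be sketched roughly as below. This is a minimal sketch and not the code from the PR: `snapshotAlreadyApplied`, its parameters, and the `readIndex` callback are hypothetical names, and it assumes the comparison is used to decide whether the existing backend already covers the snapshot. The point it shows is only the ordering: check that the backend exists before reading the consistent index from it.

```go
package main

import "fmt"

// snapshotAlreadyApplied is a hypothetical helper (not an etcd API).
// It reports whether the existing backend already covers the snapshot.
// readIndex is only invoked when beExist is true, so a missing backend
// (and therefore a missing consistent-index key) is never read.
func snapshotAlreadyApplied(beExist bool, snapshotIndex uint64, readIndex func() uint64) bool {
	if !beExist {
		return false
	}
	return snapshotIndex <= readIndex()
}

func main() {
	stored := func() uint64 { return 100 }

	fmt.Println(snapshotAlreadyApplied(true, 90, stored))  // backend is ahead of the snapshot
	fmt.Println(snapshotAlreadyApplied(true, 110, stored)) // snapshot is newer than the backend

	// With no backend on disk the reader must never run.
	fmt.Println(snapshotAlreadyApplied(false, 90, func() uint64 {
		panic("must not read a missing backend")
	}))
}
```

Passing the reader as a callback keeps the sketch independent of any real storage type; in real code the read would come from whatever holds the consistent index.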
> Is there any exception that wal file exists but db does not?
I don't think so, unless the user sets up a separate WAL dir and deletes the data dir intentionally.
@tangcong Ok, then let's just keep it in master.
If there are too many keys (> 1 million), etcd is slow to start. According to my investigation, 90% of the time is spent on the mvcc restore (rebuilding the index tree) and 9% on opening the backend db. Because the index tree has a global lock (idx.Insert), there is no room for optimizing the restore itself. However, the index restore is called twice, which is unnecessary. We can benefit from PR #11699 and cut the time in half.
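To make the cost difference concrete, here is a toy sketch, not etcd code. `toyBackend`, `rebuildIndex`, `readConsistentIndex`, and the `metaKey` name are all hypothetical, and the 8-byte big-endian encoding of the counter is an assumption of the sketch. It contrasts a restore-style pass that touches every key with a single metadata lookup, which is all that is needed when only the consistent index is wanted.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// metaKey is a hypothetical name for the entry that stores the consistent index.
const metaKey = "consistent_index"

// toyBackend is a stand-in for the on-disk backend: user keys plus a small
// metadata section. It is not the etcd backend type.
type toyBackend struct {
	keys map[string][]byte
	meta map[string][]byte
}

// newToyBackend fills the backend with n keys and stores ci as an 8-byte
// big-endian value (an assumption made for this sketch).
func newToyBackend(n int, ci uint64) *toyBackend {
	be := &toyBackend{keys: make(map[string][]byte, n), meta: make(map[string][]byte)}
	for i := 0; i < n; i++ {
		be.keys[fmt.Sprintf("key-%d", i)] = []byte("v")
	}
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, ci)
	be.meta[metaKey] = buf
	return be
}

// rebuildIndex mimics the expensive restore path: it visits every key to
// build an in-memory index, so its cost grows with the number of keys.
func rebuildIndex(be *toyBackend) map[string]struct{} {
	idx := make(map[string]struct{}, len(be.keys))
	for k := range be.keys {
		idx[k] = struct{}{}
	}
	return idx
}

// readConsistentIndex reads the counter with one lookup, independent of how
// many keys the backend holds. It returns false if the entry is absent.
func readConsistentIndex(be *toyBackend) (uint64, bool) {
	v, ok := be.meta[metaKey]
	if !ok || len(v) != 8 {
		return 0, false
	}
	return binary.BigEndian.Uint64(v), true
}

func main() {
	be := newToyBackend(1000, 42)

	// Expensive: a full pass over all keys just to construct a store.
	fmt.Println("indexed keys:", len(rebuildIndex(be)))

	// Cheap: read only the value that the comparison needs.
	ci, ok := readConsistentIndex(be)
	fmt.Println("consistent index:", ci, ok)
}
```

If the full rebuild runs once during normal start-up and once more only to obtain this counter, dropping the second pass removes roughly half of the restore work, which matches the halving reported above.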