-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: reading last snapshot in etcdutl migrate command #18010
base: main
Are you sure you want to change the base?
Conversation
Hi @fykaa. Thanks for your PR. I'm waiting for a etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
To improve correctness of etcdutl migrate command Signed-off-by: fykaa <faeka6@gmail.com>
Thanks @fykaa for the pr. Can you fix the tests? |
Hi @siyuanfoundation, yes, I am on it. Fixing some test bugs that occured for me. |
/retest |
@karuppiah7890: Cannot trigger testing until a trusted user reviews the PR and leaves an In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
/ok-to-test |
etcdutl/etcdutl/migrate_command.go
Outdated
if err != nil { | ||
return nil, fmt.Errorf(`failed to find last snapshot: %v`, err) | ||
} | ||
w, err := wal.OpenForRead(c.lg, walPath, walpb.Snapshot{Index: lastSnapshot}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e2e failure logs: https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/etcd-io_etcd/18010/pull-etcd-e2e-amd64/1794986793938980864
The error seems to be coming from:
etcd/etcdutl/etcdutl/migrate_command.go
Line 105 in bb701b9
return nil, fmt.Errorf(`failed to read wal: %v`, err) |
Error excerpt:
Error Trace: /home/prow/go/src/github.com/etcd-io/etcd/tests/e2e/utl_migrate_test.go:162
Error: Error "[/home/prow/go/src/github.com/etcd-io/etcd/bin/etcdutl migrate --data-dir /tmp/TestEtctlutlMigrateMigrate_v3.5_to_v3.5_is_no-op2858091090/002/member-0 --target-version 3.5] match not found. Set EXPECT_DEBUG for more info Errs: [unexpected exit code [128] after running [/home/prow/go/src/github.com/etcd-io/etcd/bin/etcdutl migrate --data-dir /tmp/TestEtctlutlMigrateMigrate_v3.5_to_v3.5_is_no-op2858091090/002/member-0 --target-version 3.5]], last lines:\n2024-05-27T07:18:46Z\tinfo\tbbolt\tbackend/backend.go:197\tOpening db file (/tmp/TestEtctlutlMigrateMigrate_v3.5_to_v3.5_is_no-op2858091090/002/member-0/member/snap/db) with mode 2d72772d2d2d2d2d2d2d and with options: {Timeout: 0s, NoGrowSync: false, NoFreelistSync: true, PreLoadFreelist: false, FreelistType: , ReadOnly: false, MmapFlags: 8000, InitialMmapSize: 10737418240, PageSize: 0, NoSync: false, OpenFile: 0x0, Mlock: false, Logger: 0xc000130238}\r\n2024-05-27T07:18:46Z\tinfo\tbbolt\tbbolt@v1.4.0-alpha.1/db.go:321\tOpening bbolt db (/tmp/TestEtctlutlMigrateMigrate_v3.5_to_v3.5_is_no-op2858091090/002/member-0/member/snap/db) successfully\r\nError: failed to read wal: wal: snapshot mismatch\r\n (expected \"storage version up-to-date\\t{\\\"storage-version\\\": \\\"3.5\\\"}\", got []). Try EXPECT_DEBUG=TRUE" does not contain "storage version up-to-date\t{\"storage-version\": \"3.5\"}"
The error shown:
wal: snapshot mismatch
is returned only when indices match but Terms don't afaict, this is the error type:
etcd/server/storage/wal/wal.go
Line 60 in bb701b9
ErrSnapshotMismatch = errors.New("wal: snapshot mismatch") |
and that comes from here because we end up calling ReadAll
as part of ReadWALVersion
:
etcd/server/storage/wal/wal.go
Lines 496 to 499 in bb701b9
if snap.Term != w.start.Term { | |
state.Reset() | |
return nil, state, nil, ErrSnapshotMismatch | |
} |
w.start
here is ultimately the snapshot object that we pass into OpenForRead
:
etcd/server/storage/wal/wal.go
Line 353 in bb701b9
start: snap, |
I think we should be passing in the Term
here as well as part of walpb.Snapshot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for pointing out the issue with the snapshot mismatch in the wal
package.
I've made some adjustments to implement your suggestion. The getLastSnapshot
function now returns the walpb.Snapshot
object, which includes full snapshot information (the Index
and Term
). This snapshot object is then passed directly to wal.OpenForRead
. I believe this should help mitigate the ErrSnapshotMismatch
issue.
Signed-off-by: fykaa <faeka6@gmail.com>
/retest |
@fykaa: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
if you print out the WAL version after
for the test |
ping any updates? |
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Fixes a part of #17227
In the
etcdutl/migrate_command.go
file, I've rectified a minor oversight in the logic for reading the last snapshot during the WAL migration process. Previously, an emptywalpb.Snapshot{}
was passed as a reference, that would result in reading of the WAL from the beginning.I will raise a follow-up PR that should also cover the addition of test cases, completing the solution for this issue