-
Notifications
You must be signed in to change notification settings - Fork 4k
storage: test transactions that span merges #27149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
tbg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r1.
Reviewable status:complete! 0 of 0 LGTMs obtained
pkg/storage/client_merge_test.go, line 439 at r1 (raw file):
t.Fatal(err) } mtc.replicateRange(bDesc.RangeID, 1)
Add a paragraph explaining what these three lines achieve.
pkg/storage/client_merge_test.go, line 447 at r1 (raw file):
txn := client.NewTxn(store.DB(), 0 /* gatewayNodeID */, client.RootTxn) // Put the key on the RHS side first so the transaction record will need to // be copied to the LHS range during the merge.
Txn records don't get copied, they're keyed on txn.Key. You mean that it changes ownership?
pkg/storage/client_merge_test.go, line 454 at r1 (raw file):
t.Fatal(err) } args := adminMergeArgs(roachpb.KeyMin)
Why KeyMin?
pkg/storage/client_merge_test.go, line 488 at r1 (raw file):
// realize until after the merge. txn1 := client.NewTxn(store.DB(), 0 /* gatewayNodeID */, client.RootTxn) // Put the key on the RHS side so the transaction record and abort span
Ditto about txn record.
pkg/storage/client_merge_test.go, line 538 at r1 (raw file):
// Create a transaction that won't complete until after the merge. txn1 := client.NewTxn(store.DB(), 0 /* gatewayNodeID */, client.RootTxn) // Put the key on the RHS side so the transaction record and abort span
Ditto
pkg/storage/client_merge_test.go, line 552 at r1 (raw file):
// Complete the merge. time.Sleep(50 * time.Millisecond)
What's the Sleep for? To give txn2 time to get queued? I assume you needed this to get the test to fail without the queue clearing? Even on testrace? Add a comment.
pkg/storage/client_merge_test.go, line 569 at r1 (raw file):
} // Now that txn1 has commited, txn2's put operation should complete.
committed
pkg/storage/batcheval/cmd_end_transaction.go, line 989 at r1 (raw file):
} var ignoredMS enginepb.MVCCStats
These can't be ignored. By copying an abortspan into another nonempty and possibly overlapping abortspan you create a nontrivial stats delta. Unfortunately you're going to have to run some math here. Perhaps the code from the previous abort span copy that I ripped out at this very spot recently did this correctly and can be resuscitated (but I doubt it).
It would also be good to have a test for this, where the LHS and RHS both have an abortspan record for a given txn (plus some that are thrown away, etc), and you simply recompute the stats after the merge.
2086b56 to
9be8e23
Compare
benesch
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained
pkg/storage/client_merge_test.go, line 439 at r1 (raw file):
Previously, tschottdorf (Tobias Schottdorf) wrote…
Add a paragraph explaining what these three lines achieve.
Done.
pkg/storage/client_merge_test.go, line 447 at r1 (raw file):
Previously, tschottdorf (Tobias Schottdorf) wrote…
Txn records don't get copied, they're keyed on
txn.Key. You mean that it changes ownership?
Ah, I'm referring to an implementation detail here. I mean that they get copied into a snapshot in GetSnapshotForMerge on the RHS and then applied to the LHS. Updated the wording to be more precise.
pkg/storage/client_merge_test.go, line 454 at r1 (raw file):
Previously, tschottdorf (Tobias Schottdorf) wrote…
Why KeyMin?
shrug it's what all the tests in this file use to address the first range. I've updated this test, at least, to be more clear.
pkg/storage/client_merge_test.go, line 488 at r1 (raw file):
Previously, tschottdorf (Tobias Schottdorf) wrote…
Ditto about txn record.
Done.
pkg/storage/client_merge_test.go, line 538 at r1 (raw file):
Previously, tschottdorf (Tobias Schottdorf) wrote…
Ditto
Done.
pkg/storage/client_merge_test.go, line 552 at r1 (raw file):
Previously, tschottdorf (Tobias Schottdorf) wrote…
What's the Sleep for? To give txn2 time to get queued? I assume you needed this to get the test to fail without the queue clearing? Even on testrace? Add a comment.
Exactly. I've made this more robust by actually peeking into the txn wait queue.
pkg/storage/client_merge_test.go, line 569 at r1 (raw file):
Previously, tschottdorf (Tobias Schottdorf) wrote…
committed
Done.
pkg/storage/batcheval/cmd_end_transaction.go, line 989 at r1 (raw file):
Previously, tschottdorf (Tobias Schottdorf) wrote…
These can't be ignored. By copying an abortspan into another nonempty and possibly overlapping abortspan you create a nontrivial stats delta. Unfortunately you're going to have to run some math here. Perhaps the code from the previous abort span copy that I ripped out at this very spot recently did this correctly and can be resuscitated (but I doubt it).
It would also be good to have a test for this, where the LHS and RHS both have an abortspan record for a given txn (plus some that are thrown away, etc), and you simply recompute the stats after the merge.
D'oh. Good point. Fixed. I managed to avoid doing any math myself. :)
bdarnell
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Reviewable status:
complete! 0 of 0 LGTMs obtained
tbg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale)
pkg/storage/batcheval/cmd_end_transaction.go, line 989 at r1 (raw file):
Previously, benesch (Nikhil Benesch) wrote…
D'oh. Good point. Fixed. I managed to avoid doing any math myself. :)
Did CI fail prior to the fix? (Hopefully it did, or we'll have to add something somewhere).
|
Yep, the additions I made to TestStoreRangeMergeStats reliably failed the test when I had the abort span stats computation wrong. |
|
Hilariously I just spent 15m debugging a stress failure here that immediately disappeared as soon as I rebased on top of #27061 (merge lock safety improvements). Looks like we've got now test coverage for that previously-theoretical issue! |
9be8e23 to
dca62f7
Compare
|
bors r=tschottdorf,bdarnell |
Merge conflict |
a244f37 to
166e666
Compare
Add tests to verify the correct behavior of transactions that span a merge. In the process, resolve TODOs about a) copying the abort span from the RHS to the LHS during a merge, and b) clearing the RHS's transaction wait queue.
166e666 to
9ea260f
Compare
|
bors r=tschottdorf,bdarnell |
27149: storage: test transactions that span merges r=tschottdorf,bdarnell a=benesch Add tests to verify the correct behavior of transactions that span a merge. In the process, resolve TODOs about a) copying the abort span from the RHS to the LHS during a merge, and b) clearing the RHS's transaction wait queue. Fixes #27091. Co-authored-by: Nikhil Benesch <nikhil.benesch@gmail.com>
Build succeeded |
Add tests to verify the correct behavior of transactions that span a
merge. In the process, resolve TODOs about a) copying the abort span
from the RHS to the LHS during a merge, and b) clearing the RHS's
transaction wait queue.
Fixes #27091.