-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix delete inconsistencies in read buffer #17195
Conversation
fa874cb
to
c6fc7bb
Compare
cc @ahrtr @serathius |
c6fc7bb
to
9600592
Compare
This PR is changing one of the critical parts of etcd, please let's breakdown it into two separate PRs. Each only fixes one problem. For each PR, the first commit only includes test cases to reproduce the issue you are going to fix in the 2nd or 3rd commits. Ensure you can reproduce the issue before you adding the 2nd or 3rd commit/patch, and can't reproduce it anymore after adding the patch. |
2828c9e
to
656fae1
Compare
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Agree with that, we need to make sure that each of those issues are properly understood and tested. Having a single PR for both adds unnecessary cognitive overhead. |
server/storage/backend/batch_tx.go
Outdated
if t.pending >= t.backend.batchLimit { | ||
// We need to commit the transaction if | ||
// 1. t.pending >= t.backend.batchLimit for performance, | ||
// because We don't want buffer to grow too big as it has worse complexity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's my current understanding, however would be good to confirm. Could you check PRs when the batchlimit was introduced and see what was the motivation back then. Referencing PR would be also useful.
cc @ahrtr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is one relevant pr #10283
checkForEachResponseMatch(t, b.BatchTx(), b.ReadTx(), schema.Test, nil, nil) | ||
} | ||
|
||
func checkRangeResponseMatch(t *testing.T, tx backend.BatchTx, rtx backend.ReadTx, key, endKey []byte, limit int64, expectedKeys, expectedValues [][]byte) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By adding validation of expected values you mixed assertion based testing with property based testing. Let's pick one of them.
Originally I implemented the test as property testing. The property I was testing was "ReadTx should always return the same response as BatchTX". This property should always hold, no matter what operations are executed before. I did it to allow make validation independent from test scenario and potentially randomize operations before validation.
Recent changes to the PR added expectedKeys
and expectedValues
testing, this means you hardeded the validation to specific cases. This is still correct, however the code can be much more simplified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In unit test, the test cases are hard coded. I think it is important to test the actual key and values in case ReadTx and BatchTX make the same mistake.
Changed the code to just check the results in checkForEach
656fae1
to
8c14477
Compare
@@ -55,6 +55,66 @@ func TestUserError(t *testing.T) { | |||
} | |||
} | |||
|
|||
func TestAddUserAfterDelete(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you confirmed that the test case can reproduce the issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirmed the test case if the defensive commit is removed. Because auth force commits for every change, this is not actually a problem for auth right now.
I am adding this test for completeness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, nice find. This defensive commit seems scary, similar to #17119 we noticed some inconsistency and didn't root cause it.
I think we should consider a followup where we add adding verifiers #17158 (comment), remove the defensive commit and see if we find anything.
923a035
to
1a249b1
Compare
1a249b1
to
906e962
Compare
/retest |
Signed-off-by: Siyuan Zhang <sizhang@google.com>
Signed-off-by: Siyuan Zhang <sizhang@google.com>
906e962
to
db61c96
Compare
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Could you backport the fix to 3.5 and 3.4? thx
Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.
The PR includes fix for
(*batchTx) UnsafeDelete
may break the linearizability #17158