-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
auth/store: save consistentIndex to fix a data corruption bug #11652
auth/store: save consistentIndex to fix a data corruption bug #11652
Conversation
Codecov Report
@@ Coverage Diff @@
## master #11652 +/- ##
==========================================
- Coverage 66.66% 66.24% -0.43%
==========================================
Files 401 402 +1
Lines 36629 36671 +42
==========================================
- Hits 24420 24291 -129
- Misses 10714 10862 +148
- Partials 1495 1518 +23
Continue to review full report at Codecov.
|
d97c8b0
to
7de3be9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for your very detailed investigation and bugfix!! I think the PR is almost ready to be merged after fixing some style in the commit logs.
- The log of the 1st commit should be:
*: fix auth revision corruption bug
- The log of the 3rd commit should be:
auth: add new metric 'etcd_debugging_auth_revision'
- The 4th commit should be merged to the 1st one
BTW I think ^ on the second thought the counters aren't persisted so the only side effect caused by replaying would be granting again. |
@mitake thanks. i will optimize pr based on your suggestions. |
7de3be9
to
fc44b73
Compare
done, PTAL. |
Codecov Report
@@ Coverage Diff @@
## master #11652 +/- ##
==========================================
- Coverage 66.66% 66% -0.67%
==========================================
Files 401 402 +1
Lines 36629 36672 +43
==========================================
- Hits 24420 24204 -216
- Misses 10714 10966 +252
- Partials 1495 1502 +7
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for digging into this issue and fixing it!
@@ -69,11 +70,11 @@ type watchableStore struct { | |||
// cancel operations. | |||
type cancelFunc func() | |||
|
|||
func New(lg *zap.Logger, b backend.Backend, le lease.Lessor, ig ConsistentIndexGetter, cfg StoreConfig) ConsistentWatchableKV { | |||
return newWatchableStore(lg, b, le, ig, cfg) | |||
func New(lg *zap.Logger, b backend.Backend, le lease.Lessor, as auth.AuthStore, ig ConsistentIndexGetter, cfg StoreConfig) ConsistentWatchableKV { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think logically we want to pass ConsistentIndexGetter to NewAuthStore(), so that the auth store can get consistent index from etcdserver, and writes it to backend meta bucket at the end of applying a raft entry (such as adding a user, deleting a user, etc). Not sure if this means a lot of extra complexity in terms of implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks. passing ConsistentIndexGetter to NewAuthStore will cause the saveIndex funtion to repeat.it looks a little ugly and violates the principle of function reuse. this is what we did in the first version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the second version, we define a public method SaveIndex in mvcc/store and define a interface ConsistentIndexSyncer which has a SaveIndex method that implemented by mvcc/store.SaveIndex. then, we want to pass ConsistentIndexSyncer interface to NewAuthStore().
What you see is the third version. After weighing the pros and cons, we choose the third version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for trying multiple implementations, and I understand maybe the current implementation makes the most sense coding wise. However here is my push back to the current implementation: auth store's correctness should not depend on passing the store pointer to mvcc. Auth store should be relatively independent of mvcc, although they share the same backend.
cc @mitake
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, consistentIndex is a global concept. It has nothing to do with mvcc. It is better to separate it from mvcc/store and encapsulate it in a separate package. When mvcc, lessor, auth and other modules need to prevent repeated execution of commands, call consistentIndex package Save methods to store. What do you think of this solution? but I'm not sure how much complexity this approach will cause. which solution do you prefer? it is a cricital bug for us,so we take a simple and safe approach to fix it.
thanks. @jingyih @mitake
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally agree with you! @tangcong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. For short term we can get this merged and backported. Could you add a TODO for encapsulating consistentindex into a separate package?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. @jingyih
fc44b73
to
beb8d39
Compare
@tangcong I think the ideal way of updating consistent index would be writing data through mvcc, not directly access to backend. e.g. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! defer to @jingyih
beb8d39
to
88b580d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again for fixing this! I added few comments. LGTM after fix.
6ae860f
to
2c99a83
Compare
done. thanks. |
2c99a83
to
f14d2a0
Compare
fix issue #11651