privilege: introduce 'partial update' for users and privileges cache (#62693) | tidb-test=pr/2700#66541
Conversation
|
@CbcWestwolf This PR has conflicts, I have hold it. |
|
@ti-chi-bot: ## If you want to know how to resolve it, please read the guide in TiDB Dev Guide. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository. |
d1aea63 to
60c9344
Compare
📝 WalkthroughWalkthroughRefactors privilege notification and reload to support targeted per-user updates and full reloads. Adds Domain.PrivilegeEvent parsing/batching, introduces NotifyUpdateAllUsersPrivilege, changes NotifyUpdatePrivilege to accept a user list, updates privilege cache for partial updates, adjusts call sites (executor/session/BR), and adds bootstrap indices and a sysvar. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Executor
participant Domain
participant Etcd as "etcd/watch"
participant PrivHandle as "Privilege Handle"
participant MySQLPriv as "MySQLPrivilege"
Client->>Executor: DDL/RESTORE that affects users
Executor->>Domain: NotifyUpdatePrivilege(userList) or NotifyUpdateAllUsersPrivilege()
Domain->>Etcd: emit/watch PrivilegeEvent (ServerID, userList/all)
Etcd-->>Domain: watch events -> decodePrivilegeEvent / batchReadMoreData
Domain->>PrivHandle: deliver PrivilegeEvent (Update or UpdateAll)
PrivHandle->>MySQLPriv: Update(userList) or UpdateAll()
MySQLPriv->>MySQLPriv: load relevant tables (LoadRoleGraph, LoadUserTable, ...)
MySQLPriv->>PrivHandle: merge updated data
PrivHandle-->>Domain: handle updated
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.5.0)Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
br/pkg/restore/snap_client/systable_restore.go (1)
383-407:⚠️ Potential issue | 🟠 MajorReload privileges after restoring any privilege table, not just
mysql.user.
tablescan be filtered down todb,tables_priv,columns_priv,default_roles,role_edges,global_priv, orglobal_grantswithoutuser. In those restores this branch never refreshes the in-memory privilege cache, so the restored grants stay stale until a manual flush. Gate this onsysPrivilegeTableMap(or reusenotifyUpdateAllUsersPrivilege) instead of the literal"user".💡 One way to make the reload condition complete
func (rc *SnapClient) afterSystemTablesReplaced(ctx context.Context, db string, tables []string) error { if db != mysql.SystemDB { return nil } var err error + needPrivilegeReload := false for _, table := range tables { - if table == "user" { - if serr := rc.dom.NotifyUpdateAllUsersPrivilege(); serr != nil { - log.Warn("failed to flush privileges, please manually execute `FLUSH PRIVILEGES`") - err = multierr.Append(err, berrors.ErrUnknown.Wrap(serr).GenWithStack("failed to flush privileges")) - } else { - log.Info("privilege system table restored, please reconnect to make it effective") - } - } else if table == "bind_info" { + if _, ok := sysPrivilegeTableMap[table]; ok { + needPrivilegeReload = true + continue + } + if table == "bind_info" { if serr := rc.db.Session().Execute(ctx, bindinfo.StmtRemoveDuplicatedPseudoBinding); serr != nil { log.Warn("failed to delete duplicated pseudo binding", zap.Error(serr)) err = multierr.Append(err, berrors.ErrUnknown.Wrap(serr).GenWithStack("failed to delete duplicated pseudo binding %s", bindinfo.StmtRemoveDuplicatedPseudoBinding)) } else { log.Info("success to remove duplicated pseudo binding") } } } + if needPrivilegeReload { + if serr := rc.dom.NotifyUpdateAllUsersPrivilege(); serr != nil { + log.Warn("failed to flush privileges, please manually execute `FLUSH PRIVILEGES`") + err = multierr.Append(err, berrors.ErrUnknown.Wrap(serr).GenWithStack("failed to flush privileges")) + } else { + log.Info("privilege system table restored, please reconnect to make it effective") + } + } return err }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@br/pkg/restore/snap_client/systable_restore.go` around lines 383 - 407, The current afterSystemTablesReplaced method only calls rc.dom.NotifyUpdateAllUsersPrivilege() when the restored table name equals the literal "user", which misses other privilege-related tables (db, tables_priv, columns_priv, default_roles, role_edges, global_priv, global_grants) and leaves in-memory grants stale; update the condition to check whether any restored table is in the privilege table set (use sysPrivilegeTableMap or a helper like notifyUpdateAllUsersPrivilege) and call rc.dom.NotifyUpdateAllUsersPrivilege() when any privilege table is present (preserving existing logging and error aggregation behavior in afterSystemTablesReplaced) so all privilege restores trigger a refresh.pkg/executor/test/passwordtest/password_management_test.go (1)
919-928:⚠️ Potential issue | 🟡 MinorAssert the refresh error in
changeAutoLockedLastChanged.This helper updates
mysql.userand then refreshes the privilege cache. IfNotifyUpdateAllUsersPrivilege()fails here, the test silently continues with stale state and the later auth assertions can fail for the wrong reason.🧪 Make the helper return the refresh error
-func changeAutoLockedLastChanged(tk *testkit.TestKit, ds, user string) { +func changeAutoLockedLastChanged(tk *testkit.TestKit, ds, user string) error { SQL := "UPDATE `mysql`.`User` SET user_attributes=json_merge_patch(user_attributes, '{\"Password_locking\": {\"failed_login_attempts\": 3," + "\"password_lock_time_days\": 3,\"auto_account_locked\": \"Y\",\"failed_login_count\": 3,\"auto_locked_last_changed\": \"%s\"}}') " + "WHERE Host='localhost' and User='%s'" d, _ := time.ParseDuration(ds) changeTime := time.Now().Add(d).Format(time.UnixDate) SQL = fmt.Sprintf(SQL, changeTime, user) tk.MustExec(SQL) - domain.GetDomain(tk.Session()).NotifyUpdateAllUsersPrivilege() + return domain.GetDomain(tk.Session()).NotifyUpdateAllUsersPrivilege() }Then have each call site assert
require.NoError(t, changeAutoLockedLastChanged(...)). As per coding guidelines "Keep error handling actionable and contextual; avoid silently swallowing errors in Go code".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/executor/test/passwordtest/password_management_test.go` around lines 919 - 928, changeAutoLockedLastChanged currently swallows any error from the privilege refresh; modify changeAutoLockedLastChanged to return an error and propagate any failure from domain.GetDomain(tk.Session()).NotifyUpdateAllUsersPrivilege() (i.e., change signature from func changeAutoLockedLastChanged(...) to func changeAutoLockedLastChanged(...) error and return the error if NotifyUpdateAllUsersPrivilege() fails after tk.MustExec), and then update every call site to assert the result (e.g., require.NoError(t, changeAutoLockedLastChanged(...))). Ensure function still performs the SQL update via tk.MustExec and only returns nil on successful NotifyUpdateAllUsersPrivilege().pkg/executor/show.go (1)
1876-1901:⚠️ Potential issue | 🟡 MinorEscape
authentication_stringbefore embedding it in the DDL.
authDatanow comes straight frommysql.user. If it contains'or\,SHOW CREATE USERwill emit invalid SQL forIDENTIFIED ... AS ....Suggested fix
authStr := "" if !(authplugin == mysql.AuthSocket && authData == "") { - authStr = fmt.Sprintf(" AS '%s'", authData) + authStr = fmt.Sprintf(" AS '%s'", format.OutputFormat(authData)) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/executor/show.go` around lines 1876 - 1901, The authData pulled into the DDL isn't escaped and can break the generated SQL; before building authStr in show.go (where authData, authplugin and showStr are composed), properly escape authentication_string characters (at minimum single quote and backslash, and any other SQL-string-special bytes used by your SQL dialect) or use an existing SQL/string-quoting helper to produce a safely quoted literal, then embed that escaped/quoted value into authStr (preserving the existing skip for mysql.AuthSocket && authData == ""); update the construction that uses fmt.Sprintf for showStr to use the escaped/quoted authData instead of the raw authData.
🧹 Nitpick comments (4)
pkg/extension/auth_test.go (1)
238-238: Avoid asserting the total mock count across the whole test.This expectation is coupled to every earlier
ValidateAuthStringside effect in the test, so unrelated auth/cache changes will keep forcing churn here. Prefer asserting the delta for the login step, or reset the mock state before this section.As per coding guidelines,
**/{*_test.go,testdata/**}: Keep test changes minimal and deterministic; avoid broad golden/testdata churn unless required.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/extension/auth_test.go` at line 238, The test currently asserts total calls with p.AssertNumberOfCalls(t, "ValidateAuthString", 3) which couples this assertion to prior side effects; instead, either reset or clear the mock before the login section (so subsequent calls can be asserted deterministically) or replace the global count assertion with a delta-style check that only verifies the number of ValidateAuthString calls made during the login step (e.g., capture the current call count from the mock, perform the login actions, then assert the increment equals the expected delta). Target the mock method name ValidateAuthString and the assertion call p.AssertNumberOfCalls to implement the change.pkg/sessionctx/variable/tidb_vars.go (1)
1033-1034: Document the trade-off behind this switch a bit more explicitly.The current comment says what flips, but not why someone would enable it or why the default is off. A short note that this trades higher in-memory user-cache footprint for faster privilege/user refreshes would make the knob much easier to reason about.
As per coding guidelines,
**/*.go: Comments SHOULD explain non-obvious intent, constraints, invariants, concurrency guarantees, SQL/compatibility contracts, or important performance trade-offs, and SHOULD NOT restate what the code already makes clear.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/sessionctx/variable/tidb_vars.go` around lines 1033 - 1034, Update the comment for the TiDBAccelerateUserCreationUpdate constant to explain the trade-off: when enabled (TiDBAccelerateUserCreationUpdate) the server eagerly loads and updates whole user data in-memory to speed up privilege/user refreshes, at the cost of increased memory footprint for the user cache; clarify that the default is off to avoid higher memory usage in typical deployments and note any relevant invariants (e.g., affects in-memory user-cache size and refresh latency).pkg/sessionctx/variable/sysvar.go (1)
3557-3562: Consider addingGetGlobalfor consistency with other atomic-backed global variables.The variable uses
AccelerateUserCreationUpdate.Store()inSetGlobal, but lacks a correspondingGetGlobal. Other similar global boolean variables with atomic backing (e.g.,TiDBEnableBatchDML,TiDBEnableCheckConstraint,TiDBEnableResourceControl) follow the pattern of having bothSetGlobalandGetGlobalto ensure reads reflect the in-memory atomic state.Proposed fix
{Scope: ScopeGlobal, Name: TiDBAccelerateUserCreationUpdate, Value: BoolToOnOff(DefTiDBAccelerateUserCreationUpdate), Type: TypeBool, SetGlobal: func(_ context.Context, s *SessionVars, val string) error { AccelerateUserCreationUpdate.Store(TiDBOptOn(val)) return nil }, + GetGlobal: func(_ context.Context, s *SessionVars) (string, error) { + return BoolToOnOff(AccelerateUserCreationUpdate.Load()), nil + }, },🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/sessionctx/variable/sysvar.go` around lines 3557 - 3562, Add a GetGlobal to mirror the existing SetGlobal for the TiDBAccelerateUserCreationUpdate variable so reads reflect the in-memory atomic state: implement a GetGlobal function for the variable that calls AccelerateUserCreationUpdate.Load(), converts the boolean to the string form with BoolToOnOff (matching the existing Value/SetGlobal pattern), and returns that string (with nil error) so GetGlobal and SetGlobal are consistent with other atomic-backed globals like TiDBEnableBatchDML/TiDBEnableCheckConstraint/TiDBEnableResourceControl.pkg/session/bootstrap.go (1)
388-389: Drop the redundantmysql.global_grantssecondary index.
mysql.global_grantsalready hasPRIMARY KEY (USER, HOST, PRIV), soi_user(USER)does not add a new access path; it just adds extra DDL, storage, and write overhead on grant changes.♻️ Suggested cleanup
- PRIMARY KEY (USER,HOST,PRIV), - KEY i_user (USER) + PRIMARY KEY (USER,HOST,PRIV) );`- doReentrantDDL(s, "ALTER TABLE mysql.global_grants ADD INDEX i_user (user)", dbterror.ErrDupKeyName)Also applies to: 3340-3340
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/session/bootstrap.go` around lines 388 - 389, The DDL defines a redundant secondary index KEY i_user (USER) on mysql.global_grants which duplicates the PRIMARY KEY (USER,HOST,PRIV); remove the secondary index declaration to avoid extra DDL/storage/write overhead. Edit the CREATE/ALTER statement(s) in pkg/session/bootstrap.go that reference mysql.global_grants and delete the "KEY i_user (USER)" entry (and any identical occurrences elsewhere in that file, e.g., the other repeated definition), leaving only the PRIMARY KEY (USER,HOST,PRIV).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/domain/domain.go`:
- Around line 1869-1901: The function decodePrivilegeEvent must treat any watch
event with an empty value as a full refresh; modify decodePrivilegeEvent so that
when iterating resp.Events, if event.Kv != nil and len(event.Kv.Value) == 0 you
set msg.All = true and immediately return msg (or break out and ensure
msg.All=true is honored) instead of relying on isNewVersionEvents across the
whole batch; update the loop around event.Kv/val handling (variables: resp
clientv3.WatchResponse, msg PrivilegeEvent, isNewVersionEvents, tmp
PrivilegeEvent) so a single legacy empty event triggers a full reload for other
nodes.
- Around line 3015-3017: The code reads do.serverID directly while
serverIDKeeper writes it with atomic.StoreUint64, causing a
race/misclassification; replace the direct read with an atomic load (use
atomic.LoadUint64 on the same serverID field) when assigning event.ServerID so
the access is consistent with serverIDKeeper's atomic.StoreUint64 usage.
- Around line 3021-3024: The current code tries to PutKVToEtcd with a payload
that may exceed size.MB, risking etcd failure and leaking full username lists to
logs; change the branch where uint64(len(data)) > size.MB to instead replace
data with a small "full-refresh" notification (e.g., a marshaled struct with
All:true) and log a warning without including the raw byte payload; then call
ddlutil.PutKVToEtcd(do.ctx, do.etcdClient, 5, privilegeKey, string(data)) with
the reduced payload. Ensure you reference the same symbols (size.MB,
privilegeKey, do.ctx, do.etcdClient, ddlutil.PutKVToEtcd) and do not emit the
large data in logs.
In `@pkg/executor/infoschema_reader_test.go`:
- Around line 714-715: The test currently asserts an absolute count (87) from
the information_schema.tidb_index_usage table which is brittle; change the
assertion to compute a baseline count from tk.MustQuery("select count(*) from
information_schema.tidb_index_usage;") before the index-related operations and
then assert baseline+6 instead of the hard-coded 87. Locate the tk.MustQuery
call in the test, capture the returned count into a variable, convert to an
integer, and use that variable+6 in the final Check call (keep the query text
"select count(*) from information_schema.tidb_index_usage;" and the
testkit.Check call usage intact).
In `@pkg/privilege/privileges/cache.go`:
- Around line 502-528: findUserAndAllRoles only traverses roleGraph and thus
misses grantees affected by changes to mysql.default_roles; update this function
to also include reverse default_roles dependents by accepting or accessing the
reverse-default-roles mapping (e.g. a map from role -> []grantee) and enqueueing
those grantees just like roleGraph edges so they enter the returned set, or if
you cannot access that map here, make callers that handle default_roles changes
call UpdateAll() instead; reference findUserAndAllRoles, roleGraph,
default_roles/defaultRoles and UpdateAll when implementing the change.
In `@pkg/privilege/privileges/privileges_test.go`:
- Around line 2169-2202: The test TestSQLVariableAccelerateUserCreationUpdate
mutates the global flag tidb_accelerate_user_creation_update without guaranteed
restoration on failure; capture the current state at the start (e.g. via
variable.AccelerateUserCreationUpdate.Load() or querying
@@global.tidb_accelerate_user_creation_update) and register a t.Cleanup that
restores the original value (either by calling tk.MustExec("set
@@global.tidb_accelerate_user_creation_update = ...") or by using
variable.AccelerateUserCreationUpdate.Store(...)) so the flag is returned to its
prior state even if the test fails; keep the rest of the test logic (create
user, set on/off) unchanged and reference the
TestSQLVariableAccelerateUserCreationUpdate and
variable.AccelerateUserCreationUpdate symbols when making the change.
In `@pkg/server/tests/commontest/tidb_test.go`:
- Line 2669: The call to qctx.Session.Auth(...) ignores the returned error;
capture it (err := qctx.Session.Auth(...)) and fail the test on error (for
example: if err != nil { t.Fatalf("Auth failed: %v", err) } or use your test
framework's require/assert failure), so the test stops with a clear message if
authentication fails rather than continuing with a misconfigured session.
In `@tests/integrationtest/t/privilege/privileges.test`:
- Around line 903-905: The test setup misses pre-cleaning the role 'aa@bb', so
add a DROP ROLE IF EXISTS 'aa@bb' before CREATE ROLE 'aa@bb' in the
privileges.test setup to make the test self-contained and deterministic; locate
the SQL block in tests/integrationtest/t/privilege/privileges.test around the
lines that run "drop user if exists u1; create user u1; create role 'aa@bb';"
and insert the matching "DROP ROLE IF EXISTS 'aa@bb';" immediately before the
CREATE ROLE statement.
---
Outside diff comments:
In `@br/pkg/restore/snap_client/systable_restore.go`:
- Around line 383-407: The current afterSystemTablesReplaced method only calls
rc.dom.NotifyUpdateAllUsersPrivilege() when the restored table name equals the
literal "user", which misses other privilege-related tables (db, tables_priv,
columns_priv, default_roles, role_edges, global_priv, global_grants) and leaves
in-memory grants stale; update the condition to check whether any restored table
is in the privilege table set (use sysPrivilegeTableMap or a helper like
notifyUpdateAllUsersPrivilege) and call rc.dom.NotifyUpdateAllUsersPrivilege()
when any privilege table is present (preserving existing logging and error
aggregation behavior in afterSystemTablesReplaced) so all privilege restores
trigger a refresh.
In `@pkg/executor/show.go`:
- Around line 1876-1901: The authData pulled into the DDL isn't escaped and can
break the generated SQL; before building authStr in show.go (where authData,
authplugin and showStr are composed), properly escape authentication_string
characters (at minimum single quote and backslash, and any other
SQL-string-special bytes used by your SQL dialect) or use an existing
SQL/string-quoting helper to produce a safely quoted literal, then embed that
escaped/quoted value into authStr (preserving the existing skip for
mysql.AuthSocket && authData == ""); update the construction that uses
fmt.Sprintf for showStr to use the escaped/quoted authData instead of the raw
authData.
In `@pkg/executor/test/passwordtest/password_management_test.go`:
- Around line 919-928: changeAutoLockedLastChanged currently swallows any error
from the privilege refresh; modify changeAutoLockedLastChanged to return an
error and propagate any failure from
domain.GetDomain(tk.Session()).NotifyUpdateAllUsersPrivilege() (i.e., change
signature from func changeAutoLockedLastChanged(...) to func
changeAutoLockedLastChanged(...) error and return the error if
NotifyUpdateAllUsersPrivilege() fails after tk.MustExec), and then update every
call site to assert the result (e.g., require.NoError(t,
changeAutoLockedLastChanged(...))). Ensure function still performs the SQL
update via tk.MustExec and only returns nil on successful
NotifyUpdateAllUsersPrivilege().
---
Nitpick comments:
In `@pkg/extension/auth_test.go`:
- Line 238: The test currently asserts total calls with p.AssertNumberOfCalls(t,
"ValidateAuthString", 3) which couples this assertion to prior side effects;
instead, either reset or clear the mock before the login section (so subsequent
calls can be asserted deterministically) or replace the global count assertion
with a delta-style check that only verifies the number of ValidateAuthString
calls made during the login step (e.g., capture the current call count from the
mock, perform the login actions, then assert the increment equals the expected
delta). Target the mock method name ValidateAuthString and the assertion call
p.AssertNumberOfCalls to implement the change.
In `@pkg/session/bootstrap.go`:
- Around line 388-389: The DDL defines a redundant secondary index KEY i_user
(USER) on mysql.global_grants which duplicates the PRIMARY KEY (USER,HOST,PRIV);
remove the secondary index declaration to avoid extra DDL/storage/write
overhead. Edit the CREATE/ALTER statement(s) in pkg/session/bootstrap.go that
reference mysql.global_grants and delete the "KEY i_user (USER)" entry (and any
identical occurrences elsewhere in that file, e.g., the other repeated
definition), leaving only the PRIMARY KEY (USER,HOST,PRIV).
In `@pkg/sessionctx/variable/sysvar.go`:
- Around line 3557-3562: Add a GetGlobal to mirror the existing SetGlobal for
the TiDBAccelerateUserCreationUpdate variable so reads reflect the in-memory
atomic state: implement a GetGlobal function for the variable that calls
AccelerateUserCreationUpdate.Load(), converts the boolean to the string form
with BoolToOnOff (matching the existing Value/SetGlobal pattern), and returns
that string (with nil error) so GetGlobal and SetGlobal are consistent with
other atomic-backed globals like
TiDBEnableBatchDML/TiDBEnableCheckConstraint/TiDBEnableResourceControl.
In `@pkg/sessionctx/variable/tidb_vars.go`:
- Around line 1033-1034: Update the comment for the
TiDBAccelerateUserCreationUpdate constant to explain the trade-off: when enabled
(TiDBAccelerateUserCreationUpdate) the server eagerly loads and updates whole
user data in-memory to speed up privilege/user refreshes, at the cost of
increased memory footprint for the user cache; clarify that the default is off
to avoid higher memory usage in typical deployments and note any relevant
invariants (e.g., affects in-memory user-cache size and refresh latency).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 5e46b9ec-a05c-4917-afd1-7d6690571a73
📒 Files selected for processing (28)
br/pkg/restore/snap_client/systable_restore.gobr/pkg/restore/snap_client/systable_restore_test.gopkg/domain/BUILD.bazelpkg/domain/domain.gopkg/executor/grant.gopkg/executor/infoschema_reader_test.gopkg/executor/revoke.gopkg/executor/show.gopkg/executor/simple.gopkg/executor/test/passwordtest/password_management_test.gopkg/extension/auth_test.gopkg/privilege/privilege.gopkg/privilege/privileges/BUILD.bazelpkg/privilege/privileges/cache.gopkg/privilege/privileges/cache_test.gopkg/privilege/privileges/privileges.gopkg/privilege/privileges/privileges_test.gopkg/privilege/privileges/tidb_auth_token_test.gopkg/server/tests/commontest/tidb_test.gopkg/session/bootstrap.gopkg/session/bootstraptest/bootstrap_upgrade_test.gopkg/session/session.gopkg/sessionctx/variable/sysvar.gopkg/sessionctx/variable/tidb_vars.gotests/integrationtest/r/explain.resulttests/integrationtest/r/privilege/privileges.resulttests/integrationtest/t/explain.testtests/integrationtest/t/privilege/privileges.test
| if do.etcdClient != nil { | ||
| event := PrivilegeEvent{ServerID: do.serverID} | ||
| event.ServerID = do.serverID | ||
| data, err := json.Marshal(event) |
There was a problem hiding this comment.
Read serverID through the atomic accessor.
serverIDKeeper writes this field with atomic.StoreUint64, so the direct read here mixes atomic and non-atomic access and can misclassify self-emitted events.
🩹 Minimal fix
- event.ServerID = do.serverID
+ event.ServerID = do.ServerID()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if do.etcdClient != nil { | |
| event := PrivilegeEvent{ServerID: do.serverID} | |
| event.ServerID = do.serverID | |
| data, err := json.Marshal(event) | |
| if do.etcdClient != nil { | |
| event.ServerID = do.ServerID() | |
| data, err := json.Marshal(event) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/domain/domain.go` around lines 3015 - 3017, The code reads do.serverID
directly while serverIDKeeper writes it with atomic.StoreUint64, causing a
race/misclassification; replace the direct read with an atomic load (use
atomic.LoadUint64 on the same serverID field) when assigning event.ServerID so
the access is consistent with serverIDKeeper's atomic.StoreUint64 usage.
| if uint64(len(data)) > size.MB { | ||
| logutil.BgLogger().Warn("notify update privilege message too large", zap.ByteString("value", data)) | ||
| } | ||
| err = ddlutil.PutKVToEtcd(do.ctx, do.etcdClient, 5, privilegeKey, string(data)) |
There was a problem hiding this comment.
Downgrade oversized partial notifications to a full-refresh event.
If this payload is already above the warning threshold, PutKVToEtcd can fail and only the local TiDB refreshes. Falling back to {All:true} keeps cross-node propagation reliable and avoids dumping the full username list into logs.
🩹 Safer fallback
- if uint64(len(data)) > size.MB {
- logutil.BgLogger().Warn("notify update privilege message too large", zap.ByteString("value", data))
- }
+ if uint64(len(data)) > size.MB {
+ origBytes := len(data)
+ userCount := len(event.UserList)
+ event = PrivilegeEvent{All: true, ServerID: event.ServerID}
+ data, err = json.Marshal(event)
+ if err != nil {
+ return errors.Trace(err)
+ }
+ logutil.BgLogger().Warn("notify update privilege message too large, falling back to full reload",
+ zap.Int("user_count", userCount),
+ zap.Int("bytes", origBytes))
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if uint64(len(data)) > size.MB { | |
| logutil.BgLogger().Warn("notify update privilege message too large", zap.ByteString("value", data)) | |
| } | |
| err = ddlutil.PutKVToEtcd(do.ctx, do.etcdClient, 5, privilegeKey, string(data)) | |
| if uint64(len(data)) > size.MB { | |
| origBytes := len(data) | |
| userCount := len(event.UserList) | |
| event = PrivilegeEvent{All: true, ServerID: event.ServerID} | |
| data, err = json.Marshal(event) | |
| if err != nil { | |
| return errors.Trace(err) | |
| } | |
| logutil.BgLogger().Warn("notify update privilege message too large, falling back to full reload", | |
| zap.Int("user_count", userCount), | |
| zap.Int("bytes", origBytes)) | |
| } | |
| err = ddlutil.PutKVToEtcd(do.ctx, do.etcdClient, 5, privilegeKey, string(data)) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/domain/domain.go` around lines 3021 - 3024, The current code tries to
PutKVToEtcd with a payload that may exceed size.MB, risking etcd failure and
leaking full username lists to logs; change the branch where uint64(len(data)) >
size.MB to instead replace data with a small "full-refresh" notification (e.g.,
a marshaled struct with All:true) and log a warning without including the raw
byte payload; then call ddlutil.PutKVToEtcd(do.ctx, do.etcdClient, 5,
privilegeKey, string(data)) with the reduced payload. Ensure you reference the
same symbols (size.MB, privilegeKey, do.ctx, do.etcdClient, ddlutil.PutKVToEtcd)
and do not emit the large data in logs.
| tk.MustQuery(`select count(*) from information_schema.tidb_index_usage;`).Check( | ||
| testkit.RowsWithSep("|", "81")) | ||
| testkit.RowsWithSep("|", "87")) |
There was a problem hiding this comment.
Avoid hard-coding the global tidb_index_usage row count.
This 87 will keep changing whenever unrelated system-table indexes are added, which is exactly what happened in this PR. Since the test already proves the six idt1/idt2 rows explicitly, a more stable assertion here is “baseline count + 6” instead of the absolute global total.
Suggested change
func TestIndexUsageTable(t *testing.T) {
store := testkit.CreateMockStore(t)
tk := testkit.NewTestKit(t, store)
tk.MustExec("use test")
+ baseCnt, err := strconv.Atoi(
+ tk.MustQuery(`select count(*) from information_schema.tidb_index_usage`).Rows()[0][0].(string),
+ )
+ require.NoError(t, err)
tk.MustExec("create table idt1(col_1 int primary key, col_2 int, index idx_1(col_1), index idx_2(col_2), index idx_3(col_1, col_2));")
tk.MustExec("create table idt2(col_1 int primary key, col_2 int, index idx_1(col_1), index idx_2(col_2), index idx_4(col_2, col_1));")
@@
- tk.MustQuery(`select count(*) from information_schema.tidb_index_usage;`).Check(
- testkit.RowsWithSep("|", "87"))
+ cnt, err := strconv.Atoi(
+ tk.MustQuery(`select count(*) from information_schema.tidb_index_usage;`).Rows()[0][0].(string),
+ )
+ require.NoError(t, err)
+ require.Equal(t, baseCnt+6, cnt)As per coding guidelines, "Keep test changes minimal and deterministic; avoid broad golden/testdata churn unless required".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/executor/infoschema_reader_test.go` around lines 714 - 715, The test
currently asserts an absolute count (87) from the
information_schema.tidb_index_usage table which is brittle; change the assertion
to compute a baseline count from tk.MustQuery("select count(*) from
information_schema.tidb_index_usage;") before the index-related operations and
then assert baseline+6 instead of the hard-coded 87. Locate the tk.MustQuery
call in the test, capture the returned count into a variable, convert to an
integer, and use that variable+6 in the final Check call (keep the query text
"select count(*) from information_schema.tidb_index_usage;" and the
testkit.Check call usage intact).
| func TestSQLVariableAccelerateUserCreationUpdate(t *testing.T) { | ||
| store := createStoreAndPrepareDB(t) | ||
| tk := testkit.NewTestKit(t, store) | ||
| dom := domain.GetDomain(tk.Session()) | ||
| // 1. check the default variable value | ||
| tk.MustQuery("select @@global.tidb_accelerate_user_creation_update").Check(testkit.Rows("0")) | ||
| // trigger priv reload | ||
| tk.MustExec("create user aaa") | ||
| handle := dom.PrivilegeHandle() | ||
| priv := handle.Get() | ||
| require.False(t, priv.RequestVerification(nil, "bbb", "%", "test", "", "", mysql.SelectPriv)) | ||
|
|
||
| // 2. change the variable and check | ||
| tk.MustExec("set @@global.tidb_accelerate_user_creation_update = on") | ||
| tk.MustQuery("select @@global.tidb_accelerate_user_creation_update").Check(testkit.Rows("1")) | ||
| require.True(t, variable.AccelerateUserCreationUpdate.Load()) | ||
| tk.MustExec("create user bbb") | ||
| // trigger priv reload, but data for bbb is not really loaded | ||
| tk.MustExec("grant select on test.* to bbb") | ||
| priv = handle.Get() | ||
| require.True(t, priv.RequestVerification(nil, "bbb", "%", "test", "", "", mysql.SelectPriv)) | ||
| tk1 := testkit.NewTestKit(t, store) | ||
| // if user bbb login, everything works as expected | ||
| require.NoError(t, tk1.Session().Auth(&auth.UserIdentity{Username: "bbb", Hostname: "localhost"}, nil, nil, nil)) | ||
| priv = handle.Get() | ||
| require.True(t, priv.RequestVerification(nil, "bbb", "%", "test", "", "", mysql.SelectPriv)) | ||
|
|
||
| // 3. change the variable and check again | ||
| tk.MustExec("set @@global.tidb_accelerate_user_creation_update = off") | ||
| tk.MustQuery("select @@global.tidb_accelerate_user_creation_update").Check(testkit.Rows("0")) | ||
| tk.MustExec("drop user aaa") | ||
| priv = handle.Get() | ||
| require.True(t, priv.RequestVerification(nil, "bbb", "%", "test", "", "", mysql.SelectPriv)) | ||
| } |
There was a problem hiding this comment.
Restore the global accelerate flag with t.Cleanup.
This test only flips tidb_accelerate_user_creation_update back to off on the happy path. Any earlier require.* failure leaves the process-wide flag changed for later tests.
Suggested fix
func TestSQLVariableAccelerateUserCreationUpdate(t *testing.T) {
store := createStoreAndPrepareDB(t)
tk := testkit.NewTestKit(t, store)
+ t.Cleanup(func() {
+ tk.MustExec("set @@global.tidb_accelerate_user_creation_update = off")
+ })
dom := domain.GetDomain(tk.Session())As per coding guidelines, **/{*_test.go,testdata/**}: Keep test changes minimal and deterministic; avoid broad golden/testdata churn unless required.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func TestSQLVariableAccelerateUserCreationUpdate(t *testing.T) { | |
| store := createStoreAndPrepareDB(t) | |
| tk := testkit.NewTestKit(t, store) | |
| dom := domain.GetDomain(tk.Session()) | |
| // 1. check the default variable value | |
| tk.MustQuery("select @@global.tidb_accelerate_user_creation_update").Check(testkit.Rows("0")) | |
| // trigger priv reload | |
| tk.MustExec("create user aaa") | |
| handle := dom.PrivilegeHandle() | |
| priv := handle.Get() | |
| require.False(t, priv.RequestVerification(nil, "bbb", "%", "test", "", "", mysql.SelectPriv)) | |
| // 2. change the variable and check | |
| tk.MustExec("set @@global.tidb_accelerate_user_creation_update = on") | |
| tk.MustQuery("select @@global.tidb_accelerate_user_creation_update").Check(testkit.Rows("1")) | |
| require.True(t, variable.AccelerateUserCreationUpdate.Load()) | |
| tk.MustExec("create user bbb") | |
| // trigger priv reload, but data for bbb is not really loaded | |
| tk.MustExec("grant select on test.* to bbb") | |
| priv = handle.Get() | |
| require.True(t, priv.RequestVerification(nil, "bbb", "%", "test", "", "", mysql.SelectPriv)) | |
| tk1 := testkit.NewTestKit(t, store) | |
| // if user bbb login, everything works as expected | |
| require.NoError(t, tk1.Session().Auth(&auth.UserIdentity{Username: "bbb", Hostname: "localhost"}, nil, nil, nil)) | |
| priv = handle.Get() | |
| require.True(t, priv.RequestVerification(nil, "bbb", "%", "test", "", "", mysql.SelectPriv)) | |
| // 3. change the variable and check again | |
| tk.MustExec("set @@global.tidb_accelerate_user_creation_update = off") | |
| tk.MustQuery("select @@global.tidb_accelerate_user_creation_update").Check(testkit.Rows("0")) | |
| tk.MustExec("drop user aaa") | |
| priv = handle.Get() | |
| require.True(t, priv.RequestVerification(nil, "bbb", "%", "test", "", "", mysql.SelectPriv)) | |
| } | |
| func TestSQLVariableAccelerateUserCreationUpdate(t *testing.T) { | |
| store := createStoreAndPrepareDB(t) | |
| tk := testkit.NewTestKit(t, store) | |
| t.Cleanup(func() { | |
| tk.MustExec("set @@global.tidb_accelerate_user_creation_update = off") | |
| }) | |
| dom := domain.GetDomain(tk.Session()) | |
| // 1. check the default variable value | |
| tk.MustQuery("select @@global.tidb_accelerate_user_creation_update").Check(testkit.Rows("0")) | |
| // trigger priv reload | |
| tk.MustExec("create user aaa") | |
| handle := dom.PrivilegeHandle() | |
| priv := handle.Get() | |
| require.False(t, priv.RequestVerification(nil, "bbb", "%", "test", "", "", mysql.SelectPriv)) | |
| // 2. change the variable and check | |
| tk.MustExec("set @@global.tidb_accelerate_user_creation_update = on") | |
| tk.MustQuery("select @@global.tidb_accelerate_user_creation_update").Check(testkit.Rows("1")) | |
| require.True(t, variable.AccelerateUserCreationUpdate.Load()) | |
| tk.MustExec("create user bbb") | |
| // trigger priv reload, but data for bbb is not really loaded | |
| tk.MustExec("grant select on test.* to bbb") | |
| priv = handle.Get() | |
| require.True(t, priv.RequestVerification(nil, "bbb", "%", "test", "", "", mysql.SelectPriv)) | |
| tk1 := testkit.NewTestKit(t, store) | |
| // if user bbb login, everything works as expected | |
| require.NoError(t, tk1.Session().Auth(&auth.UserIdentity{Username: "bbb", Hostname: "localhost"}, nil, nil, nil)) | |
| priv = handle.Get() | |
| require.True(t, priv.RequestVerification(nil, "bbb", "%", "test", "", "", mysql.SelectPriv)) | |
| // 3. change the variable and check again | |
| tk.MustExec("set @@global.tidb_accelerate_user_creation_update = off") | |
| tk.MustQuery("select @@global.tidb_accelerate_user_creation_update").Check(testkit.Rows("0")) | |
| tk.MustExec("drop user aaa") | |
| priv = handle.Get() | |
| require.True(t, priv.RequestVerification(nil, "bbb", "%", "test", "", "", mysql.SelectPriv)) | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/privilege/privileges/privileges_test.go` around lines 2169 - 2202, The
test TestSQLVariableAccelerateUserCreationUpdate mutates the global flag
tidb_accelerate_user_creation_update without guaranteed restoration on failure;
capture the current state at the start (e.g. via
variable.AccelerateUserCreationUpdate.Load() or querying
@@global.tidb_accelerate_user_creation_update) and register a t.Cleanup that
restores the original value (either by calling tk.MustExec("set
@@global.tidb_accelerate_user_creation_update = ...") or by using
variable.AccelerateUserCreationUpdate.Store(...)) so the flag is returned to its
prior state even if the test fails; keep the rest of the test logic (create
user, set on/off) unchanged and reference the
TestSQLVariableAccelerateUserCreationUpdate and
variable.AccelerateUserCreationUpdate symbols when making the change.
| _, err = Execute(context.Background(), qctx, "create user testuser;") | ||
| require.NoError(t, err) | ||
| qctx.Session.GetSessionVars().User = &auth.UserIdentity{Username: "testuser", AuthUsername: "testuser", AuthHostname: "%"} | ||
| qctx.Session.Auth(&auth.UserIdentity{Username: "testuser", AuthUsername: "testuser", AuthHostname: "%"}, nil, nil, nil) |
There was a problem hiding this comment.
Missing error check for Auth() call.
The Auth() method returns an error that is being silently discarded. If authentication fails, the test will continue with an incorrectly configured session, causing confusing failures in subsequent assertions. As per coding guidelines, avoid silently swallowing errors in Go code.
🔧 Proposed fix to check the error
- qctx.Session.Auth(&auth.UserIdentity{Username: "testuser", AuthUsername: "testuser", AuthHostname: "%"}, nil, nil, nil)
+ err = qctx.Session.Auth(&auth.UserIdentity{Username: "testuser", AuthUsername: "testuser", AuthHostname: "%"}, nil, nil, nil)
+ require.NoError(t, err)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| qctx.Session.Auth(&auth.UserIdentity{Username: "testuser", AuthUsername: "testuser", AuthHostname: "%"}, nil, nil, nil) | |
| err = qctx.Session.Auth(&auth.UserIdentity{Username: "testuser", AuthUsername: "testuser", AuthHostname: "%"}, nil, nil, nil) | |
| require.NoError(t, err) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/server/tests/commontest/tidb_test.go` at line 2669, The call to
qctx.Session.Auth(...) ignores the returned error; capture it (err :=
qctx.Session.Auth(...)) and fail the test on error (for example: if err != nil {
t.Fatalf("Auth failed: %v", err) } or use your test framework's require/assert
failure), so the test stops with a clear message if authentication fails rather
than continuing with a misconfigured session.
| drop user if exists u1; | ||
| create user u1; | ||
| create role 'aa@bb'; |
There was a problem hiding this comment.
Pre-clean the role before creating it.
u1 is cleaned up up front, but 'aa@bb' is not. If a prior run aborts before teardown, CREATE ROLE 'aa@bb' will fail on rerun. Add a matching DROP ROLE IF EXISTS here to keep the case self-contained.
♻️ Suggested setup hardening
# TestIssue59552
drop user if exists u1;
+drop role if exists 'aa@bb';
create user u1;
create role 'aa@bb';📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| drop user if exists u1; | |
| create user u1; | |
| create role 'aa@bb'; | |
| drop user if exists u1; | |
| drop role if exists 'aa@bb'; | |
| create user u1; | |
| create role 'aa@bb'; |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/integrationtest/t/privilege/privileges.test` around lines 903 - 905,
The test setup misses pre-cleaning the role 'aa@bb', so add a DROP ROLE IF
EXISTS 'aa@bb' before CREATE ROLE 'aa@bb' in the privileges.test setup to make
the test self-contained and deterministic; locate the SQL block in
tests/integrationtest/t/privilege/privileges.test around the lines that run
"drop user if exists u1; create user u1; create role 'aa@bb';" and insert the
matching "DROP ROLE IF EXISTS 'aa@bb';" immediately before the CREATE ROLE
statement.
60c9344 to
a0647dd
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/executor/test/passwordtest/password_management_test.go (1)
919-928:⚠️ Potential issue | 🟡 MinorDon't ignore the privilege refresh failure in this helper.
If
NotifyUpdateAllUsersPrivilege()fails here, the test keeps running against stale in-memory privileges and can become flaky. Return the error or pass*testing.Tinto the helper and assert it.As per coding guidelines "Keep error handling actionable and contextual; avoid silently swallowing errors in Go code".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/executor/test/passwordtest/password_management_test.go` around lines 919 - 928, The helper changeAutoLockedLastChanged currently calls domain.GetDomain(tk.Session()).NotifyUpdateAllUsersPrivilege() and ignores its error; modify the helper to surface and handle that failure—either change changeAutoLockedLastChanged to return error and propagate it to the caller (check and fail the test), or accept a *testing.T parameter and assert the call succeeds (e.g., require.NoError/ t.Fatalf on failure). Ensure the updated function signature and all callers are updated accordingly so NotifyUpdateAllUsersPrivilege() errors are not silently swallowed.
♻️ Duplicate comments (4)
pkg/domain/domain.go (3)
3015-3017:⚠️ Potential issue | 🟠 MajorRead
serverIDthrough the atomic accessor.
serverIDis written with atomic stores elsewhere, so this direct read can race and misclassify a self-emitted privilege notification as remote.🩹 Minimal fix
- event.ServerID = do.serverID + event.ServerID = do.ServerID()Run this to confirm the mixed atomic/non-atomic access pattern:
#!/bin/bash rg -n -C2 'event\.ServerID = do\.serverID|atomic\.StoreUint64\(&do\.serverID|func \(do \*Domain\) ServerID\(' pkg/domain/domain.go🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/domain/domain.go` around lines 3015 - 3017, The code sets event.ServerID by directly reading do.serverID which races with atomic stores; replace the direct field access with the existing atomic accessor (call the Domain.ServerID() method or the atomic load wrapper) so event.ServerID = do.ServerID() (or the accessor name in this file) to ensure a safe atomic read; update the assignment in the block that checks do.etcdClient != nil to use that accessor instead of do.serverID.
1869-1903:⚠️ Potential issue | 🟠 MajorTreat any empty-value watch event as a full reload.
A single legacy empty-value event can arrive in the same
WatchResponseas new-format JSON events during a rolling upgrade. This implementation only falls back toAll=truewhen none of the events carried JSON, so that legacy change is silently dropped and other TiDB nodes can keep stale privilege state.🩹 Minimal fix
func (do *Domain) decodePrivilegeEvent(resp clientv3.WatchResponse) PrivilegeEvent { var msg PrivilegeEvent isNewVersionEvents := false for _, event := range resp.Events { if event.Kv != nil { val := event.Kv.Value - if len(val) > 0 { - var tmp PrivilegeEvent - err := json.Unmarshal(val, &tmp) - if err != nil { - logutil.BgLogger().Warn("decodePrivilegeEvent unmarshal fail", zap.Error(err)) - break - } - isNewVersionEvents = true - if do.ServerID() != 0 && tmp.ServerID == do.ServerID() { - // Skip the events from this TiDB-Server - continue - } - if tmp.All { - msg.All = true - break - } - // duplicated users in list is ok. - msg.UserList = append(msg.UserList, tmp.UserList...) - } + if len(val) == 0 { + msg.All = true + break + } + var tmp PrivilegeEvent + err := json.Unmarshal(val, &tmp) + if err != nil { + logutil.BgLogger().Warn("decodePrivilegeEvent unmarshal fail", zap.Error(err)) + break + } + isNewVersionEvents = true + if do.ServerID() != 0 && tmp.ServerID == do.ServerID() { + // Skip the events from this TiDB-Server + continue + } + if tmp.All { + msg.All = true + break + } + // duplicated users in list is ok. + msg.UserList = append(msg.UserList, tmp.UserList...) } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/domain/domain.go` around lines 1869 - 1903, In decodePrivilegeEvent, treat any watch event with an empty event.Kv.Value as a legacy "full reload" signal instead of waiting to see if all events lack JSON: inside the loop over resp.Events in the Domain.decodePrivilegeEvent function, if event.Kv != nil and len(event.Kv.Value) == 0 then set msg.All = true and return msg immediately (so legacy empty-value events are not dropped when mixed with new JSON events); remove reliance on isNewVersionEvents for this case so an empty-value event always forces a full reload.
3021-3024:⚠️ Potential issue | 🟠 MajorDowngrade oversized partial notifications to a full refresh event.
This branch only warns and still writes the oversized JSON. If etcd rejects it, only the local TiDB refreshes; logging the raw payload also dumps the full username list into logs.
🩹 Safer fallback
if uint64(len(data)) > size.MB { - logutil.BgLogger().Warn("notify update privilege message too large", zap.ByteString("value", data)) + origBytes := len(data) + userCount := len(event.UserList) + event = PrivilegeEvent{All: true, ServerID: event.ServerID} + data, err = json.Marshal(event) + if err != nil { + return errors.Trace(err) + } + logutil.BgLogger().Warn("notify update privilege message too large, falling back to full reload", + zap.Int("user_count", userCount), + zap.Int("bytes", origBytes)) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/domain/domain.go` around lines 3021 - 3024, The code currently warns and logs the raw oversized payload then attempts to write it to etcd; instead, detect when uint64(len(data)) > size.MB and do not log the raw ByteString or call ddlutil.PutKVToEtcd with the oversized JSON. Replace that branch to (1) avoid dumping data to logs via logutil.BgLogger(), (2) write a safe fallback value/key to etcd indicating a full refresh (e.g., a compact "full-refresh" marker or an empty payload) using ddlutil.PutKVToEtcd(do.ctx, do.etcdClient, 5, privilegeKey, "<FULL_REFRESH>"), and (3) ensure the local code path still triggers a full refresh locally. Update references in this block: size.MB, logutil.BgLogger(), ddlutil.PutKVToEtcd, privilegeKey, do.ctx, do.etcdClient.pkg/privilege/privileges/cache.go (1)
502-528:⚠️ Potential issue | 🟠 MajorStill missing reverse
default_rolesdependents.
findUserAndAllRolesonly expands throughroleGraph. If a role is dropped or renamed, the affectedmysql.default_rolesrows belong to the grantees of that role, but those grantee usernames never enter this set, so their cacheddefaultRolesstay stale until the next full reload. Please either include reversedefault_rolesdependents here or force those callers downUpdateAll().🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/privilege/privileges/cache.go` around lines 502 - 528, findUserAndAllRoles currently only traverses roleGraph so usernames that depend on a role via mysql.default_roles never get added, leaving defaultRoles cache stale; update findUserAndAllRoles to also consider reverse default_roles dependents (or accept an extra map param like reverseDefaultRoles mapping role -> []grantee and BFS those grantees similarly), ensuring you add grantee usernames into the all set and queue when their role appears, or alternatively document and change callers (who expect defaultRoles updates) to call UpdateAll() instead of this function; refer to findUserAndAllRoles, roleGraph, defaultRoles, UpdateAll, and mysql.default_roles when making the change.
🧹 Nitpick comments (2)
pkg/sessionctx/variable/tidb_vars.go (1)
1032-1034: Consider clarifying the comment to explain the performance trade-off.The current comment "decides whether tidb will load & update the whole user's data in-memory" is ambiguous about what enabling this variable actually does. Based on the PR context, when enabled, TiDB uses partial updates for the privilege cache instead of full reloads during user/privilege modifications, which reduces CPU and memory usage.
📝 Suggested documentation improvement
- // TiDBAccelerateUserCreationUpdate decides whether tidb will load & update the whole user's data in-memory. + // TiDBAccelerateUserCreationUpdate enables partial updates for the privilege cache. + // When enabled, user/privilege modifications trigger targeted cache updates instead of full reloads, + // significantly reducing CPU and memory usage during frequent grant/revoke operations. TiDBAccelerateUserCreationUpdate = "tidb_accelerate_user_creation_update"As per coding guidelines: "Comments SHOULD explain non-obvious intent, constraints, invariants, concurrency guarantees, SQL/compatibility contracts, or important performance trade-offs".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/sessionctx/variable/tidb_vars.go` around lines 1032 - 1034, Update the comment for the constant TiDBAccelerateUserCreationUpdate to clearly state its behavior and performance trade-offs: explain that when enabled TiDB performs partial updates to the privilege/user cache on user/privilege modifications (avoiding full in-memory reloads), reducing CPU and memory use; when disabled it falls back to full reloads of user data which can be simpler but more costly. Mention the intended effect (less CPU/memory, faster incremental updates) and any important compatibility/invariance (affects privilege cache update strategy).pkg/privilege/privileges/cache.go (1)
996-1088: Apply the sameuserListfilter here as the other decoders.When
addUserFilterConditionskips the SQL predicate (len(userList)==0or>1024), this closure still materializes everymysql.userrow intodiff.user; merge later discards most of it. That makes large-batch or accidental-empty partial refreshes much closer to a full user-table load than intended.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/privilege/privileges/cache.go` around lines 996 - 1088, The decoder decodeUserTableRow must respect the incoming userList filter: after populating value (or at least after calling value.assignUserOrHost) add a membership check like if len(userList) > 0 { if _, ok := userList[value.User]; !ok { return nil } } so rows for users not in userList are skipped and not appended into p.user; update the closure in decodeUserTableRow to perform this early return before computing/merging old := p.user.Get(...) and p.user.ReplaceOrInsert(...).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/executor/simple.go`:
- Around line 2424-2425: When s.IsDropRole is true you must capture the grantee
usernames before role-edge/default-role rows are deleted and pass them into the
NotifyUpdatePrivilege call; modify the flow around
userIdentityToUserList(s.UserList) so that when s.IsDropRole you gather the
grantees (e.g., from the data structure that will be used to remove role
edges/default roles) into a user list and merge/append those usernames with
userIdentityToUserList(s.UserList) before calling
domain.GetDomain(e.Ctx()).NotifyUpdatePrivilege(...); ensure the code references
s.IsDropRole, userIdentityToUserList, and NotifyUpdatePrivilege so the affected
grantees are notified prior to deleting mysql.role_edges/mysql.default_roles.
- Around line 2222-2227: The current notification only appends users from
s.UserToUsers (users.OldUser.Username and users.NewUser.Username) before calling
domain.GetDomain(e.Ctx()).NotifyUpdatePrivilege(userList); to fix, also collect
grantee users and any default-role/TO_USER owners affected by role renames and
append their usernames to the same userList; locate the rename handling around
s.UserToUsers and, for each renamed role entry, query the grant/default-role
owners (e.g., look up TO_USER owners or role grant records related to users who
had the old role) and ensure their usernames are added to userList prior to
calling NotifyUpdatePrivilege so their privilege cache is refreshed.
In `@pkg/privilege/privileges/cache.go`:
- Around line 531-567: loadSomeUsers currently calls loadTable directly for many
mysql.* tables and returns ErrNoSuchTable as fatal; change it to tolerate
missing privilege tables the same way UpdateAll does by either calling the
existing helper loaders (e.g.,
LoadDBTable/LoadTablePrivTable/LoadDefaultRoles/LoadColumnsPriv or whatever
"Load*" helpers exist) or by checking for errors.Is(err, ErrNoSuchTable) after
each loadTable call and ignoring that error for the specific SQL identifiers
sqlLoadDBTable, sqlLoadTablePrivTable, sqlLoadDefaultRoles,
sqlLoadColumnsPrivTable (and any role edges table used elsewhere); keep error
handling for other errors and for tables that must exist (sqlLoadUserTable,
sqlLoadGlobalPrivTable, sqlLoadGlobalGrantsTable), and use the same decode
functions (p.decodeDBTableRow, p.decodeTablesPrivTableRow,
p.decodeDefaultRoleTableRow, p.decodeColumnsPrivTableRow) so targeted refresh
via loadSomeUsers mirrors UpdateAll's non-fatal handling of missing privilege
tables.
---
Outside diff comments:
In `@pkg/executor/test/passwordtest/password_management_test.go`:
- Around line 919-928: The helper changeAutoLockedLastChanged currently calls
domain.GetDomain(tk.Session()).NotifyUpdateAllUsersPrivilege() and ignores its
error; modify the helper to surface and handle that failure—either change
changeAutoLockedLastChanged to return error and propagate it to the caller
(check and fail the test), or accept a *testing.T parameter and assert the call
succeeds (e.g., require.NoError/ t.Fatalf on failure). Ensure the updated
function signature and all callers are updated accordingly so
NotifyUpdateAllUsersPrivilege() errors are not silently swallowed.
---
Duplicate comments:
In `@pkg/domain/domain.go`:
- Around line 3015-3017: The code sets event.ServerID by directly reading
do.serverID which races with atomic stores; replace the direct field access with
the existing atomic accessor (call the Domain.ServerID() method or the atomic
load wrapper) so event.ServerID = do.ServerID() (or the accessor name in this
file) to ensure a safe atomic read; update the assignment in the block that
checks do.etcdClient != nil to use that accessor instead of do.serverID.
- Around line 1869-1903: In decodePrivilegeEvent, treat any watch event with an
empty event.Kv.Value as a legacy "full reload" signal instead of waiting to see
if all events lack JSON: inside the loop over resp.Events in the
Domain.decodePrivilegeEvent function, if event.Kv != nil and len(event.Kv.Value)
== 0 then set msg.All = true and return msg immediately (so legacy empty-value
events are not dropped when mixed with new JSON events); remove reliance on
isNewVersionEvents for this case so an empty-value event always forces a full
reload.
- Around line 3021-3024: The code currently warns and logs the raw oversized
payload then attempts to write it to etcd; instead, detect when
uint64(len(data)) > size.MB and do not log the raw ByteString or call
ddlutil.PutKVToEtcd with the oversized JSON. Replace that branch to (1) avoid
dumping data to logs via logutil.BgLogger(), (2) write a safe fallback value/key
to etcd indicating a full refresh (e.g., a compact "full-refresh" marker or an
empty payload) using ddlutil.PutKVToEtcd(do.ctx, do.etcdClient, 5, privilegeKey,
"<FULL_REFRESH>"), and (3) ensure the local code path still triggers a full
refresh locally. Update references in this block: size.MB, logutil.BgLogger(),
ddlutil.PutKVToEtcd, privilegeKey, do.ctx, do.etcdClient.
In `@pkg/privilege/privileges/cache.go`:
- Around line 502-528: findUserAndAllRoles currently only traverses roleGraph so
usernames that depend on a role via mysql.default_roles never get added, leaving
defaultRoles cache stale; update findUserAndAllRoles to also consider reverse
default_roles dependents (or accept an extra map param like reverseDefaultRoles
mapping role -> []grantee and BFS those grantees similarly), ensuring you add
grantee usernames into the all set and queue when their role appears, or
alternatively document and change callers (who expect defaultRoles updates) to
call UpdateAll() instead of this function; refer to findUserAndAllRoles,
roleGraph, defaultRoles, UpdateAll, and mysql.default_roles when making the
change.
---
Nitpick comments:
In `@pkg/privilege/privileges/cache.go`:
- Around line 996-1088: The decoder decodeUserTableRow must respect the incoming
userList filter: after populating value (or at least after calling
value.assignUserOrHost) add a membership check like if len(userList) > 0 { if _,
ok := userList[value.User]; !ok { return nil } } so rows for users not in
userList are skipped and not appended into p.user; update the closure in
decodeUserTableRow to perform this early return before computing/merging old :=
p.user.Get(...) and p.user.ReplaceOrInsert(...).
In `@pkg/sessionctx/variable/tidb_vars.go`:
- Around line 1032-1034: Update the comment for the constant
TiDBAccelerateUserCreationUpdate to clearly state its behavior and performance
trade-offs: explain that when enabled TiDB performs partial updates to the
privilege/user cache on user/privilege modifications (avoiding full in-memory
reloads), reducing CPU and memory use; when disabled it falls back to full
reloads of user data which can be simpler but more costly. Mention the intended
effect (less CPU/memory, faster incremental updates) and any important
compatibility/invariance (affects privilege cache update strategy).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 487e0889-3d45-4bf9-a3be-66b82cb2bd8d
📒 Files selected for processing (29)
br/pkg/restore/snap_client/pipeline_items.gobr/pkg/restore/snap_client/systable_restore.gobr/pkg/restore/snap_client/systable_restore_test.gopkg/domain/BUILD.bazelpkg/domain/domain.gopkg/executor/grant.gopkg/executor/infoschema_reader_test.gopkg/executor/revoke.gopkg/executor/show.gopkg/executor/simple.gopkg/executor/test/passwordtest/password_management_test.gopkg/extension/auth_test.gopkg/privilege/privilege.gopkg/privilege/privileges/BUILD.bazelpkg/privilege/privileges/cache.gopkg/privilege/privileges/cache_test.gopkg/privilege/privileges/privileges.gopkg/privilege/privileges/privileges_test.gopkg/privilege/privileges/tidb_auth_token_test.gopkg/server/tests/commontest/tidb_test.gopkg/session/bootstrap.gopkg/session/bootstraptest/bootstrap_upgrade_test.gopkg/session/session.gopkg/sessionctx/variable/sysvar.gopkg/sessionctx/variable/tidb_vars.gotests/integrationtest/r/explain.resulttests/integrationtest/r/privilege/privileges.resulttests/integrationtest/t/explain.testtests/integrationtest/t/privilege/privileges.test
✅ Files skipped from review due to trivial changes (1)
- tests/integrationtest/t/explain.test
🚧 Files skipped from review as they are similar to previous changes (17)
- tests/integrationtest/r/privilege/privileges.result
- pkg/privilege/privileges/privileges_test.go
- pkg/server/tests/commontest/tidb_test.go
- pkg/extension/auth_test.go
- pkg/executor/show.go
- pkg/executor/grant.go
- pkg/sessionctx/variable/sysvar.go
- pkg/domain/BUILD.bazel
- pkg/privilege/privileges/BUILD.bazel
- pkg/executor/revoke.go
- pkg/executor/infoschema_reader_test.go
- tests/integrationtest/r/explain.result
- pkg/privilege/privilege.go
- pkg/session/session.go
- br/pkg/restore/snap_client/systable_restore.go
- tests/integrationtest/t/privilege/privileges.test
- pkg/session/bootstraptest/bootstrap_upgrade_test.go
| func (p *MySQLPrivilege) loadSomeUsers(ctx sqlexec.SQLExecutor, userList map[string]struct{}) error { | ||
| err := loadTable(ctx, addUserFilterCondition(sqlLoadUserTable, userList), p.decodeUserTableRow(userList)) | ||
| if err != nil { | ||
| return errors.Trace(err) | ||
| } | ||
|
|
||
| err = loadTable(ctx, addUserFilterCondition(sqlLoadGlobalPrivTable, userList), p.decodeGlobalPrivTableRow(userList)) | ||
| if err != nil { | ||
| return errors.Trace(err) | ||
| } | ||
|
|
||
| err = loadTable(ctx, addUserFilterCondition(sqlLoadGlobalGrantsTable, userList), p.decodeGlobalGrantsTableRow(userList)) | ||
| if err != nil { | ||
| return errors.Trace(err) | ||
| } | ||
|
|
||
| err = loadTable(ctx, addUserFilterCondition(sqlLoadDBTable, userList), p.decodeDBTableRow(userList)) | ||
| if err != nil { | ||
| return errors.Trace(err) | ||
| } | ||
|
|
||
| err = loadTable(ctx, addUserFilterCondition(sqlLoadTablePrivTable, userList), p.decodeTablesPrivTableRow(userList)) | ||
| if err != nil { | ||
| return errors.Trace(err) | ||
| } | ||
|
|
||
| err = loadTable(ctx, addUserFilterCondition(sqlLoadDefaultRoles, userList), p.decodeDefaultRoleTableRow(userList)) | ||
| if err != nil { | ||
| return errors.Trace(err) | ||
| } | ||
|
|
||
| err = loadTable(ctx, addUserFilterCondition(sqlLoadColumnsPrivTable, userList), p.decodeColumnsPrivTableRow(userList)) | ||
| if err != nil { | ||
| return errors.Trace(err) | ||
| } | ||
|
|
||
| return nil |
There was a problem hiding this comment.
Keep partial refresh tolerant of missing privilege tables.
UpdateAll() explicitly treats missing mysql.db, mysql.tables_priv, mysql.default_roles, mysql.columns_priv, and mysql.role_edges as non-fatal, but updateUsers()/loadSomeUsers() now hard-fail on the same ErrNoSuchTable cases because they call loadTable directly. With tidb_accelerate_user_creation_update enabled, targeted refresh can fail in environments where full refresh still works. Reuse the Load* helpers or mirror the existing noSuchTable branches here.
Also applies to: 2297-2321
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/privilege/privileges/cache.go` around lines 531 - 567, loadSomeUsers
currently calls loadTable directly for many mysql.* tables and returns
ErrNoSuchTable as fatal; change it to tolerate missing privilege tables the same
way UpdateAll does by either calling the existing helper loaders (e.g.,
LoadDBTable/LoadTablePrivTable/LoadDefaultRoles/LoadColumnsPriv or whatever
"Load*" helpers exist) or by checking for errors.Is(err, ErrNoSuchTable) after
each loadTable call and ignoring that error for the specific SQL identifiers
sqlLoadDBTable, sqlLoadTablePrivTable, sqlLoadDefaultRoles,
sqlLoadColumnsPrivTable (and any role edges table used elsewhere); keep error
handling for other errors and for tables that must exist (sqlLoadUserTable,
sqlLoadGlobalPrivTable, sqlLoadGlobalGrantsTable), and use the same decode
functions (p.decodeDBTableRow, p.decodeTablesPrivTableRow,
p.decodeDefaultRoleTableRow, p.decodeColumnsPrivTableRow) so targeted refresh
via loadSomeUsers mirrors UpdateAll's non-fatal handling of missing privilege
tables.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## release-8.5 #66541 +/- ##
================================================
Coverage ? 55.5309%
================================================
Files ? 1819
Lines ? 653279
Branches ? 0
================================================
Hits ? 362772
Misses ? 263571
Partials ? 26936
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
/retest |
1 similar comment
|
/retest |
a0647dd to
c87733c
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
br/pkg/restore/snap_client/systable_restore.go (1)
389-397:⚠️ Potential issue | 🟠 MajorRefresh privileges for every restored privilege table, not just
mysql.user.
mysql.db,tables_priv,columns_priv,default_roles,role_edges,global_priv, andglobal_grantsalso feed the privilege cache. Restoring one of those without a refresh leaves running TiDBs on stale auth data until the periodic reload.💡 Minimal fix
- if table == "user" { + if _, ok := sysPrivilegeTableMap[table]; ok { if serr := rc.dom.NotifyUpdateAllUsersPrivilege(); serr != nil { log.Warn("failed to flush privileges, please manually execute `FLUSH PRIVILEGES`") err = multierr.Append(err, berrors.ErrUnknown.Wrap(serr).GenWithStack("failed to flush privileges")) } else { log.Info("privilege system table restored, please reconnect to make it effective") }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@br/pkg/restore/snap_client/systable_restore.go` around lines 389 - 397, The current loop only calls rc.dom.NotifyUpdateAllUsersPrivilege() when table == "user"; update the condition to call NotifyUpdateAllUsersPrivilege() for all privilege-related tables (e.g., "user", "db", "tables_priv", "columns_priv", "default_roles", "role_edges", "global_priv", "global_grants", and any other MySQL privilege tables used in this codebase) so that after restoring any of those tables you run the refresh; locate the loop over tables and replace the single-table check with a membership test against this privilege-table set and reuse rc.dom.NotifyUpdateAllUsersPrivilege() and the existing error handling/logging branches (keeping the log messages but making them apply to any privilege table restore).pkg/executor/test/passwordtest/password_management_test.go (1)
919-927:⚠️ Potential issue | 🟡 MinorDon't ignore the privilege-refresh error in this helper.
If
NotifyUpdateAllUsersPrivilege()fails here, the following auth assertions run against stale cached privileges and the test failure becomes misleading. This helper should surface the error the same way the other changed call sites do.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/executor/test/passwordtest/password_management_test.go` around lines 919 - 927, The helper changeAutoLockedLastChanged currently ignores the return value of domain.GetDomain(tk.Session()).NotifyUpdateAllUsersPrivilege(); change it to capture the error (err := domain.GetDomain(tk.Session()).NotifyUpdateAllUsersPrivilege()) and surface it the same way other call sites do (fail the test immediately on error — e.g. call the test failure helper used elsewhere such as require.NoError(t, err) or t.Fatalf("NotifyUpdateAllUsersPrivilege failed: %v", err) / equivalent via testkit), so the test stops on a privilege-refresh failure instead of proceeding with stale privileges.
♻️ Duplicate comments (7)
pkg/executor/infoschema_reader_test.go (1)
714-715:⚠️ Potential issue | 🟡 MinorAvoid another hard-coded global row count.
Line 715 still makes this test depend on the current total rows in
information_schema.tidb_index_usage, so unrelated bootstrap/index additions will keep breaking it. Capture the baseline count before creatingidt1/idt2, then assertbaseCnt + 6here instead. As per coding guidelines, "Keep test changes minimal and deterministic; avoid broad golden/testdata churn unless required".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/executor/infoschema_reader_test.go` around lines 714 - 715, The test currently asserts a hard-coded total from information_schema.tidb_index_usage; instead capture the baseline row count into a variable (e.g. baseCnt) by running tk.MustQuery(`select count(*) from information_schema.tidb_index_usage;`) before creating idt1/idt2, then after creating them replace the literal check with an assertion that the count equals baseCnt + 6; update the check at the spot referencing information_schema.tidb_index_usage to use baseCnt + 6 so the test is deterministic and immune to unrelated bootstrap/index additions.pkg/domain/domain.go (3)
1872-1900:⚠️ Potential issue | 🟠 MajorTreat any empty-value watch event as a full refresh.
A rolling upgrade can still deliver a mixed watch batch containing both legacy empty events and new JSON events. This code only falls back to
All=truewhen every event is legacy, so the legacy change is dropped and other TiDB nodes can keep stale privilege state.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/domain/domain.go` around lines 1872 - 1900, The loop over resp.Events must treat any empty-value watch event as a full refresh: inside the resp.Events iteration in the code that unmarshals into PrivilegeEvent, detect when event.Kv.Value is empty (len(val)==0) and immediately set msg.All = true and break out of processing (or continue to skip further per-event processing) so that mixed batches with legacy empty events trigger a full reload; keep existing checks for tmp.All and the skip for events from the same server (do.ServerID()) intact.
3021-3024:⚠️ Potential issue | 🟠 MajorFall back to a full-refresh event when the payload is oversized.
Sending a >1 MiB partial-update payload to etcd can fail, leaving only the local TiDB refreshed. Logging the raw payload also dumps the entire username list into logs. Downgrade to
{All:true}and log only counts/sizes before the etcd write.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/domain/domain.go` around lines 3021 - 3024, When the serialized partial-update payload (variable data) exceeds 1 MiB, avoid sending the oversized payload and logging its raw contents; instead log only safe metrics (e.g., count of users and payload size) and replace the etcd write with a compact full-refresh marker (e.g., a JSON with All:true) before calling ddlutil.PutKVToEtcd; update the code around privilegeKey/ddlutil.PutKVToEtcd (using do.ctx and do.etcdClient) to detect uint64(len(data)) > size.MB, log counts/sizes via logutil.BgLogger().Warn without the raw byte payload, set data to the serialized full-refresh message, then call ddlutil.PutKVToEtcd as before.
3015-3016:⚠️ Potential issue | 🟠 MajorRead
serverIDvia the atomic accessor.
serverIDis written with atomic stores elsewhere, so the direct read here mixes atomic and non-atomic access and can misclassify self-emitted events.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/domain/domain.go` around lines 3015 - 3016, The code reads do.serverID directly when assigning event.ServerID inside the do.etcdClient != nil block, mixing non-atomic reads with atomic writes; change the direct access to use the existing atomic accessor for serverID (i.e., replace event.ServerID = do.serverID with a call to the atomic getter for serverID such as the package's atomic.LoadUint64(&do.serverID) or the repository's provided do.GetServerID()/LoadServerID() method) so the read is done atomically and self-emitted events are classified correctly.pkg/executor/simple.go (2)
2424-2425:⚠️ Potential issue | 🔴 CriticalCapture affected grantees before
DROP ROLEdeletes the evidence.In the
s.IsDropRolepath, notifying only the dropped role names is too late. The users who lost that role are the ones whose privileges changed, but theirmysql.role_edges/mysql.default_rolesrows have already been deleted, so partial refresh can no longer discover them. Snapshot those grantee/default-role owner usernames before deletion and include them inNotifyUpdatePrivilege(...).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/executor/simple.go` around lines 2424 - 2425, In the s.IsDropRole branch you must snapshot the grantee/default-role owner usernames before the DROP ROLE mutates mysql.role_edges/default_roles so the partial refresh can find them; collect those affected usernames (e.g., build an affectedUsers list from the current role edges/default-role owners) prior to performing the delete and pass that list into domain.GetDomain(e.Ctx()).NotifyUpdatePrivilege(...) instead of (or in addition to) the post-delete userList derived from userIdentityToUserList(s.UserList), ensuring NotifyUpdatePrivilege receives the pre-delete grantee usernames.
2222-2227:⚠️ Potential issue | 🟠 MajorInclude grantees/default-role owners when this path renames a role.
This notification only includes the old/new role names. When the renamed account is a role, users granted that role also changed effective privileges/default-role metadata, so their partial cache stays stale until a full reload. Append the affected
TO_USER/ default-role owner usernames toNotifyUpdatePrivilege(...)as well.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/executor/simple.go` around lines 2222 - 2227, The current NotifyUpdatePrivilege call only passes renamed role usernames (users.OldUser.Username and users.NewUser.Username) but misses users who are grantees or default-role owners affected by a role rename; update the code in the s.UserToUsers loop (and the surrounding scope that builds userList) to also collect and append the usernames of TO_USER/default-role owners for any entry where the subject is a role (check the role marker on users.OldUser/users.NewUser or the relevant grantee list on the mapping object), then pass that combined list to domain.GetDomain(e.Ctx()).NotifyUpdatePrivilege(userList) so grantees' partial caches are refreshed too.pkg/privilege/privileges/cache.go (1)
531-567:⚠️ Potential issue | 🟠 MajorKeep partial refresh as tolerant as
UpdateAll().
UpdateAll()treats missingmysql.db,mysql.tables_priv,mysql.default_roles,mysql.columns_priv, andmysql.role_edgesas non-fatal, but the new targeted path hard-fails on the sameErrNoSuchTablecases because it callsloadTable(...)directly. That means enabling partial refresh can break privilege reloads in environments where a full reload still succeeds. Mirror the existingnoSuchTable(...)handling here or route through the existingLoad*helpers.Also applies to: 2297-2318
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/privilege/privileges/cache.go` around lines 531 - 567, The partial-refresh path in loadSomeUsers calls loadTable(...) directly and treats ErrNoSuchTable as fatal; mirror UpdateAll() behavior by making loadSomeUsers tolerate missing privilege tables: after each loadTable call (especially for sqlLoadDBTable, sqlLoadTablePrivTable, sqlLoadDefaultRoles, sqlLoadColumnsPrivTable and any role-related tables) check the error with noSuchTable(err) (or the same helper used elsewhere) and if true, ignore the error and continue, otherwise return errors.Trace(err); alternatively, replace the direct loadTable(...) calls with the existing LoadDB/LoadTablesPriv/LoadDefaultRoles/LoadColumnsPriv helpers that already handle ErrNoSuchTable to preserve the non-fatal behavior.
🧹 Nitpick comments (3)
pkg/session/bootstraptest/bootstrap_upgrade_test.go (1)
463-463: Minor: Extraneous leading space in SQL query.The SQL string has a leading space which, while harmless, is inconsistent with other SQL strings in the file.
Additionally, limiting to 20 jobs assumes the target job is among the most recent DDL jobs. In high-concurrency test environments, this could cause flaky failures if more than 20 DDL jobs are queued between job creation and this check.
Suggested fix
- sql := fmt.Sprintf(" admin show ddl jobs 20 where job_id=%d", jobID) + sql := fmt.Sprintf("admin show ddl jobs 20 where job_id=%d", jobID)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/session/bootstraptest/bootstrap_upgrade_test.go` at line 463, Remove the extraneous leading space in the sql string and avoid the arbitrary "20" limit which can cause flakiness; update the construction of the sql variable (currently: sql := fmt.Sprintf(" admin show ddl jobs 20 where job_id=%d", jobID)) to produce a query without the leading space and either omit the job-count limit or use a sufficiently large/unbounded form (e.g. "admin show ddl jobs where job_id=%d" or add a much larger limit) so the lookup for jobID is deterministic under high concurrency.pkg/sessionctx/variable/sysvar.go (1)
3557-3562: Add a short intent comment for this rollout flag.The setter is fine, but the knob’s operational contract is not obvious from the registration alone. A one-line comment clarifying that this gates the partial privilege/user cache update path would make future maintenance easier.
♻️ Suggested tweak
+ // TiDBAccelerateUserCreationUpdate enables the partial privilege/user cache update path. {Scope: ScopeGlobal, Name: TiDBAccelerateUserCreationUpdate, Value: BoolToOnOff(DefTiDBAccelerateUserCreationUpdate), Type: TypeBool, SetGlobal: func(_ context.Context, s *SessionVars, val string) error { AccelerateUserCreationUpdate.Store(TiDBOptOn(val)) return nil }, },As per coding guidelines, comments SHOULD explain non-obvious intent, constraints, invariants, concurrency guarantees, SQL/compatibility contracts, or important performance trade-offs, and SHOULD NOT restate what the code already makes clear.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/sessionctx/variable/sysvar.go` around lines 3557 - 3562, Add a one-line intent comment above the TiDBAccelerateUserCreationUpdate registration explaining that this rollout flag gates the partial privilege/user-cache update path (i.e., when enabled the system will use AccelerateUserCreationUpdate.Store to toggle the optimized incremental update behavior for user/privilege changes); place the comment adjacent to the registration block so future readers of the TiDBAccelerateUserCreationUpdate and the SetGlobal setter can immediately see the operational contract and concurrency/compatibility implication.pkg/executor/show.go (1)
1796-1802: KeepSHOW CREATE USERdecoupled frommysql.userstorage details.Pulling
authentication_stringdirectly frommysql.usermakes the executor own privilege-table layout and account lookup rules. A small privilege-layer accessor or account snapshot would keep future auth-storage changes localized.Also applies to: 1875-1876
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/executor/show.go` around lines 1796 - 1802, The SHOW CREATE USER implementation currently reads authentication_string directly via ExecRestrictedSQL against mysql.SystemDB and mysql.UserTable (selecting authentication_string), coupling executor to privilege-table layout; replace this direct query with the privilege-layer accessor or account snapshot API used elsewhere (i.e., call the account lookup/getter that returns the Account/Authentication info) so SHOW CREATE USER consumes the abstraction rather than reading authentication_string directly; update both occurrences (the ExecRestrictedSQL call and the similar select at the other location) to use the accessor method and map its returned auth field into the SHOW CREATE USER output.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@br/pkg/restore/snap_client/systable_restore.go`:
- Around line 389-397: The current loop only calls
rc.dom.NotifyUpdateAllUsersPrivilege() when table == "user"; update the
condition to call NotifyUpdateAllUsersPrivilege() for all privilege-related
tables (e.g., "user", "db", "tables_priv", "columns_priv", "default_roles",
"role_edges", "global_priv", "global_grants", and any other MySQL privilege
tables used in this codebase) so that after restoring any of those tables you
run the refresh; locate the loop over tables and replace the single-table check
with a membership test against this privilege-table set and reuse
rc.dom.NotifyUpdateAllUsersPrivilege() and the existing error handling/logging
branches (keeping the log messages but making them apply to any privilege table
restore).
In `@pkg/executor/test/passwordtest/password_management_test.go`:
- Around line 919-927: The helper changeAutoLockedLastChanged currently ignores
the return value of
domain.GetDomain(tk.Session()).NotifyUpdateAllUsersPrivilege(); change it to
capture the error (err :=
domain.GetDomain(tk.Session()).NotifyUpdateAllUsersPrivilege()) and surface it
the same way other call sites do (fail the test immediately on error — e.g. call
the test failure helper used elsewhere such as require.NoError(t, err) or
t.Fatalf("NotifyUpdateAllUsersPrivilege failed: %v", err) / equivalent via
testkit), so the test stops on a privilege-refresh failure instead of proceeding
with stale privileges.
---
Duplicate comments:
In `@pkg/domain/domain.go`:
- Around line 1872-1900: The loop over resp.Events must treat any empty-value
watch event as a full refresh: inside the resp.Events iteration in the code that
unmarshals into PrivilegeEvent, detect when event.Kv.Value is empty
(len(val)==0) and immediately set msg.All = true and break out of processing (or
continue to skip further per-event processing) so that mixed batches with legacy
empty events trigger a full reload; keep existing checks for tmp.All and the
skip for events from the same server (do.ServerID()) intact.
- Around line 3021-3024: When the serialized partial-update payload (variable
data) exceeds 1 MiB, avoid sending the oversized payload and logging its raw
contents; instead log only safe metrics (e.g., count of users and payload size)
and replace the etcd write with a compact full-refresh marker (e.g., a JSON with
All:true) before calling ddlutil.PutKVToEtcd; update the code around
privilegeKey/ddlutil.PutKVToEtcd (using do.ctx and do.etcdClient) to detect
uint64(len(data)) > size.MB, log counts/sizes via logutil.BgLogger().Warn
without the raw byte payload, set data to the serialized full-refresh message,
then call ddlutil.PutKVToEtcd as before.
- Around line 3015-3016: The code reads do.serverID directly when assigning
event.ServerID inside the do.etcdClient != nil block, mixing non-atomic reads
with atomic writes; change the direct access to use the existing atomic accessor
for serverID (i.e., replace event.ServerID = do.serverID with a call to the
atomic getter for serverID such as the package's atomic.LoadUint64(&do.serverID)
or the repository's provided do.GetServerID()/LoadServerID() method) so the read
is done atomically and self-emitted events are classified correctly.
In `@pkg/executor/infoschema_reader_test.go`:
- Around line 714-715: The test currently asserts a hard-coded total from
information_schema.tidb_index_usage; instead capture the baseline row count into
a variable (e.g. baseCnt) by running tk.MustQuery(`select count(*) from
information_schema.tidb_index_usage;`) before creating idt1/idt2, then after
creating them replace the literal check with an assertion that the count equals
baseCnt + 6; update the check at the spot referencing
information_schema.tidb_index_usage to use baseCnt + 6 so the test is
deterministic and immune to unrelated bootstrap/index additions.
In `@pkg/executor/simple.go`:
- Around line 2424-2425: In the s.IsDropRole branch you must snapshot the
grantee/default-role owner usernames before the DROP ROLE mutates
mysql.role_edges/default_roles so the partial refresh can find them; collect
those affected usernames (e.g., build an affectedUsers list from the current
role edges/default-role owners) prior to performing the delete and pass that
list into domain.GetDomain(e.Ctx()).NotifyUpdatePrivilege(...) instead of (or in
addition to) the post-delete userList derived from
userIdentityToUserList(s.UserList), ensuring NotifyUpdatePrivilege receives the
pre-delete grantee usernames.
- Around line 2222-2227: The current NotifyUpdatePrivilege call only passes
renamed role usernames (users.OldUser.Username and users.NewUser.Username) but
misses users who are grantees or default-role owners affected by a role rename;
update the code in the s.UserToUsers loop (and the surrounding scope that builds
userList) to also collect and append the usernames of TO_USER/default-role
owners for any entry where the subject is a role (check the role marker on
users.OldUser/users.NewUser or the relevant grantee list on the mapping object),
then pass that combined list to
domain.GetDomain(e.Ctx()).NotifyUpdatePrivilege(userList) so grantees' partial
caches are refreshed too.
In `@pkg/privilege/privileges/cache.go`:
- Around line 531-567: The partial-refresh path in loadSomeUsers calls
loadTable(...) directly and treats ErrNoSuchTable as fatal; mirror UpdateAll()
behavior by making loadSomeUsers tolerate missing privilege tables: after each
loadTable call (especially for sqlLoadDBTable, sqlLoadTablePrivTable,
sqlLoadDefaultRoles, sqlLoadColumnsPrivTable and any role-related tables) check
the error with noSuchTable(err) (or the same helper used elsewhere) and if true,
ignore the error and continue, otherwise return errors.Trace(err);
alternatively, replace the direct loadTable(...) calls with the existing
LoadDB/LoadTablesPriv/LoadDefaultRoles/LoadColumnsPriv helpers that already
handle ErrNoSuchTable to preserve the non-fatal behavior.
---
Nitpick comments:
In `@pkg/executor/show.go`:
- Around line 1796-1802: The SHOW CREATE USER implementation currently reads
authentication_string directly via ExecRestrictedSQL against mysql.SystemDB and
mysql.UserTable (selecting authentication_string), coupling executor to
privilege-table layout; replace this direct query with the privilege-layer
accessor or account snapshot API used elsewhere (i.e., call the account
lookup/getter that returns the Account/Authentication info) so SHOW CREATE USER
consumes the abstraction rather than reading authentication_string directly;
update both occurrences (the ExecRestrictedSQL call and the similar select at
the other location) to use the accessor method and map its returned auth field
into the SHOW CREATE USER output.
In `@pkg/session/bootstraptest/bootstrap_upgrade_test.go`:
- Line 463: Remove the extraneous leading space in the sql string and avoid the
arbitrary "20" limit which can cause flakiness; update the construction of the
sql variable (currently: sql := fmt.Sprintf(" admin show ddl jobs 20 where
job_id=%d", jobID)) to produce a query without the leading space and either omit
the job-count limit or use a sufficiently large/unbounded form (e.g. "admin show
ddl jobs where job_id=%d" or add a much larger limit) so the lookup for jobID is
deterministic under high concurrency.
In `@pkg/sessionctx/variable/sysvar.go`:
- Around line 3557-3562: Add a one-line intent comment above the
TiDBAccelerateUserCreationUpdate registration explaining that this rollout flag
gates the partial privilege/user-cache update path (i.e., when enabled the
system will use AccelerateUserCreationUpdate.Store to toggle the optimized
incremental update behavior for user/privilege changes); place the comment
adjacent to the registration block so future readers of the
TiDBAccelerateUserCreationUpdate and the SetGlobal setter can immediately see
the operational contract and concurrency/compatibility implication.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 7fe7cb0f-9ef4-4764-a3c1-e2212a56efb9
📒 Files selected for processing (29)
br/pkg/restore/snap_client/pipeline_items.gobr/pkg/restore/snap_client/systable_restore.gobr/pkg/restore/snap_client/systable_restore_test.gopkg/domain/BUILD.bazelpkg/domain/domain.gopkg/executor/grant.gopkg/executor/infoschema_reader_test.gopkg/executor/revoke.gopkg/executor/show.gopkg/executor/simple.gopkg/executor/test/passwordtest/password_management_test.gopkg/extension/auth_test.gopkg/privilege/privilege.gopkg/privilege/privileges/BUILD.bazelpkg/privilege/privileges/cache.gopkg/privilege/privileges/cache_test.gopkg/privilege/privileges/privileges.gopkg/privilege/privileges/privileges_test.gopkg/privilege/privileges/tidb_auth_token_test.gopkg/server/tests/commontest/tidb_test.gopkg/session/bootstrap.gopkg/session/bootstraptest/bootstrap_upgrade_test.gopkg/session/session.gopkg/sessionctx/variable/sysvar.gopkg/sessionctx/variable/tidb_vars.gotests/integrationtest/r/explain.resulttests/integrationtest/r/privilege/privileges.resulttests/integrationtest/t/explain.testtests/integrationtest/t/privilege/privileges.test
🚧 Files skipped from review as they are similar to previous changes (8)
- pkg/extension/auth_test.go
- pkg/domain/BUILD.bazel
- tests/integrationtest/t/explain.test
- br/pkg/restore/snap_client/pipeline_items.go
- pkg/privilege/privileges/BUILD.bazel
- pkg/server/tests/commontest/tidb_test.go
- pkg/executor/revoke.go
- pkg/privilege/privileges/privileges_test.go
|
/retest |
…-merge-release-8.5-20260311
|
/retest |
2 similar comments
|
/retest |
|
/retest |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: wjhuang2016, yudongusa, YuJuncen The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest |
|
/retest |
1 similar comment
|
/retest |
…-merge-release-8.5-20260311-002
|
/retest |
8 similar comments
|
/retest |
|
/retest |
|
/retest |
|
/retest |
|
/retest |
|
/retest |
|
/retest |
|
/retest |
|
@ti-chi-bot: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
This is an automated cherry-pick of #62693
What problem does this PR solve?
Issue Number: ref #61706
Problem Summary:
What changed and how does it work?
Picked PR list:
user@''is different withuser@'%'| tidb-test=pr/2498 #60082But the 'active user' mechanism is later removed. We only need the 'partial update'.
Check List
Tests
grantorrevokeper secondtidb_accelerate_user_creation_updateto see the differenceSide effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Summary by CodeRabbit
New Features
Performance Enhancements
System Improvements
Tests