Skip to content

privilege: introduce 'partial update' for users and privileges cache (#62693) | tidb-test=pr/2700#66541

Merged
ti-chi-bot[bot] merged 3 commits into
pingcap:release-8.5from
ti-chi-bot:cherry-pick-62693-to-release-8.5
Mar 11, 2026
Merged

privilege: introduce 'partial update' for users and privileges cache (#62693) | tidb-test=pr/2700#66541
ti-chi-bot[bot] merged 3 commits into
pingcap:release-8.5from
ti-chi-bot:cherry-pick-62693-to-release-8.5

Conversation

@ti-chi-bot

@ti-chi-bot ti-chi-bot commented Feb 26, 2026

Copy link
Copy Markdown
Member

This is an automated cherry-pick of #62693

What problem does this PR solve?

Issue Number: ref #61706

Problem Summary:

What changed and how does it work?

Picked PR list:

But the 'active user' mechanism is later removed. We only need the 'partial update'.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.
  1. bootstrap a cluster with ~100 users and ~130k column privilege records
  2. execute grant or revoke per second
  3. turn on/off tidb_accelerate_user_creation_update to see the difference
image image

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

Summary by CodeRabbit

  • New Features

    • New global system variable tidb_accelerate_user_creation_update to accelerate user creation and privilege updates.
  • Performance Enhancements

    • Added indices on privilege-related system tables to speed permission and role lookups.
  • System Improvements

    • More targeted privilege refreshes (per-user or full reload) for faster, lower-impact permission updates.
    • Database bootstrap version bumped to v225.
  • Tests

    • Added and updated tests covering privilege flows and EXPLAIN outputs.

@ti-chi-bot ti-chi-bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. type/cherry-pick-for-release-8.5 This PR is cherry-picked to release-8.5 from a source PR. labels Feb 26, 2026
@ti-chi-bot

Copy link
Copy Markdown
Member Author

@CbcWestwolf This PR has conflicts, I have hold it.
Please resolve them or ask others to resolve them, then comment /unhold to remove the hold label.

@ti-chi-bot

ti-chi-bot Bot commented Feb 26, 2026

Copy link
Copy Markdown

@ti-chi-bot: ## If you want to know how to resolve it, please read the guide in TiDB Dev Guide.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot ti-chi-bot Bot added cherry-pick-approved Cherry pick PR approved by release team. and removed do-not-merge/cherry-pick-not-approved labels Mar 3, 2026
@fzzf678 fzzf678 force-pushed the cherry-pick-62693-to-release-8.5 branch from d1aea63 to 60c9344 Compare March 10, 2026 15:18
@coderabbitai

coderabbitai Bot commented Mar 10, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

Refactors privilege notification and reload to support targeted per-user updates and full reloads. Adds Domain.PrivilegeEvent parsing/batching, introduces NotifyUpdateAllUsersPrivilege, changes NotifyUpdatePrivilege to accept a user list, updates privilege cache for partial updates, adjusts call sites (executor/session/BR), and adds bootstrap indices and a sysvar.

Changes

Cohort / File(s) Summary
Domain Privilege Events & Build
pkg/domain/domain.go, pkg/domain/BUILD.bazel
Add PrivilegeEvent decoding, batching, and watcher logic; NotifyUpdatePrivilege([]string) now accepts user lists and NotifyUpdateAllUsersPrivilege() triggers full reload. Added //pkg/util/size dep.
Privilege API surface
pkg/privilege/privilege.go, pkg/privilege/privileges/privileges.go
Removed GetEncodedPassword; FindEdge signature removed ctx; ShowGrants/ActiveRoles param renamed; UserPrivileges call sites adapted.
Privilege cache & loaders
pkg/privilege/privileges/cache.go, pkg/privilege/privileges/*
Role graph keys changed to auth.RoleIdentity; Load* now take sqlexec.SQLExecutor; added Handle.Update(userList) + UpdateAll(), merge/updateUsers flows, and per-user filtered loads. Large structural refactor.
Executor call sites
pkg/executor/simple.go, pkg/executor/grant.go, pkg/executor/revoke.go, pkg/executor/show.go
Introduce helpers to derive username lists; propagate lists to NotifyUpdatePrivilege; adjust FindEdge usage and user existence/auth-plugin handling; SHOW CREATE USER reads authentication_string.
Session & Auth flows
pkg/session/session.go, pkg/server/tests/commontest/tidb_test.go
Auth/autolock now call NotifyUpdatePrivilege with explicit user lists; sandbox test uses Auth call.
BR restore integration
br/pkg/restore/snap_client/systable_restore.go, br/pkg/restore/snap_client/pipeline_items.go, br/pkg/restore/snap_client/systable_restore_test.go
When restoring mysql.user, call NotifyUpdateAllUsersPrivilege() (full reload). Tests updated to new bootstrap version.
Bootstrap / Schema indices
pkg/session/bootstrap.go, pkg/session/bootstraptest/bootstrap_upgrade_test.go
Bumped bootstrap to version225 and added upgradeToVer225 to create i_user indices on privilege tables; test DDL job query tightened.
Sysvar & flags
pkg/sessionctx/variable/sysvar.go, pkg/sessionctx/variable/tidb_vars.go
Add tidb_accelerate_user_creation_update sysvar and AccelerateUserCreationUpdate atomic flag for accelerated user-update behavior.
Tests & integration updates
pkg/privilege/privileges/*_test.go, pkg/executor/test/*, pkg/extension/auth_test.go, pkg/executor/infoschema_reader_test.go, tests/integrationtest/*
Updated tests for API/signature changes, RoleIdentity keys, expected counts, added accelerate-variable test, explain/privilege cases.
Build/test deps
pkg/privilege/privileges/BUILD.bazel, pkg/domain/BUILD.bazel
Added test dependency on //pkg/domain and //pkg/util/size.

Sequence Diagram(s)

sequenceDiagram
participant Client
participant Executor
participant Domain
participant Etcd as "etcd/watch"
participant PrivHandle as "Privilege Handle"
participant MySQLPriv as "MySQLPrivilege"

Client->>Executor: DDL/RESTORE that affects users
Executor->>Domain: NotifyUpdatePrivilege(userList) or NotifyUpdateAllUsersPrivilege()
Domain->>Etcd: emit/watch PrivilegeEvent (ServerID, userList/all)
Etcd-->>Domain: watch events -> decodePrivilegeEvent / batchReadMoreData
Domain->>PrivHandle: deliver PrivilegeEvent (Update or UpdateAll)
PrivHandle->>MySQLPriv: Update(userList) or UpdateAll()
MySQLPriv->>MySQLPriv: load relevant tables (LoadRoleGraph, LoadUserTable, ...)
MySQLPriv->>PrivHandle: merge updated data
PrivHandle-->>Domain: handle updated
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

approved, lgtm, affects-8.5, contribution

Suggested reviewers

  • wjhuang2016
  • tiancaiamao
  • qw4990

Poem

🐰 I hop through RoleIdentity rows tonight,
Whisper usernames, then wake the cache alight.
Events batched, updates both small and tall,
From single-user pings to reloading all.
A joyful hop — privileges answer the call!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 17.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The PR title clearly identifies the main change: introducing a 'partial update' mechanism for users and privileges cache, with the specific issue reference.
Description check ✅ Passed The PR description follows the template structure, includes the Issue Number reference (#61706), explains the changes (aggregating partial-update functionality from multiple PRs), documents testing performed (unit and manual tests), and includes performance demonstration images.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
br/pkg/restore/snap_client/systable_restore.go (1)

383-407: ⚠️ Potential issue | 🟠 Major

Reload privileges after restoring any privilege table, not just mysql.user.

tables can be filtered down to db, tables_priv, columns_priv, default_roles, role_edges, global_priv, or global_grants without user. In those restores this branch never refreshes the in-memory privilege cache, so the restored grants stay stale until a manual flush. Gate this on sysPrivilegeTableMap (or reuse notifyUpdateAllUsersPrivilege) instead of the literal "user".

💡 One way to make the reload condition complete
 func (rc *SnapClient) afterSystemTablesReplaced(ctx context.Context, db string, tables []string) error {
 	if db != mysql.SystemDB {
 		return nil
 	}
 
 	var err error
+	needPrivilegeReload := false
 	for _, table := range tables {
-		if table == "user" {
-			if serr := rc.dom.NotifyUpdateAllUsersPrivilege(); serr != nil {
-				log.Warn("failed to flush privileges, please manually execute `FLUSH PRIVILEGES`")
-				err = multierr.Append(err, berrors.ErrUnknown.Wrap(serr).GenWithStack("failed to flush privileges"))
-			} else {
-				log.Info("privilege system table restored, please reconnect to make it effective")
-			}
-		} else if table == "bind_info" {
+		if _, ok := sysPrivilegeTableMap[table]; ok {
+			needPrivilegeReload = true
+			continue
+		}
+		if table == "bind_info" {
 			if serr := rc.db.Session().Execute(ctx, bindinfo.StmtRemoveDuplicatedPseudoBinding); serr != nil {
 				log.Warn("failed to delete duplicated pseudo binding", zap.Error(serr))
 				err = multierr.Append(err,
 					berrors.ErrUnknown.Wrap(serr).GenWithStack("failed to delete duplicated pseudo binding %s", bindinfo.StmtRemoveDuplicatedPseudoBinding))
 			} else {
 				log.Info("success to remove duplicated pseudo binding")
 			}
 		}
 	}
+	if needPrivilegeReload {
+		if serr := rc.dom.NotifyUpdateAllUsersPrivilege(); serr != nil {
+			log.Warn("failed to flush privileges, please manually execute `FLUSH PRIVILEGES`")
+			err = multierr.Append(err, berrors.ErrUnknown.Wrap(serr).GenWithStack("failed to flush privileges"))
+		} else {
+			log.Info("privilege system table restored, please reconnect to make it effective")
+		}
+	}
 	return err
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@br/pkg/restore/snap_client/systable_restore.go` around lines 383 - 407, The
current afterSystemTablesReplaced method only calls
rc.dom.NotifyUpdateAllUsersPrivilege() when the restored table name equals the
literal "user", which misses other privilege-related tables (db, tables_priv,
columns_priv, default_roles, role_edges, global_priv, global_grants) and leaves
in-memory grants stale; update the condition to check whether any restored table
is in the privilege table set (use sysPrivilegeTableMap or a helper like
notifyUpdateAllUsersPrivilege) and call rc.dom.NotifyUpdateAllUsersPrivilege()
when any privilege table is present (preserving existing logging and error
aggregation behavior in afterSystemTablesReplaced) so all privilege restores
trigger a refresh.
pkg/executor/test/passwordtest/password_management_test.go (1)

919-928: ⚠️ Potential issue | 🟡 Minor

Assert the refresh error in changeAutoLockedLastChanged.

This helper updates mysql.user and then refreshes the privilege cache. If NotifyUpdateAllUsersPrivilege() fails here, the test silently continues with stale state and the later auth assertions can fail for the wrong reason.

🧪 Make the helper return the refresh error
-func changeAutoLockedLastChanged(tk *testkit.TestKit, ds, user string) {
+func changeAutoLockedLastChanged(tk *testkit.TestKit, ds, user string) error {
 	SQL := "UPDATE `mysql`.`User` SET user_attributes=json_merge_patch(user_attributes, '{\"Password_locking\": {\"failed_login_attempts\": 3," +
 		"\"password_lock_time_days\": 3,\"auto_account_locked\": \"Y\",\"failed_login_count\": 3,\"auto_locked_last_changed\": \"%s\"}}') " +
 		"WHERE Host='localhost' and User='%s'"
 	d, _ := time.ParseDuration(ds)
 	changeTime := time.Now().Add(d).Format(time.UnixDate)
 	SQL = fmt.Sprintf(SQL, changeTime, user)
 	tk.MustExec(SQL)
-	domain.GetDomain(tk.Session()).NotifyUpdateAllUsersPrivilege()
+	return domain.GetDomain(tk.Session()).NotifyUpdateAllUsersPrivilege()
 }

Then have each call site assert require.NoError(t, changeAutoLockedLastChanged(...)). As per coding guidelines "Keep error handling actionable and contextual; avoid silently swallowing errors in Go code".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/executor/test/passwordtest/password_management_test.go` around lines 919
- 928, changeAutoLockedLastChanged currently swallows any error from the
privilege refresh; modify changeAutoLockedLastChanged to return an error and
propagate any failure from
domain.GetDomain(tk.Session()).NotifyUpdateAllUsersPrivilege() (i.e., change
signature from func changeAutoLockedLastChanged(...) to func
changeAutoLockedLastChanged(...) error and return the error if
NotifyUpdateAllUsersPrivilege() fails after tk.MustExec), and then update every
call site to assert the result (e.g., require.NoError(t,
changeAutoLockedLastChanged(...))). Ensure function still performs the SQL
update via tk.MustExec and only returns nil on successful
NotifyUpdateAllUsersPrivilege().
pkg/executor/show.go (1)

1876-1901: ⚠️ Potential issue | 🟡 Minor

Escape authentication_string before embedding it in the DDL.

authData now comes straight from mysql.user. If it contains ' or \, SHOW CREATE USER will emit invalid SQL for IDENTIFIED ... AS ....

Suggested fix
 	authStr := ""
 	if !(authplugin == mysql.AuthSocket && authData == "") {
-		authStr = fmt.Sprintf(" AS '%s'", authData)
+		authStr = fmt.Sprintf(" AS '%s'", format.OutputFormat(authData))
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/executor/show.go` around lines 1876 - 1901, The authData pulled into the
DDL isn't escaped and can break the generated SQL; before building authStr in
show.go (where authData, authplugin and showStr are composed), properly escape
authentication_string characters (at minimum single quote and backslash, and any
other SQL-string-special bytes used by your SQL dialect) or use an existing
SQL/string-quoting helper to produce a safely quoted literal, then embed that
escaped/quoted value into authStr (preserving the existing skip for
mysql.AuthSocket && authData == ""); update the construction that uses
fmt.Sprintf for showStr to use the escaped/quoted authData instead of the raw
authData.
🧹 Nitpick comments (4)
pkg/extension/auth_test.go (1)

238-238: Avoid asserting the total mock count across the whole test.

This expectation is coupled to every earlier ValidateAuthString side effect in the test, so unrelated auth/cache changes will keep forcing churn here. Prefer asserting the delta for the login step, or reset the mock state before this section.

As per coding guidelines, **/{*_test.go,testdata/**}: Keep test changes minimal and deterministic; avoid broad golden/testdata churn unless required.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/extension/auth_test.go` at line 238, The test currently asserts total
calls with p.AssertNumberOfCalls(t, "ValidateAuthString", 3) which couples this
assertion to prior side effects; instead, either reset or clear the mock before
the login section (so subsequent calls can be asserted deterministically) or
replace the global count assertion with a delta-style check that only verifies
the number of ValidateAuthString calls made during the login step (e.g., capture
the current call count from the mock, perform the login actions, then assert the
increment equals the expected delta). Target the mock method name
ValidateAuthString and the assertion call p.AssertNumberOfCalls to implement the
change.
pkg/sessionctx/variable/tidb_vars.go (1)

1033-1034: Document the trade-off behind this switch a bit more explicitly.

The current comment says what flips, but not why someone would enable it or why the default is off. A short note that this trades higher in-memory user-cache footprint for faster privilege/user refreshes would make the knob much easier to reason about.

As per coding guidelines, **/*.go: Comments SHOULD explain non-obvious intent, constraints, invariants, concurrency guarantees, SQL/compatibility contracts, or important performance trade-offs, and SHOULD NOT restate what the code already makes clear.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sessionctx/variable/tidb_vars.go` around lines 1033 - 1034, Update the
comment for the TiDBAccelerateUserCreationUpdate constant to explain the
trade-off: when enabled (TiDBAccelerateUserCreationUpdate) the server eagerly
loads and updates whole user data in-memory to speed up privilege/user
refreshes, at the cost of increased memory footprint for the user cache; clarify
that the default is off to avoid higher memory usage in typical deployments and
note any relevant invariants (e.g., affects in-memory user-cache size and
refresh latency).
pkg/sessionctx/variable/sysvar.go (1)

3557-3562: Consider adding GetGlobal for consistency with other atomic-backed global variables.

The variable uses AccelerateUserCreationUpdate.Store() in SetGlobal, but lacks a corresponding GetGlobal. Other similar global boolean variables with atomic backing (e.g., TiDBEnableBatchDML, TiDBEnableCheckConstraint, TiDBEnableResourceControl) follow the pattern of having both SetGlobal and GetGlobal to ensure reads reflect the in-memory atomic state.

Proposed fix
 {Scope: ScopeGlobal, Name: TiDBAccelerateUserCreationUpdate, Value: BoolToOnOff(DefTiDBAccelerateUserCreationUpdate), Type: TypeBool,
     SetGlobal: func(_ context.Context, s *SessionVars, val string) error {
         AccelerateUserCreationUpdate.Store(TiDBOptOn(val))
         return nil
     },
+    GetGlobal: func(_ context.Context, s *SessionVars) (string, error) {
+        return BoolToOnOff(AccelerateUserCreationUpdate.Load()), nil
+    },
 },
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sessionctx/variable/sysvar.go` around lines 3557 - 3562, Add a GetGlobal
to mirror the existing SetGlobal for the TiDBAccelerateUserCreationUpdate
variable so reads reflect the in-memory atomic state: implement a GetGlobal
function for the variable that calls AccelerateUserCreationUpdate.Load(),
converts the boolean to the string form with BoolToOnOff (matching the existing
Value/SetGlobal pattern), and returns that string (with nil error) so GetGlobal
and SetGlobal are consistent with other atomic-backed globals like
TiDBEnableBatchDML/TiDBEnableCheckConstraint/TiDBEnableResourceControl.
pkg/session/bootstrap.go (1)

388-389: Drop the redundant mysql.global_grants secondary index.

mysql.global_grants already has PRIMARY KEY (USER, HOST, PRIV), so i_user(USER) does not add a new access path; it just adds extra DDL, storage, and write overhead on grant changes.

♻️ Suggested cleanup
-		PRIMARY KEY (USER,HOST,PRIV),
-		KEY i_user (USER)
+		PRIMARY KEY (USER,HOST,PRIV)
 	);`
-	doReentrantDDL(s, "ALTER TABLE mysql.global_grants ADD INDEX i_user (user)", dbterror.ErrDupKeyName)

Also applies to: 3340-3340

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/session/bootstrap.go` around lines 388 - 389, The DDL defines a redundant
secondary index KEY i_user (USER) on mysql.global_grants which duplicates the
PRIMARY KEY (USER,HOST,PRIV); remove the secondary index declaration to avoid
extra DDL/storage/write overhead. Edit the CREATE/ALTER statement(s) in
pkg/session/bootstrap.go that reference mysql.global_grants and delete the "KEY
i_user (USER)" entry (and any identical occurrences elsewhere in that file,
e.g., the other repeated definition), leaving only the PRIMARY KEY
(USER,HOST,PRIV).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/domain/domain.go`:
- Around line 1869-1901: The function decodePrivilegeEvent must treat any watch
event with an empty value as a full refresh; modify decodePrivilegeEvent so that
when iterating resp.Events, if event.Kv != nil and len(event.Kv.Value) == 0 you
set msg.All = true and immediately return msg (or break out and ensure
msg.All=true is honored) instead of relying on isNewVersionEvents across the
whole batch; update the loop around event.Kv/val handling (variables: resp
clientv3.WatchResponse, msg PrivilegeEvent, isNewVersionEvents, tmp
PrivilegeEvent) so a single legacy empty event triggers a full reload for other
nodes.
- Around line 3015-3017: The code reads do.serverID directly while
serverIDKeeper writes it with atomic.StoreUint64, causing a
race/misclassification; replace the direct read with an atomic load (use
atomic.LoadUint64 on the same serverID field) when assigning event.ServerID so
the access is consistent with serverIDKeeper's atomic.StoreUint64 usage.
- Around line 3021-3024: The current code tries to PutKVToEtcd with a payload
that may exceed size.MB, risking etcd failure and leaking full username lists to
logs; change the branch where uint64(len(data)) > size.MB to instead replace
data with a small "full-refresh" notification (e.g., a marshaled struct with
All:true) and log a warning without including the raw byte payload; then call
ddlutil.PutKVToEtcd(do.ctx, do.etcdClient, 5, privilegeKey, string(data)) with
the reduced payload. Ensure you reference the same symbols (size.MB,
privilegeKey, do.ctx, do.etcdClient, ddlutil.PutKVToEtcd) and do not emit the
large data in logs.

In `@pkg/executor/infoschema_reader_test.go`:
- Around line 714-715: The test currently asserts an absolute count (87) from
the information_schema.tidb_index_usage table which is brittle; change the
assertion to compute a baseline count from tk.MustQuery("select count(*) from
information_schema.tidb_index_usage;") before the index-related operations and
then assert baseline+6 instead of the hard-coded 87. Locate the tk.MustQuery
call in the test, capture the returned count into a variable, convert to an
integer, and use that variable+6 in the final Check call (keep the query text
"select count(*) from information_schema.tidb_index_usage;" and the
testkit.Check call usage intact).

In `@pkg/privilege/privileges/cache.go`:
- Around line 502-528: findUserAndAllRoles only traverses roleGraph and thus
misses grantees affected by changes to mysql.default_roles; update this function
to also include reverse default_roles dependents by accepting or accessing the
reverse-default-roles mapping (e.g. a map from role -> []grantee) and enqueueing
those grantees just like roleGraph edges so they enter the returned set, or if
you cannot access that map here, make callers that handle default_roles changes
call UpdateAll() instead; reference findUserAndAllRoles, roleGraph,
default_roles/defaultRoles and UpdateAll when implementing the change.

In `@pkg/privilege/privileges/privileges_test.go`:
- Around line 2169-2202: The test TestSQLVariableAccelerateUserCreationUpdate
mutates the global flag tidb_accelerate_user_creation_update without guaranteed
restoration on failure; capture the current state at the start (e.g. via
variable.AccelerateUserCreationUpdate.Load() or querying
@@global.tidb_accelerate_user_creation_update) and register a t.Cleanup that
restores the original value (either by calling tk.MustExec("set
@@global.tidb_accelerate_user_creation_update = ...") or by using
variable.AccelerateUserCreationUpdate.Store(...)) so the flag is returned to its
prior state even if the test fails; keep the rest of the test logic (create
user, set on/off) unchanged and reference the
TestSQLVariableAccelerateUserCreationUpdate and
variable.AccelerateUserCreationUpdate symbols when making the change.

In `@pkg/server/tests/commontest/tidb_test.go`:
- Line 2669: The call to qctx.Session.Auth(...) ignores the returned error;
capture it (err := qctx.Session.Auth(...)) and fail the test on error (for
example: if err != nil { t.Fatalf("Auth failed: %v", err) } or use your test
framework's require/assert failure), so the test stops with a clear message if
authentication fails rather than continuing with a misconfigured session.

In `@tests/integrationtest/t/privilege/privileges.test`:
- Around line 903-905: The test setup misses pre-cleaning the role 'aa@bb', so
add a DROP ROLE IF EXISTS 'aa@bb' before CREATE ROLE 'aa@bb' in the
privileges.test setup to make the test self-contained and deterministic; locate
the SQL block in tests/integrationtest/t/privilege/privileges.test around the
lines that run "drop user if exists u1; create user u1; create role 'aa@bb';"
and insert the matching "DROP ROLE IF EXISTS 'aa@bb';" immediately before the
CREATE ROLE statement.

---

Outside diff comments:
In `@br/pkg/restore/snap_client/systable_restore.go`:
- Around line 383-407: The current afterSystemTablesReplaced method only calls
rc.dom.NotifyUpdateAllUsersPrivilege() when the restored table name equals the
literal "user", which misses other privilege-related tables (db, tables_priv,
columns_priv, default_roles, role_edges, global_priv, global_grants) and leaves
in-memory grants stale; update the condition to check whether any restored table
is in the privilege table set (use sysPrivilegeTableMap or a helper like
notifyUpdateAllUsersPrivilege) and call rc.dom.NotifyUpdateAllUsersPrivilege()
when any privilege table is present (preserving existing logging and error
aggregation behavior in afterSystemTablesReplaced) so all privilege restores
trigger a refresh.

In `@pkg/executor/show.go`:
- Around line 1876-1901: The authData pulled into the DDL isn't escaped and can
break the generated SQL; before building authStr in show.go (where authData,
authplugin and showStr are composed), properly escape authentication_string
characters (at minimum single quote and backslash, and any other
SQL-string-special bytes used by your SQL dialect) or use an existing
SQL/string-quoting helper to produce a safely quoted literal, then embed that
escaped/quoted value into authStr (preserving the existing skip for
mysql.AuthSocket && authData == ""); update the construction that uses
fmt.Sprintf for showStr to use the escaped/quoted authData instead of the raw
authData.

In `@pkg/executor/test/passwordtest/password_management_test.go`:
- Around line 919-928: changeAutoLockedLastChanged currently swallows any error
from the privilege refresh; modify changeAutoLockedLastChanged to return an
error and propagate any failure from
domain.GetDomain(tk.Session()).NotifyUpdateAllUsersPrivilege() (i.e., change
signature from func changeAutoLockedLastChanged(...) to func
changeAutoLockedLastChanged(...) error and return the error if
NotifyUpdateAllUsersPrivilege() fails after tk.MustExec), and then update every
call site to assert the result (e.g., require.NoError(t,
changeAutoLockedLastChanged(...))). Ensure function still performs the SQL
update via tk.MustExec and only returns nil on successful
NotifyUpdateAllUsersPrivilege().

---

Nitpick comments:
In `@pkg/extension/auth_test.go`:
- Line 238: The test currently asserts total calls with p.AssertNumberOfCalls(t,
"ValidateAuthString", 3) which couples this assertion to prior side effects;
instead, either reset or clear the mock before the login section (so subsequent
calls can be asserted deterministically) or replace the global count assertion
with a delta-style check that only verifies the number of ValidateAuthString
calls made during the login step (e.g., capture the current call count from the
mock, perform the login actions, then assert the increment equals the expected
delta). Target the mock method name ValidateAuthString and the assertion call
p.AssertNumberOfCalls to implement the change.

In `@pkg/session/bootstrap.go`:
- Around line 388-389: The DDL defines a redundant secondary index KEY i_user
(USER) on mysql.global_grants which duplicates the PRIMARY KEY (USER,HOST,PRIV);
remove the secondary index declaration to avoid extra DDL/storage/write
overhead. Edit the CREATE/ALTER statement(s) in pkg/session/bootstrap.go that
reference mysql.global_grants and delete the "KEY i_user (USER)" entry (and any
identical occurrences elsewhere in that file, e.g., the other repeated
definition), leaving only the PRIMARY KEY (USER,HOST,PRIV).

In `@pkg/sessionctx/variable/sysvar.go`:
- Around line 3557-3562: Add a GetGlobal to mirror the existing SetGlobal for
the TiDBAccelerateUserCreationUpdate variable so reads reflect the in-memory
atomic state: implement a GetGlobal function for the variable that calls
AccelerateUserCreationUpdate.Load(), converts the boolean to the string form
with BoolToOnOff (matching the existing Value/SetGlobal pattern), and returns
that string (with nil error) so GetGlobal and SetGlobal are consistent with
other atomic-backed globals like
TiDBEnableBatchDML/TiDBEnableCheckConstraint/TiDBEnableResourceControl.

In `@pkg/sessionctx/variable/tidb_vars.go`:
- Around line 1033-1034: Update the comment for the
TiDBAccelerateUserCreationUpdate constant to explain the trade-off: when enabled
(TiDBAccelerateUserCreationUpdate) the server eagerly loads and updates whole
user data in-memory to speed up privilege/user refreshes, at the cost of
increased memory footprint for the user cache; clarify that the default is off
to avoid higher memory usage in typical deployments and note any relevant
invariants (e.g., affects in-memory user-cache size and refresh latency).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 5e46b9ec-a05c-4917-afd1-7d6690571a73

📥 Commits

Reviewing files that changed from the base of the PR and between 9da0bd1 and 60c9344.

📒 Files selected for processing (28)
  • br/pkg/restore/snap_client/systable_restore.go
  • br/pkg/restore/snap_client/systable_restore_test.go
  • pkg/domain/BUILD.bazel
  • pkg/domain/domain.go
  • pkg/executor/grant.go
  • pkg/executor/infoschema_reader_test.go
  • pkg/executor/revoke.go
  • pkg/executor/show.go
  • pkg/executor/simple.go
  • pkg/executor/test/passwordtest/password_management_test.go
  • pkg/extension/auth_test.go
  • pkg/privilege/privilege.go
  • pkg/privilege/privileges/BUILD.bazel
  • pkg/privilege/privileges/cache.go
  • pkg/privilege/privileges/cache_test.go
  • pkg/privilege/privileges/privileges.go
  • pkg/privilege/privileges/privileges_test.go
  • pkg/privilege/privileges/tidb_auth_token_test.go
  • pkg/server/tests/commontest/tidb_test.go
  • pkg/session/bootstrap.go
  • pkg/session/bootstraptest/bootstrap_upgrade_test.go
  • pkg/session/session.go
  • pkg/sessionctx/variable/sysvar.go
  • pkg/sessionctx/variable/tidb_vars.go
  • tests/integrationtest/r/explain.result
  • tests/integrationtest/r/privilege/privileges.result
  • tests/integrationtest/t/explain.test
  • tests/integrationtest/t/privilege/privileges.test

Comment thread pkg/domain/domain.go
Comment thread pkg/domain/domain.go
Comment on lines 3015 to 3017
if do.etcdClient != nil {
event := PrivilegeEvent{ServerID: do.serverID}
event.ServerID = do.serverID
data, err := json.Marshal(event)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Read serverID through the atomic accessor.

serverIDKeeper writes this field with atomic.StoreUint64, so the direct read here mixes atomic and non-atomic access and can misclassify self-emitted events.

🩹 Minimal fix
-		event.ServerID = do.serverID
+		event.ServerID = do.ServerID()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if do.etcdClient != nil {
event := PrivilegeEvent{ServerID: do.serverID}
event.ServerID = do.serverID
data, err := json.Marshal(event)
if do.etcdClient != nil {
event.ServerID = do.ServerID()
data, err := json.Marshal(event)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/domain/domain.go` around lines 3015 - 3017, The code reads do.serverID
directly while serverIDKeeper writes it with atomic.StoreUint64, causing a
race/misclassification; replace the direct read with an atomic load (use
atomic.LoadUint64 on the same serverID field) when assigning event.ServerID so
the access is consistent with serverIDKeeper's atomic.StoreUint64 usage.

Comment thread pkg/domain/domain.go
Comment on lines +3021 to 3024
if uint64(len(data)) > size.MB {
logutil.BgLogger().Warn("notify update privilege message too large", zap.ByteString("value", data))
}
err = ddlutil.PutKVToEtcd(do.ctx, do.etcdClient, 5, privilegeKey, string(data))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Downgrade oversized partial notifications to a full-refresh event.

If this payload is already above the warning threshold, PutKVToEtcd can fail and only the local TiDB refreshes. Falling back to {All:true} keeps cross-node propagation reliable and avoids dumping the full username list into logs.

🩹 Safer fallback
-		if uint64(len(data)) > size.MB {
-			logutil.BgLogger().Warn("notify update privilege message too large", zap.ByteString("value", data))
-		}
+		if uint64(len(data)) > size.MB {
+			origBytes := len(data)
+			userCount := len(event.UserList)
+			event = PrivilegeEvent{All: true, ServerID: event.ServerID}
+			data, err = json.Marshal(event)
+			if err != nil {
+				return errors.Trace(err)
+			}
+			logutil.BgLogger().Warn("notify update privilege message too large, falling back to full reload",
+				zap.Int("user_count", userCount),
+				zap.Int("bytes", origBytes))
+		}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if uint64(len(data)) > size.MB {
logutil.BgLogger().Warn("notify update privilege message too large", zap.ByteString("value", data))
}
err = ddlutil.PutKVToEtcd(do.ctx, do.etcdClient, 5, privilegeKey, string(data))
if uint64(len(data)) > size.MB {
origBytes := len(data)
userCount := len(event.UserList)
event = PrivilegeEvent{All: true, ServerID: event.ServerID}
data, err = json.Marshal(event)
if err != nil {
return errors.Trace(err)
}
logutil.BgLogger().Warn("notify update privilege message too large, falling back to full reload",
zap.Int("user_count", userCount),
zap.Int("bytes", origBytes))
}
err = ddlutil.PutKVToEtcd(do.ctx, do.etcdClient, 5, privilegeKey, string(data))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/domain/domain.go` around lines 3021 - 3024, The current code tries to
PutKVToEtcd with a payload that may exceed size.MB, risking etcd failure and
leaking full username lists to logs; change the branch where uint64(len(data)) >
size.MB to instead replace data with a small "full-refresh" notification (e.g.,
a marshaled struct with All:true) and log a warning without including the raw
byte payload; then call ddlutil.PutKVToEtcd(do.ctx, do.etcdClient, 5,
privilegeKey, string(data)) with the reduced payload. Ensure you reference the
same symbols (size.MB, privilegeKey, do.ctx, do.etcdClient, ddlutil.PutKVToEtcd)
and do not emit the large data in logs.

Comment thread pkg/executor/infoschema_reader_test.go Outdated
Comment on lines +714 to +715
tk.MustQuery(`select count(*) from information_schema.tidb_index_usage;`).Check(
testkit.RowsWithSep("|", "81"))
testkit.RowsWithSep("|", "87"))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Avoid hard-coding the global tidb_index_usage row count.

This 87 will keep changing whenever unrelated system-table indexes are added, which is exactly what happened in this PR. Since the test already proves the six idt1/idt2 rows explicitly, a more stable assertion here is “baseline count + 6” instead of the absolute global total.

Suggested change
 func TestIndexUsageTable(t *testing.T) {
 	store := testkit.CreateMockStore(t)
 	tk := testkit.NewTestKit(t, store)
 	tk.MustExec("use test")
+	baseCnt, err := strconv.Atoi(
+		tk.MustQuery(`select count(*) from information_schema.tidb_index_usage`).Rows()[0][0].(string),
+	)
+	require.NoError(t, err)
 	tk.MustExec("create table idt1(col_1 int primary key, col_2 int, index idx_1(col_1), index idx_2(col_2), index idx_3(col_1, col_2));")
 	tk.MustExec("create table idt2(col_1 int primary key, col_2 int, index idx_1(col_1), index idx_2(col_2), index idx_4(col_2, col_1));")
@@
-	tk.MustQuery(`select count(*) from information_schema.tidb_index_usage;`).Check(
-		testkit.RowsWithSep("|", "87"))
+	cnt, err := strconv.Atoi(
+		tk.MustQuery(`select count(*) from information_schema.tidb_index_usage;`).Rows()[0][0].(string),
+	)
+	require.NoError(t, err)
+	require.Equal(t, baseCnt+6, cnt)

As per coding guidelines, "Keep test changes minimal and deterministic; avoid broad golden/testdata churn unless required".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/executor/infoschema_reader_test.go` around lines 714 - 715, The test
currently asserts an absolute count (87) from the
information_schema.tidb_index_usage table which is brittle; change the assertion
to compute a baseline count from tk.MustQuery("select count(*) from
information_schema.tidb_index_usage;") before the index-related operations and
then assert baseline+6 instead of the hard-coded 87. Locate the tk.MustQuery
call in the test, capture the returned count into a variable, convert to an
integer, and use that variable+6 in the final Check call (keep the query text
"select count(*) from information_schema.tidb_index_usage;" and the
testkit.Check call usage intact).

Comment thread pkg/privilege/privileges/cache.go
Comment on lines +2169 to +2202
func TestSQLVariableAccelerateUserCreationUpdate(t *testing.T) {
store := createStoreAndPrepareDB(t)
tk := testkit.NewTestKit(t, store)
dom := domain.GetDomain(tk.Session())
// 1. check the default variable value
tk.MustQuery("select @@global.tidb_accelerate_user_creation_update").Check(testkit.Rows("0"))
// trigger priv reload
tk.MustExec("create user aaa")
handle := dom.PrivilegeHandle()
priv := handle.Get()
require.False(t, priv.RequestVerification(nil, "bbb", "%", "test", "", "", mysql.SelectPriv))

// 2. change the variable and check
tk.MustExec("set @@global.tidb_accelerate_user_creation_update = on")
tk.MustQuery("select @@global.tidb_accelerate_user_creation_update").Check(testkit.Rows("1"))
require.True(t, variable.AccelerateUserCreationUpdate.Load())
tk.MustExec("create user bbb")
// trigger priv reload, but data for bbb is not really loaded
tk.MustExec("grant select on test.* to bbb")
priv = handle.Get()
require.True(t, priv.RequestVerification(nil, "bbb", "%", "test", "", "", mysql.SelectPriv))
tk1 := testkit.NewTestKit(t, store)
// if user bbb login, everything works as expected
require.NoError(t, tk1.Session().Auth(&auth.UserIdentity{Username: "bbb", Hostname: "localhost"}, nil, nil, nil))
priv = handle.Get()
require.True(t, priv.RequestVerification(nil, "bbb", "%", "test", "", "", mysql.SelectPriv))

// 3. change the variable and check again
tk.MustExec("set @@global.tidb_accelerate_user_creation_update = off")
tk.MustQuery("select @@global.tidb_accelerate_user_creation_update").Check(testkit.Rows("0"))
tk.MustExec("drop user aaa")
priv = handle.Get()
require.True(t, priv.RequestVerification(nil, "bbb", "%", "test", "", "", mysql.SelectPriv))
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Restore the global accelerate flag with t.Cleanup.

This test only flips tidb_accelerate_user_creation_update back to off on the happy path. Any earlier require.* failure leaves the process-wide flag changed for later tests.

Suggested fix
 func TestSQLVariableAccelerateUserCreationUpdate(t *testing.T) {
 	store := createStoreAndPrepareDB(t)
 	tk := testkit.NewTestKit(t, store)
+	t.Cleanup(func() {
+		tk.MustExec("set @@global.tidb_accelerate_user_creation_update = off")
+	})
 	dom := domain.GetDomain(tk.Session())

As per coding guidelines, **/{*_test.go,testdata/**}: Keep test changes minimal and deterministic; avoid broad golden/testdata churn unless required.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func TestSQLVariableAccelerateUserCreationUpdate(t *testing.T) {
store := createStoreAndPrepareDB(t)
tk := testkit.NewTestKit(t, store)
dom := domain.GetDomain(tk.Session())
// 1. check the default variable value
tk.MustQuery("select @@global.tidb_accelerate_user_creation_update").Check(testkit.Rows("0"))
// trigger priv reload
tk.MustExec("create user aaa")
handle := dom.PrivilegeHandle()
priv := handle.Get()
require.False(t, priv.RequestVerification(nil, "bbb", "%", "test", "", "", mysql.SelectPriv))
// 2. change the variable and check
tk.MustExec("set @@global.tidb_accelerate_user_creation_update = on")
tk.MustQuery("select @@global.tidb_accelerate_user_creation_update").Check(testkit.Rows("1"))
require.True(t, variable.AccelerateUserCreationUpdate.Load())
tk.MustExec("create user bbb")
// trigger priv reload, but data for bbb is not really loaded
tk.MustExec("grant select on test.* to bbb")
priv = handle.Get()
require.True(t, priv.RequestVerification(nil, "bbb", "%", "test", "", "", mysql.SelectPriv))
tk1 := testkit.NewTestKit(t, store)
// if user bbb login, everything works as expected
require.NoError(t, tk1.Session().Auth(&auth.UserIdentity{Username: "bbb", Hostname: "localhost"}, nil, nil, nil))
priv = handle.Get()
require.True(t, priv.RequestVerification(nil, "bbb", "%", "test", "", "", mysql.SelectPriv))
// 3. change the variable and check again
tk.MustExec("set @@global.tidb_accelerate_user_creation_update = off")
tk.MustQuery("select @@global.tidb_accelerate_user_creation_update").Check(testkit.Rows("0"))
tk.MustExec("drop user aaa")
priv = handle.Get()
require.True(t, priv.RequestVerification(nil, "bbb", "%", "test", "", "", mysql.SelectPriv))
}
func TestSQLVariableAccelerateUserCreationUpdate(t *testing.T) {
store := createStoreAndPrepareDB(t)
tk := testkit.NewTestKit(t, store)
t.Cleanup(func() {
tk.MustExec("set @@global.tidb_accelerate_user_creation_update = off")
})
dom := domain.GetDomain(tk.Session())
// 1. check the default variable value
tk.MustQuery("select @@global.tidb_accelerate_user_creation_update").Check(testkit.Rows("0"))
// trigger priv reload
tk.MustExec("create user aaa")
handle := dom.PrivilegeHandle()
priv := handle.Get()
require.False(t, priv.RequestVerification(nil, "bbb", "%", "test", "", "", mysql.SelectPriv))
// 2. change the variable and check
tk.MustExec("set @@global.tidb_accelerate_user_creation_update = on")
tk.MustQuery("select @@global.tidb_accelerate_user_creation_update").Check(testkit.Rows("1"))
require.True(t, variable.AccelerateUserCreationUpdate.Load())
tk.MustExec("create user bbb")
// trigger priv reload, but data for bbb is not really loaded
tk.MustExec("grant select on test.* to bbb")
priv = handle.Get()
require.True(t, priv.RequestVerification(nil, "bbb", "%", "test", "", "", mysql.SelectPriv))
tk1 := testkit.NewTestKit(t, store)
// if user bbb login, everything works as expected
require.NoError(t, tk1.Session().Auth(&auth.UserIdentity{Username: "bbb", Hostname: "localhost"}, nil, nil, nil))
priv = handle.Get()
require.True(t, priv.RequestVerification(nil, "bbb", "%", "test", "", "", mysql.SelectPriv))
// 3. change the variable and check again
tk.MustExec("set @@global.tidb_accelerate_user_creation_update = off")
tk.MustQuery("select @@global.tidb_accelerate_user_creation_update").Check(testkit.Rows("0"))
tk.MustExec("drop user aaa")
priv = handle.Get()
require.True(t, priv.RequestVerification(nil, "bbb", "%", "test", "", "", mysql.SelectPriv))
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/privilege/privileges/privileges_test.go` around lines 2169 - 2202, The
test TestSQLVariableAccelerateUserCreationUpdate mutates the global flag
tidb_accelerate_user_creation_update without guaranteed restoration on failure;
capture the current state at the start (e.g. via
variable.AccelerateUserCreationUpdate.Load() or querying
@@global.tidb_accelerate_user_creation_update) and register a t.Cleanup that
restores the original value (either by calling tk.MustExec("set
@@global.tidb_accelerate_user_creation_update = ...") or by using
variable.AccelerateUserCreationUpdate.Store(...)) so the flag is returned to its
prior state even if the test fails; keep the rest of the test logic (create
user, set on/off) unchanged and reference the
TestSQLVariableAccelerateUserCreationUpdate and
variable.AccelerateUserCreationUpdate symbols when making the change.

_, err = Execute(context.Background(), qctx, "create user testuser;")
require.NoError(t, err)
qctx.Session.GetSessionVars().User = &auth.UserIdentity{Username: "testuser", AuthUsername: "testuser", AuthHostname: "%"}
qctx.Session.Auth(&auth.UserIdentity{Username: "testuser", AuthUsername: "testuser", AuthHostname: "%"}, nil, nil, nil)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Missing error check for Auth() call.

The Auth() method returns an error that is being silently discarded. If authentication fails, the test will continue with an incorrectly configured session, causing confusing failures in subsequent assertions. As per coding guidelines, avoid silently swallowing errors in Go code.

🔧 Proposed fix to check the error
-	qctx.Session.Auth(&auth.UserIdentity{Username: "testuser", AuthUsername: "testuser", AuthHostname: "%"}, nil, nil, nil)
+	err = qctx.Session.Auth(&auth.UserIdentity{Username: "testuser", AuthUsername: "testuser", AuthHostname: "%"}, nil, nil, nil)
+	require.NoError(t, err)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
qctx.Session.Auth(&auth.UserIdentity{Username: "testuser", AuthUsername: "testuser", AuthHostname: "%"}, nil, nil, nil)
err = qctx.Session.Auth(&auth.UserIdentity{Username: "testuser", AuthUsername: "testuser", AuthHostname: "%"}, nil, nil, nil)
require.NoError(t, err)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/server/tests/commontest/tidb_test.go` at line 2669, The call to
qctx.Session.Auth(...) ignores the returned error; capture it (err :=
qctx.Session.Auth(...)) and fail the test on error (for example: if err != nil {
t.Fatalf("Auth failed: %v", err) } or use your test framework's require/assert
failure), so the test stops with a clear message if authentication fails rather
than continuing with a misconfigured session.

Comment on lines +903 to +905
drop user if exists u1;
create user u1;
create role 'aa@bb';

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Pre-clean the role before creating it.

u1 is cleaned up up front, but 'aa@bb' is not. If a prior run aborts before teardown, CREATE ROLE 'aa@bb' will fail on rerun. Add a matching DROP ROLE IF EXISTS here to keep the case self-contained.

♻️ Suggested setup hardening
 # TestIssue59552
 drop user if exists u1;
+drop role if exists 'aa@bb';
 create user u1;
 create role 'aa@bb';
As per coding guidelines, `**/{*_test.go,testdata/**}`: Keep test changes minimal and deterministic; avoid broad golden/testdata churn unless required.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
drop user if exists u1;
create user u1;
create role 'aa@bb';
drop user if exists u1;
drop role if exists 'aa@bb';
create user u1;
create role 'aa@bb';
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integrationtest/t/privilege/privileges.test` around lines 903 - 905,
The test setup misses pre-cleaning the role 'aa@bb', so add a DROP ROLE IF
EXISTS 'aa@bb' before CREATE ROLE 'aa@bb' in the privileges.test setup to make
the test self-contained and deterministic; locate the SQL block in
tests/integrationtest/t/privilege/privileges.test around the lines that run
"drop user if exists u1; create user u1; create role 'aa@bb';" and insert the
matching "DROP ROLE IF EXISTS 'aa@bb';" immediately before the CREATE ROLE
statement.

@fzzf678 fzzf678 force-pushed the cherry-pick-62693-to-release-8.5 branch from 60c9344 to a0647dd Compare March 10, 2026 15:56

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/executor/test/passwordtest/password_management_test.go (1)

919-928: ⚠️ Potential issue | 🟡 Minor

Don't ignore the privilege refresh failure in this helper.

If NotifyUpdateAllUsersPrivilege() fails here, the test keeps running against stale in-memory privileges and can become flaky. Return the error or pass *testing.T into the helper and assert it.

As per coding guidelines "Keep error handling actionable and contextual; avoid silently swallowing errors in Go code".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/executor/test/passwordtest/password_management_test.go` around lines 919
- 928, The helper changeAutoLockedLastChanged currently calls
domain.GetDomain(tk.Session()).NotifyUpdateAllUsersPrivilege() and ignores its
error; modify the helper to surface and handle that failure—either change
changeAutoLockedLastChanged to return error and propagate it to the caller
(check and fail the test), or accept a *testing.T parameter and assert the call
succeeds (e.g., require.NoError/ t.Fatalf on failure). Ensure the updated
function signature and all callers are updated accordingly so
NotifyUpdateAllUsersPrivilege() errors are not silently swallowed.
♻️ Duplicate comments (4)
pkg/domain/domain.go (3)

3015-3017: ⚠️ Potential issue | 🟠 Major

Read serverID through the atomic accessor.

serverID is written with atomic stores elsewhere, so this direct read can race and misclassify a self-emitted privilege notification as remote.

🩹 Minimal fix
-		event.ServerID = do.serverID
+		event.ServerID = do.ServerID()

Run this to confirm the mixed atomic/non-atomic access pattern:

#!/bin/bash
rg -n -C2 'event\.ServerID = do\.serverID|atomic\.StoreUint64\(&do\.serverID|func \(do \*Domain\) ServerID\(' pkg/domain/domain.go
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/domain/domain.go` around lines 3015 - 3017, The code sets event.ServerID
by directly reading do.serverID which races with atomic stores; replace the
direct field access with the existing atomic accessor (call the
Domain.ServerID() method or the atomic load wrapper) so event.ServerID =
do.ServerID() (or the accessor name in this file) to ensure a safe atomic read;
update the assignment in the block that checks do.etcdClient != nil to use that
accessor instead of do.serverID.

1869-1903: ⚠️ Potential issue | 🟠 Major

Treat any empty-value watch event as a full reload.

A single legacy empty-value event can arrive in the same WatchResponse as new-format JSON events during a rolling upgrade. This implementation only falls back to All=true when none of the events carried JSON, so that legacy change is silently dropped and other TiDB nodes can keep stale privilege state.

🩹 Minimal fix
 func (do *Domain) decodePrivilegeEvent(resp clientv3.WatchResponse) PrivilegeEvent {
 	var msg PrivilegeEvent
 	isNewVersionEvents := false
 	for _, event := range resp.Events {
 		if event.Kv != nil {
 			val := event.Kv.Value
-			if len(val) > 0 {
-				var tmp PrivilegeEvent
-				err := json.Unmarshal(val, &tmp)
-				if err != nil {
-					logutil.BgLogger().Warn("decodePrivilegeEvent unmarshal fail", zap.Error(err))
-					break
-				}
-				isNewVersionEvents = true
-				if do.ServerID() != 0 && tmp.ServerID == do.ServerID() {
-					// Skip the events from this TiDB-Server
-					continue
-				}
-				if tmp.All {
-					msg.All = true
-					break
-				}
-				// duplicated users in list is ok.
-				msg.UserList = append(msg.UserList, tmp.UserList...)
-			}
+			if len(val) == 0 {
+				msg.All = true
+				break
+			}
+			var tmp PrivilegeEvent
+			err := json.Unmarshal(val, &tmp)
+			if err != nil {
+				logutil.BgLogger().Warn("decodePrivilegeEvent unmarshal fail", zap.Error(err))
+				break
+			}
+			isNewVersionEvents = true
+			if do.ServerID() != 0 && tmp.ServerID == do.ServerID() {
+				// Skip the events from this TiDB-Server
+				continue
+			}
+			if tmp.All {
+				msg.All = true
+				break
+			}
+			// duplicated users in list is ok.
+			msg.UserList = append(msg.UserList, tmp.UserList...)
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/domain/domain.go` around lines 1869 - 1903, In decodePrivilegeEvent,
treat any watch event with an empty event.Kv.Value as a legacy "full reload"
signal instead of waiting to see if all events lack JSON: inside the loop over
resp.Events in the Domain.decodePrivilegeEvent function, if event.Kv != nil and
len(event.Kv.Value) == 0 then set msg.All = true and return msg immediately (so
legacy empty-value events are not dropped when mixed with new JSON events);
remove reliance on isNewVersionEvents for this case so an empty-value event
always forces a full reload.

3021-3024: ⚠️ Potential issue | 🟠 Major

Downgrade oversized partial notifications to a full refresh event.

This branch only warns and still writes the oversized JSON. If etcd rejects it, only the local TiDB refreshes; logging the raw payload also dumps the full username list into logs.

🩹 Safer fallback
 		if uint64(len(data)) > size.MB {
-			logutil.BgLogger().Warn("notify update privilege message too large", zap.ByteString("value", data))
+			origBytes := len(data)
+			userCount := len(event.UserList)
+			event = PrivilegeEvent{All: true, ServerID: event.ServerID}
+			data, err = json.Marshal(event)
+			if err != nil {
+				return errors.Trace(err)
+			}
+			logutil.BgLogger().Warn("notify update privilege message too large, falling back to full reload",
+				zap.Int("user_count", userCount),
+				zap.Int("bytes", origBytes))
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/domain/domain.go` around lines 3021 - 3024, The code currently warns and
logs the raw oversized payload then attempts to write it to etcd; instead,
detect when uint64(len(data)) > size.MB and do not log the raw ByteString or
call ddlutil.PutKVToEtcd with the oversized JSON. Replace that branch to (1)
avoid dumping data to logs via logutil.BgLogger(), (2) write a safe fallback
value/key to etcd indicating a full refresh (e.g., a compact "full-refresh"
marker or an empty payload) using ddlutil.PutKVToEtcd(do.ctx, do.etcdClient, 5,
privilegeKey, "<FULL_REFRESH>"), and (3) ensure the local code path still
triggers a full refresh locally. Update references in this block: size.MB,
logutil.BgLogger(), ddlutil.PutKVToEtcd, privilegeKey, do.ctx, do.etcdClient.
pkg/privilege/privileges/cache.go (1)

502-528: ⚠️ Potential issue | 🟠 Major

Still missing reverse default_roles dependents.

findUserAndAllRoles only expands through roleGraph. If a role is dropped or renamed, the affected mysql.default_roles rows belong to the grantees of that role, but those grantee usernames never enter this set, so their cached defaultRoles stay stale until the next full reload. Please either include reverse default_roles dependents here or force those callers down UpdateAll().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/privilege/privileges/cache.go` around lines 502 - 528,
findUserAndAllRoles currently only traverses roleGraph so usernames that depend
on a role via mysql.default_roles never get added, leaving defaultRoles cache
stale; update findUserAndAllRoles to also consider reverse default_roles
dependents (or accept an extra map param like reverseDefaultRoles mapping role
-> []grantee and BFS those grantees similarly), ensuring you add grantee
usernames into the all set and queue when their role appears, or alternatively
document and change callers (who expect defaultRoles updates) to call
UpdateAll() instead of this function; refer to findUserAndAllRoles, roleGraph,
defaultRoles, UpdateAll, and mysql.default_roles when making the change.
🧹 Nitpick comments (2)
pkg/sessionctx/variable/tidb_vars.go (1)

1032-1034: Consider clarifying the comment to explain the performance trade-off.

The current comment "decides whether tidb will load & update the whole user's data in-memory" is ambiguous about what enabling this variable actually does. Based on the PR context, when enabled, TiDB uses partial updates for the privilege cache instead of full reloads during user/privilege modifications, which reduces CPU and memory usage.

📝 Suggested documentation improvement
-	// TiDBAccelerateUserCreationUpdate decides whether tidb will load & update the whole user's data in-memory.
+	// TiDBAccelerateUserCreationUpdate enables partial updates for the privilege cache.
+	// When enabled, user/privilege modifications trigger targeted cache updates instead of full reloads,
+	// significantly reducing CPU and memory usage during frequent grant/revoke operations.
 	TiDBAccelerateUserCreationUpdate = "tidb_accelerate_user_creation_update"

As per coding guidelines: "Comments SHOULD explain non-obvious intent, constraints, invariants, concurrency guarantees, SQL/compatibility contracts, or important performance trade-offs".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sessionctx/variable/tidb_vars.go` around lines 1032 - 1034, Update the
comment for the constant TiDBAccelerateUserCreationUpdate to clearly state its
behavior and performance trade-offs: explain that when enabled TiDB performs
partial updates to the privilege/user cache on user/privilege modifications
(avoiding full in-memory reloads), reducing CPU and memory use; when disabled it
falls back to full reloads of user data which can be simpler but more costly.
Mention the intended effect (less CPU/memory, faster incremental updates) and
any important compatibility/invariance (affects privilege cache update
strategy).
pkg/privilege/privileges/cache.go (1)

996-1088: Apply the same userList filter here as the other decoders.

When addUserFilterCondition skips the SQL predicate (len(userList)==0 or >1024), this closure still materializes every mysql.user row into diff.user; merge later discards most of it. That makes large-batch or accidental-empty partial refreshes much closer to a full user-table load than intended.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/privilege/privileges/cache.go` around lines 996 - 1088, The decoder
decodeUserTableRow must respect the incoming userList filter: after populating
value (or at least after calling value.assignUserOrHost) add a membership check
like if len(userList) > 0 { if _, ok := userList[value.User]; !ok { return nil }
} so rows for users not in userList are skipped and not appended into p.user;
update the closure in decodeUserTableRow to perform this early return before
computing/merging old := p.user.Get(...) and p.user.ReplaceOrInsert(...).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/executor/simple.go`:
- Around line 2424-2425: When s.IsDropRole is true you must capture the grantee
usernames before role-edge/default-role rows are deleted and pass them into the
NotifyUpdatePrivilege call; modify the flow around
userIdentityToUserList(s.UserList) so that when s.IsDropRole you gather the
grantees (e.g., from the data structure that will be used to remove role
edges/default roles) into a user list and merge/append those usernames with
userIdentityToUserList(s.UserList) before calling
domain.GetDomain(e.Ctx()).NotifyUpdatePrivilege(...); ensure the code references
s.IsDropRole, userIdentityToUserList, and NotifyUpdatePrivilege so the affected
grantees are notified prior to deleting mysql.role_edges/mysql.default_roles.
- Around line 2222-2227: The current notification only appends users from
s.UserToUsers (users.OldUser.Username and users.NewUser.Username) before calling
domain.GetDomain(e.Ctx()).NotifyUpdatePrivilege(userList); to fix, also collect
grantee users and any default-role/TO_USER owners affected by role renames and
append their usernames to the same userList; locate the rename handling around
s.UserToUsers and, for each renamed role entry, query the grant/default-role
owners (e.g., look up TO_USER owners or role grant records related to users who
had the old role) and ensure their usernames are added to userList prior to
calling NotifyUpdatePrivilege so their privilege cache is refreshed.

In `@pkg/privilege/privileges/cache.go`:
- Around line 531-567: loadSomeUsers currently calls loadTable directly for many
mysql.* tables and returns ErrNoSuchTable as fatal; change it to tolerate
missing privilege tables the same way UpdateAll does by either calling the
existing helper loaders (e.g.,
LoadDBTable/LoadTablePrivTable/LoadDefaultRoles/LoadColumnsPriv or whatever
"Load*" helpers exist) or by checking for errors.Is(err, ErrNoSuchTable) after
each loadTable call and ignoring that error for the specific SQL identifiers
sqlLoadDBTable, sqlLoadTablePrivTable, sqlLoadDefaultRoles,
sqlLoadColumnsPrivTable (and any role edges table used elsewhere); keep error
handling for other errors and for tables that must exist (sqlLoadUserTable,
sqlLoadGlobalPrivTable, sqlLoadGlobalGrantsTable), and use the same decode
functions (p.decodeDBTableRow, p.decodeTablesPrivTableRow,
p.decodeDefaultRoleTableRow, p.decodeColumnsPrivTableRow) so targeted refresh
via loadSomeUsers mirrors UpdateAll's non-fatal handling of missing privilege
tables.

---

Outside diff comments:
In `@pkg/executor/test/passwordtest/password_management_test.go`:
- Around line 919-928: The helper changeAutoLockedLastChanged currently calls
domain.GetDomain(tk.Session()).NotifyUpdateAllUsersPrivilege() and ignores its
error; modify the helper to surface and handle that failure—either change
changeAutoLockedLastChanged to return error and propagate it to the caller
(check and fail the test), or accept a *testing.T parameter and assert the call
succeeds (e.g., require.NoError/ t.Fatalf on failure). Ensure the updated
function signature and all callers are updated accordingly so
NotifyUpdateAllUsersPrivilege() errors are not silently swallowed.

---

Duplicate comments:
In `@pkg/domain/domain.go`:
- Around line 3015-3017: The code sets event.ServerID by directly reading
do.serverID which races with atomic stores; replace the direct field access with
the existing atomic accessor (call the Domain.ServerID() method or the atomic
load wrapper) so event.ServerID = do.ServerID() (or the accessor name in this
file) to ensure a safe atomic read; update the assignment in the block that
checks do.etcdClient != nil to use that accessor instead of do.serverID.
- Around line 1869-1903: In decodePrivilegeEvent, treat any watch event with an
empty event.Kv.Value as a legacy "full reload" signal instead of waiting to see
if all events lack JSON: inside the loop over resp.Events in the
Domain.decodePrivilegeEvent function, if event.Kv != nil and len(event.Kv.Value)
== 0 then set msg.All = true and return msg immediately (so legacy empty-value
events are not dropped when mixed with new JSON events); remove reliance on
isNewVersionEvents for this case so an empty-value event always forces a full
reload.
- Around line 3021-3024: The code currently warns and logs the raw oversized
payload then attempts to write it to etcd; instead, detect when
uint64(len(data)) > size.MB and do not log the raw ByteString or call
ddlutil.PutKVToEtcd with the oversized JSON. Replace that branch to (1) avoid
dumping data to logs via logutil.BgLogger(), (2) write a safe fallback value/key
to etcd indicating a full refresh (e.g., a compact "full-refresh" marker or an
empty payload) using ddlutil.PutKVToEtcd(do.ctx, do.etcdClient, 5, privilegeKey,
"<FULL_REFRESH>"), and (3) ensure the local code path still triggers a full
refresh locally. Update references in this block: size.MB, logutil.BgLogger(),
ddlutil.PutKVToEtcd, privilegeKey, do.ctx, do.etcdClient.

In `@pkg/privilege/privileges/cache.go`:
- Around line 502-528: findUserAndAllRoles currently only traverses roleGraph so
usernames that depend on a role via mysql.default_roles never get added, leaving
defaultRoles cache stale; update findUserAndAllRoles to also consider reverse
default_roles dependents (or accept an extra map param like reverseDefaultRoles
mapping role -> []grantee and BFS those grantees similarly), ensuring you add
grantee usernames into the all set and queue when their role appears, or
alternatively document and change callers (who expect defaultRoles updates) to
call UpdateAll() instead of this function; refer to findUserAndAllRoles,
roleGraph, defaultRoles, UpdateAll, and mysql.default_roles when making the
change.

---

Nitpick comments:
In `@pkg/privilege/privileges/cache.go`:
- Around line 996-1088: The decoder decodeUserTableRow must respect the incoming
userList filter: after populating value (or at least after calling
value.assignUserOrHost) add a membership check like if len(userList) > 0 { if _,
ok := userList[value.User]; !ok { return nil } } so rows for users not in
userList are skipped and not appended into p.user; update the closure in
decodeUserTableRow to perform this early return before computing/merging old :=
p.user.Get(...) and p.user.ReplaceOrInsert(...).

In `@pkg/sessionctx/variable/tidb_vars.go`:
- Around line 1032-1034: Update the comment for the constant
TiDBAccelerateUserCreationUpdate to clearly state its behavior and performance
trade-offs: explain that when enabled TiDB performs partial updates to the
privilege/user cache on user/privilege modifications (avoiding full in-memory
reloads), reducing CPU and memory use; when disabled it falls back to full
reloads of user data which can be simpler but more costly. Mention the intended
effect (less CPU/memory, faster incremental updates) and any important
compatibility/invariance (affects privilege cache update strategy).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 487e0889-3d45-4bf9-a3be-66b82cb2bd8d

📥 Commits

Reviewing files that changed from the base of the PR and between 60c9344 and a0647dd.

📒 Files selected for processing (29)
  • br/pkg/restore/snap_client/pipeline_items.go
  • br/pkg/restore/snap_client/systable_restore.go
  • br/pkg/restore/snap_client/systable_restore_test.go
  • pkg/domain/BUILD.bazel
  • pkg/domain/domain.go
  • pkg/executor/grant.go
  • pkg/executor/infoschema_reader_test.go
  • pkg/executor/revoke.go
  • pkg/executor/show.go
  • pkg/executor/simple.go
  • pkg/executor/test/passwordtest/password_management_test.go
  • pkg/extension/auth_test.go
  • pkg/privilege/privilege.go
  • pkg/privilege/privileges/BUILD.bazel
  • pkg/privilege/privileges/cache.go
  • pkg/privilege/privileges/cache_test.go
  • pkg/privilege/privileges/privileges.go
  • pkg/privilege/privileges/privileges_test.go
  • pkg/privilege/privileges/tidb_auth_token_test.go
  • pkg/server/tests/commontest/tidb_test.go
  • pkg/session/bootstrap.go
  • pkg/session/bootstraptest/bootstrap_upgrade_test.go
  • pkg/session/session.go
  • pkg/sessionctx/variable/sysvar.go
  • pkg/sessionctx/variable/tidb_vars.go
  • tests/integrationtest/r/explain.result
  • tests/integrationtest/r/privilege/privileges.result
  • tests/integrationtest/t/explain.test
  • tests/integrationtest/t/privilege/privileges.test
✅ Files skipped from review due to trivial changes (1)
  • tests/integrationtest/t/explain.test
🚧 Files skipped from review as they are similar to previous changes (17)
  • tests/integrationtest/r/privilege/privileges.result
  • pkg/privilege/privileges/privileges_test.go
  • pkg/server/tests/commontest/tidb_test.go
  • pkg/extension/auth_test.go
  • pkg/executor/show.go
  • pkg/executor/grant.go
  • pkg/sessionctx/variable/sysvar.go
  • pkg/domain/BUILD.bazel
  • pkg/privilege/privileges/BUILD.bazel
  • pkg/executor/revoke.go
  • pkg/executor/infoschema_reader_test.go
  • tests/integrationtest/r/explain.result
  • pkg/privilege/privilege.go
  • pkg/session/session.go
  • br/pkg/restore/snap_client/systable_restore.go
  • tests/integrationtest/t/privilege/privileges.test
  • pkg/session/bootstraptest/bootstrap_upgrade_test.go

Comment thread pkg/executor/simple.go
Comment thread pkg/executor/simple.go
Comment on lines +531 to +567
func (p *MySQLPrivilege) loadSomeUsers(ctx sqlexec.SQLExecutor, userList map[string]struct{}) error {
err := loadTable(ctx, addUserFilterCondition(sqlLoadUserTable, userList), p.decodeUserTableRow(userList))
if err != nil {
return errors.Trace(err)
}

err = loadTable(ctx, addUserFilterCondition(sqlLoadGlobalPrivTable, userList), p.decodeGlobalPrivTableRow(userList))
if err != nil {
return errors.Trace(err)
}

err = loadTable(ctx, addUserFilterCondition(sqlLoadGlobalGrantsTable, userList), p.decodeGlobalGrantsTableRow(userList))
if err != nil {
return errors.Trace(err)
}

err = loadTable(ctx, addUserFilterCondition(sqlLoadDBTable, userList), p.decodeDBTableRow(userList))
if err != nil {
return errors.Trace(err)
}

err = loadTable(ctx, addUserFilterCondition(sqlLoadTablePrivTable, userList), p.decodeTablesPrivTableRow(userList))
if err != nil {
return errors.Trace(err)
}

err = loadTable(ctx, addUserFilterCondition(sqlLoadDefaultRoles, userList), p.decodeDefaultRoleTableRow(userList))
if err != nil {
return errors.Trace(err)
}

err = loadTable(ctx, addUserFilterCondition(sqlLoadColumnsPrivTable, userList), p.decodeColumnsPrivTableRow(userList))
if err != nil {
return errors.Trace(err)
}

return nil

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Keep partial refresh tolerant of missing privilege tables.

UpdateAll() explicitly treats missing mysql.db, mysql.tables_priv, mysql.default_roles, mysql.columns_priv, and mysql.role_edges as non-fatal, but updateUsers()/loadSomeUsers() now hard-fail on the same ErrNoSuchTable cases because they call loadTable directly. With tidb_accelerate_user_creation_update enabled, targeted refresh can fail in environments where full refresh still works. Reuse the Load* helpers or mirror the existing noSuchTable branches here.

Also applies to: 2297-2321

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/privilege/privileges/cache.go` around lines 531 - 567, loadSomeUsers
currently calls loadTable directly for many mysql.* tables and returns
ErrNoSuchTable as fatal; change it to tolerate missing privilege tables the same
way UpdateAll does by either calling the existing helper loaders (e.g.,
LoadDBTable/LoadTablePrivTable/LoadDefaultRoles/LoadColumnsPriv or whatever
"Load*" helpers exist) or by checking for errors.Is(err, ErrNoSuchTable) after
each loadTable call and ignoring that error for the specific SQL identifiers
sqlLoadDBTable, sqlLoadTablePrivTable, sqlLoadDefaultRoles,
sqlLoadColumnsPrivTable (and any role edges table used elsewhere); keep error
handling for other errors and for tables that must exist (sqlLoadUserTable,
sqlLoadGlobalPrivTable, sqlLoadGlobalGrantsTable), and use the same decode
functions (p.decodeDBTableRow, p.decodeTablesPrivTableRow,
p.decodeDefaultRoleTableRow, p.decodeColumnsPrivTableRow) so targeted refresh
via loadSomeUsers mirrors UpdateAll's non-fatal handling of missing privilege
tables.

@codecov

codecov Bot commented Mar 10, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 74.14051% with 173 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release-8.5@5182594). Learn more about missing BASE report.

Additional details and impacted files
@@               Coverage Diff                @@
##             release-8.5     #66541   +/-   ##
================================================
  Coverage               ?   55.5309%           
================================================
  Files                  ?       1819           
  Lines                  ?     653279           
  Branches               ?          0           
================================================
  Hits                   ?     362772           
  Misses                 ?     263571           
  Partials               ?      26936           
Flag Coverage Δ
integration 39.2090% <53.8116%> (?)
unit 64.9575% <67.8624%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 52.9954% <0.0000%> (?)
parser ∅ <0.0000%> (?)
br 64.0393% <0.0000%> (?)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@fzzf678 fzzf678 changed the title privilege: introduce 'partial update' for users and privileges cache | tidb-test=pr/2568 (#62693) privilege: introduce 'partial update' for users and privileges cache (#62693) | tidb-test=pr/2700 Mar 11, 2026
@fzzf678

fzzf678 commented Mar 11, 2026

Copy link
Copy Markdown
Contributor

/retest

1 similar comment
@fzzf678

fzzf678 commented Mar 11, 2026

Copy link
Copy Markdown
Contributor

/retest

@fzzf678 fzzf678 force-pushed the cherry-pick-62693-to-release-8.5 branch from a0647dd to c87733c Compare March 11, 2026 03:26

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
br/pkg/restore/snap_client/systable_restore.go (1)

389-397: ⚠️ Potential issue | 🟠 Major

Refresh privileges for every restored privilege table, not just mysql.user.

mysql.db, tables_priv, columns_priv, default_roles, role_edges, global_priv, and global_grants also feed the privilege cache. Restoring one of those without a refresh leaves running TiDBs on stale auth data until the periodic reload.

💡 Minimal fix
-		if table == "user" {
+		if _, ok := sysPrivilegeTableMap[table]; ok {
 			if serr := rc.dom.NotifyUpdateAllUsersPrivilege(); serr != nil {
 				log.Warn("failed to flush privileges, please manually execute `FLUSH PRIVILEGES`")
 				err = multierr.Append(err, berrors.ErrUnknown.Wrap(serr).GenWithStack("failed to flush privileges"))
 			} else {
 				log.Info("privilege system table restored, please reconnect to make it effective")
 			}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@br/pkg/restore/snap_client/systable_restore.go` around lines 389 - 397, The
current loop only calls rc.dom.NotifyUpdateAllUsersPrivilege() when table ==
"user"; update the condition to call NotifyUpdateAllUsersPrivilege() for all
privilege-related tables (e.g., "user", "db", "tables_priv", "columns_priv",
"default_roles", "role_edges", "global_priv", "global_grants", and any other
MySQL privilege tables used in this codebase) so that after restoring any of
those tables you run the refresh; locate the loop over tables and replace the
single-table check with a membership test against this privilege-table set and
reuse rc.dom.NotifyUpdateAllUsersPrivilege() and the existing error
handling/logging branches (keeping the log messages but making them apply to any
privilege table restore).
pkg/executor/test/passwordtest/password_management_test.go (1)

919-927: ⚠️ Potential issue | 🟡 Minor

Don't ignore the privilege-refresh error in this helper.

If NotifyUpdateAllUsersPrivilege() fails here, the following auth assertions run against stale cached privileges and the test failure becomes misleading. This helper should surface the error the same way the other changed call sites do.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/executor/test/passwordtest/password_management_test.go` around lines 919
- 927, The helper changeAutoLockedLastChanged currently ignores the return value
of domain.GetDomain(tk.Session()).NotifyUpdateAllUsersPrivilege(); change it to
capture the error (err :=
domain.GetDomain(tk.Session()).NotifyUpdateAllUsersPrivilege()) and surface it
the same way other call sites do (fail the test immediately on error — e.g. call
the test failure helper used elsewhere such as require.NoError(t, err) or
t.Fatalf("NotifyUpdateAllUsersPrivilege failed: %v", err) / equivalent via
testkit), so the test stops on a privilege-refresh failure instead of proceeding
with stale privileges.
♻️ Duplicate comments (7)
pkg/executor/infoschema_reader_test.go (1)

714-715: ⚠️ Potential issue | 🟡 Minor

Avoid another hard-coded global row count.

Line 715 still makes this test depend on the current total rows in information_schema.tidb_index_usage, so unrelated bootstrap/index additions will keep breaking it. Capture the baseline count before creating idt1/idt2, then assert baseCnt + 6 here instead. As per coding guidelines, "Keep test changes minimal and deterministic; avoid broad golden/testdata churn unless required".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/executor/infoschema_reader_test.go` around lines 714 - 715, The test
currently asserts a hard-coded total from information_schema.tidb_index_usage;
instead capture the baseline row count into a variable (e.g. baseCnt) by running
tk.MustQuery(`select count(*) from information_schema.tidb_index_usage;`) before
creating idt1/idt2, then after creating them replace the literal check with an
assertion that the count equals baseCnt + 6; update the check at the spot
referencing information_schema.tidb_index_usage to use baseCnt + 6 so the test
is deterministic and immune to unrelated bootstrap/index additions.
pkg/domain/domain.go (3)

1872-1900: ⚠️ Potential issue | 🟠 Major

Treat any empty-value watch event as a full refresh.

A rolling upgrade can still deliver a mixed watch batch containing both legacy empty events and new JSON events. This code only falls back to All=true when every event is legacy, so the legacy change is dropped and other TiDB nodes can keep stale privilege state.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/domain/domain.go` around lines 1872 - 1900, The loop over resp.Events
must treat any empty-value watch event as a full refresh: inside the resp.Events
iteration in the code that unmarshals into PrivilegeEvent, detect when
event.Kv.Value is empty (len(val)==0) and immediately set msg.All = true and
break out of processing (or continue to skip further per-event processing) so
that mixed batches with legacy empty events trigger a full reload; keep existing
checks for tmp.All and the skip for events from the same server (do.ServerID())
intact.

3021-3024: ⚠️ Potential issue | 🟠 Major

Fall back to a full-refresh event when the payload is oversized.

Sending a >1 MiB partial-update payload to etcd can fail, leaving only the local TiDB refreshed. Logging the raw payload also dumps the entire username list into logs. Downgrade to {All:true} and log only counts/sizes before the etcd write.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/domain/domain.go` around lines 3021 - 3024, When the serialized
partial-update payload (variable data) exceeds 1 MiB, avoid sending the
oversized payload and logging its raw contents; instead log only safe metrics
(e.g., count of users and payload size) and replace the etcd write with a
compact full-refresh marker (e.g., a JSON with All:true) before calling
ddlutil.PutKVToEtcd; update the code around privilegeKey/ddlutil.PutKVToEtcd
(using do.ctx and do.etcdClient) to detect uint64(len(data)) > size.MB, log
counts/sizes via logutil.BgLogger().Warn without the raw byte payload, set data
to the serialized full-refresh message, then call ddlutil.PutKVToEtcd as before.

3015-3016: ⚠️ Potential issue | 🟠 Major

Read serverID via the atomic accessor.

serverID is written with atomic stores elsewhere, so the direct read here mixes atomic and non-atomic access and can misclassify self-emitted events.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/domain/domain.go` around lines 3015 - 3016, The code reads do.serverID
directly when assigning event.ServerID inside the do.etcdClient != nil block,
mixing non-atomic reads with atomic writes; change the direct access to use the
existing atomic accessor for serverID (i.e., replace event.ServerID =
do.serverID with a call to the atomic getter for serverID such as the package's
atomic.LoadUint64(&do.serverID) or the repository's provided
do.GetServerID()/LoadServerID() method) so the read is done atomically and
self-emitted events are classified correctly.
pkg/executor/simple.go (2)

2424-2425: ⚠️ Potential issue | 🔴 Critical

Capture affected grantees before DROP ROLE deletes the evidence.

In the s.IsDropRole path, notifying only the dropped role names is too late. The users who lost that role are the ones whose privileges changed, but their mysql.role_edges / mysql.default_roles rows have already been deleted, so partial refresh can no longer discover them. Snapshot those grantee/default-role owner usernames before deletion and include them in NotifyUpdatePrivilege(...).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/executor/simple.go` around lines 2424 - 2425, In the s.IsDropRole branch
you must snapshot the grantee/default-role owner usernames before the DROP ROLE
mutates mysql.role_edges/default_roles so the partial refresh can find them;
collect those affected usernames (e.g., build an affectedUsers list from the
current role edges/default-role owners) prior to performing the delete and pass
that list into domain.GetDomain(e.Ctx()).NotifyUpdatePrivilege(...) instead of
(or in addition to) the post-delete userList derived from
userIdentityToUserList(s.UserList), ensuring NotifyUpdatePrivilege receives the
pre-delete grantee usernames.

2222-2227: ⚠️ Potential issue | 🟠 Major

Include grantees/default-role owners when this path renames a role.

This notification only includes the old/new role names. When the renamed account is a role, users granted that role also changed effective privileges/default-role metadata, so their partial cache stays stale until a full reload. Append the affected TO_USER / default-role owner usernames to NotifyUpdatePrivilege(...) as well.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/executor/simple.go` around lines 2222 - 2227, The current
NotifyUpdatePrivilege call only passes renamed role usernames
(users.OldUser.Username and users.NewUser.Username) but misses users who are
grantees or default-role owners affected by a role rename; update the code in
the s.UserToUsers loop (and the surrounding scope that builds userList) to also
collect and append the usernames of TO_USER/default-role owners for any entry
where the subject is a role (check the role marker on
users.OldUser/users.NewUser or the relevant grantee list on the mapping object),
then pass that combined list to
domain.GetDomain(e.Ctx()).NotifyUpdatePrivilege(userList) so grantees' partial
caches are refreshed too.
pkg/privilege/privileges/cache.go (1)

531-567: ⚠️ Potential issue | 🟠 Major

Keep partial refresh as tolerant as UpdateAll().

UpdateAll() treats missing mysql.db, mysql.tables_priv, mysql.default_roles, mysql.columns_priv, and mysql.role_edges as non-fatal, but the new targeted path hard-fails on the same ErrNoSuchTable cases because it calls loadTable(...) directly. That means enabling partial refresh can break privilege reloads in environments where a full reload still succeeds. Mirror the existing noSuchTable(...) handling here or route through the existing Load* helpers.

Also applies to: 2297-2318

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/privilege/privileges/cache.go` around lines 531 - 567, The
partial-refresh path in loadSomeUsers calls loadTable(...) directly and treats
ErrNoSuchTable as fatal; mirror UpdateAll() behavior by making loadSomeUsers
tolerate missing privilege tables: after each loadTable call (especially for
sqlLoadDBTable, sqlLoadTablePrivTable, sqlLoadDefaultRoles,
sqlLoadColumnsPrivTable and any role-related tables) check the error with
noSuchTable(err) (or the same helper used elsewhere) and if true, ignore the
error and continue, otherwise return errors.Trace(err); alternatively, replace
the direct loadTable(...) calls with the existing
LoadDB/LoadTablesPriv/LoadDefaultRoles/LoadColumnsPriv helpers that already
handle ErrNoSuchTable to preserve the non-fatal behavior.
🧹 Nitpick comments (3)
pkg/session/bootstraptest/bootstrap_upgrade_test.go (1)

463-463: Minor: Extraneous leading space in SQL query.

The SQL string has a leading space which, while harmless, is inconsistent with other SQL strings in the file.

Additionally, limiting to 20 jobs assumes the target job is among the most recent DDL jobs. In high-concurrency test environments, this could cause flaky failures if more than 20 DDL jobs are queued between job creation and this check.

Suggested fix
-	sql := fmt.Sprintf(" admin show ddl jobs 20 where job_id=%d", jobID)
+	sql := fmt.Sprintf("admin show ddl jobs 20 where job_id=%d", jobID)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/session/bootstraptest/bootstrap_upgrade_test.go` at line 463, Remove the
extraneous leading space in the sql string and avoid the arbitrary "20" limit
which can cause flakiness; update the construction of the sql variable
(currently: sql := fmt.Sprintf(" admin show ddl jobs 20 where job_id=%d",
jobID)) to produce a query without the leading space and either omit the
job-count limit or use a sufficiently large/unbounded form (e.g. "admin show ddl
jobs where job_id=%d" or add a much larger limit) so the lookup for jobID is
deterministic under high concurrency.
pkg/sessionctx/variable/sysvar.go (1)

3557-3562: Add a short intent comment for this rollout flag.

The setter is fine, but the knob’s operational contract is not obvious from the registration alone. A one-line comment clarifying that this gates the partial privilege/user cache update path would make future maintenance easier.

♻️ Suggested tweak
+	// TiDBAccelerateUserCreationUpdate enables the partial privilege/user cache update path.
 	{Scope: ScopeGlobal, Name: TiDBAccelerateUserCreationUpdate, Value: BoolToOnOff(DefTiDBAccelerateUserCreationUpdate), Type: TypeBool,
 		SetGlobal: func(_ context.Context, s *SessionVars, val string) error {
 			AccelerateUserCreationUpdate.Store(TiDBOptOn(val))
 			return nil
 		},
 	},

As per coding guidelines, comments SHOULD explain non-obvious intent, constraints, invariants, concurrency guarantees, SQL/compatibility contracts, or important performance trade-offs, and SHOULD NOT restate what the code already makes clear.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/sessionctx/variable/sysvar.go` around lines 3557 - 3562, Add a one-line
intent comment above the TiDBAccelerateUserCreationUpdate registration
explaining that this rollout flag gates the partial privilege/user-cache update
path (i.e., when enabled the system will use AccelerateUserCreationUpdate.Store
to toggle the optimized incremental update behavior for user/privilege changes);
place the comment adjacent to the registration block so future readers of the
TiDBAccelerateUserCreationUpdate and the SetGlobal setter can immediately see
the operational contract and concurrency/compatibility implication.
pkg/executor/show.go (1)

1796-1802: Keep SHOW CREATE USER decoupled from mysql.user storage details.

Pulling authentication_string directly from mysql.user makes the executor own privilege-table layout and account lookup rules. A small privilege-layer accessor or account snapshot would keep future auth-storage changes localized.

Also applies to: 1875-1876

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/executor/show.go` around lines 1796 - 1802, The SHOW CREATE USER
implementation currently reads authentication_string directly via
ExecRestrictedSQL against mysql.SystemDB and mysql.UserTable (selecting
authentication_string), coupling executor to privilege-table layout; replace
this direct query with the privilege-layer accessor or account snapshot API used
elsewhere (i.e., call the account lookup/getter that returns the
Account/Authentication info) so SHOW CREATE USER consumes the abstraction rather
than reading authentication_string directly; update both occurrences (the
ExecRestrictedSQL call and the similar select at the other location) to use the
accessor method and map its returned auth field into the SHOW CREATE USER
output.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@br/pkg/restore/snap_client/systable_restore.go`:
- Around line 389-397: The current loop only calls
rc.dom.NotifyUpdateAllUsersPrivilege() when table == "user"; update the
condition to call NotifyUpdateAllUsersPrivilege() for all privilege-related
tables (e.g., "user", "db", "tables_priv", "columns_priv", "default_roles",
"role_edges", "global_priv", "global_grants", and any other MySQL privilege
tables used in this codebase) so that after restoring any of those tables you
run the refresh; locate the loop over tables and replace the single-table check
with a membership test against this privilege-table set and reuse
rc.dom.NotifyUpdateAllUsersPrivilege() and the existing error handling/logging
branches (keeping the log messages but making them apply to any privilege table
restore).

In `@pkg/executor/test/passwordtest/password_management_test.go`:
- Around line 919-927: The helper changeAutoLockedLastChanged currently ignores
the return value of
domain.GetDomain(tk.Session()).NotifyUpdateAllUsersPrivilege(); change it to
capture the error (err :=
domain.GetDomain(tk.Session()).NotifyUpdateAllUsersPrivilege()) and surface it
the same way other call sites do (fail the test immediately on error — e.g. call
the test failure helper used elsewhere such as require.NoError(t, err) or
t.Fatalf("NotifyUpdateAllUsersPrivilege failed: %v", err) / equivalent via
testkit), so the test stops on a privilege-refresh failure instead of proceeding
with stale privileges.

---

Duplicate comments:
In `@pkg/domain/domain.go`:
- Around line 1872-1900: The loop over resp.Events must treat any empty-value
watch event as a full refresh: inside the resp.Events iteration in the code that
unmarshals into PrivilegeEvent, detect when event.Kv.Value is empty
(len(val)==0) and immediately set msg.All = true and break out of processing (or
continue to skip further per-event processing) so that mixed batches with legacy
empty events trigger a full reload; keep existing checks for tmp.All and the
skip for events from the same server (do.ServerID()) intact.
- Around line 3021-3024: When the serialized partial-update payload (variable
data) exceeds 1 MiB, avoid sending the oversized payload and logging its raw
contents; instead log only safe metrics (e.g., count of users and payload size)
and replace the etcd write with a compact full-refresh marker (e.g., a JSON with
All:true) before calling ddlutil.PutKVToEtcd; update the code around
privilegeKey/ddlutil.PutKVToEtcd (using do.ctx and do.etcdClient) to detect
uint64(len(data)) > size.MB, log counts/sizes via logutil.BgLogger().Warn
without the raw byte payload, set data to the serialized full-refresh message,
then call ddlutil.PutKVToEtcd as before.
- Around line 3015-3016: The code reads do.serverID directly when assigning
event.ServerID inside the do.etcdClient != nil block, mixing non-atomic reads
with atomic writes; change the direct access to use the existing atomic accessor
for serverID (i.e., replace event.ServerID = do.serverID with a call to the
atomic getter for serverID such as the package's atomic.LoadUint64(&do.serverID)
or the repository's provided do.GetServerID()/LoadServerID() method) so the read
is done atomically and self-emitted events are classified correctly.

In `@pkg/executor/infoschema_reader_test.go`:
- Around line 714-715: The test currently asserts a hard-coded total from
information_schema.tidb_index_usage; instead capture the baseline row count into
a variable (e.g. baseCnt) by running tk.MustQuery(`select count(*) from
information_schema.tidb_index_usage;`) before creating idt1/idt2, then after
creating them replace the literal check with an assertion that the count equals
baseCnt + 6; update the check at the spot referencing
information_schema.tidb_index_usage to use baseCnt + 6 so the test is
deterministic and immune to unrelated bootstrap/index additions.

In `@pkg/executor/simple.go`:
- Around line 2424-2425: In the s.IsDropRole branch you must snapshot the
grantee/default-role owner usernames before the DROP ROLE mutates
mysql.role_edges/default_roles so the partial refresh can find them; collect
those affected usernames (e.g., build an affectedUsers list from the current
role edges/default-role owners) prior to performing the delete and pass that
list into domain.GetDomain(e.Ctx()).NotifyUpdatePrivilege(...) instead of (or in
addition to) the post-delete userList derived from
userIdentityToUserList(s.UserList), ensuring NotifyUpdatePrivilege receives the
pre-delete grantee usernames.
- Around line 2222-2227: The current NotifyUpdatePrivilege call only passes
renamed role usernames (users.OldUser.Username and users.NewUser.Username) but
misses users who are grantees or default-role owners affected by a role rename;
update the code in the s.UserToUsers loop (and the surrounding scope that builds
userList) to also collect and append the usernames of TO_USER/default-role
owners for any entry where the subject is a role (check the role marker on
users.OldUser/users.NewUser or the relevant grantee list on the mapping object),
then pass that combined list to
domain.GetDomain(e.Ctx()).NotifyUpdatePrivilege(userList) so grantees' partial
caches are refreshed too.

In `@pkg/privilege/privileges/cache.go`:
- Around line 531-567: The partial-refresh path in loadSomeUsers calls
loadTable(...) directly and treats ErrNoSuchTable as fatal; mirror UpdateAll()
behavior by making loadSomeUsers tolerate missing privilege tables: after each
loadTable call (especially for sqlLoadDBTable, sqlLoadTablePrivTable,
sqlLoadDefaultRoles, sqlLoadColumnsPrivTable and any role-related tables) check
the error with noSuchTable(err) (or the same helper used elsewhere) and if true,
ignore the error and continue, otherwise return errors.Trace(err);
alternatively, replace the direct loadTable(...) calls with the existing
LoadDB/LoadTablesPriv/LoadDefaultRoles/LoadColumnsPriv helpers that already
handle ErrNoSuchTable to preserve the non-fatal behavior.

---

Nitpick comments:
In `@pkg/executor/show.go`:
- Around line 1796-1802: The SHOW CREATE USER implementation currently reads
authentication_string directly via ExecRestrictedSQL against mysql.SystemDB and
mysql.UserTable (selecting authentication_string), coupling executor to
privilege-table layout; replace this direct query with the privilege-layer
accessor or account snapshot API used elsewhere (i.e., call the account
lookup/getter that returns the Account/Authentication info) so SHOW CREATE USER
consumes the abstraction rather than reading authentication_string directly;
update both occurrences (the ExecRestrictedSQL call and the similar select at
the other location) to use the accessor method and map its returned auth field
into the SHOW CREATE USER output.

In `@pkg/session/bootstraptest/bootstrap_upgrade_test.go`:
- Line 463: Remove the extraneous leading space in the sql string and avoid the
arbitrary "20" limit which can cause flakiness; update the construction of the
sql variable (currently: sql := fmt.Sprintf(" admin show ddl jobs 20 where
job_id=%d", jobID)) to produce a query without the leading space and either omit
the job-count limit or use a sufficiently large/unbounded form (e.g. "admin show
ddl jobs where job_id=%d" or add a much larger limit) so the lookup for jobID is
deterministic under high concurrency.

In `@pkg/sessionctx/variable/sysvar.go`:
- Around line 3557-3562: Add a one-line intent comment above the
TiDBAccelerateUserCreationUpdate registration explaining that this rollout flag
gates the partial privilege/user-cache update path (i.e., when enabled the
system will use AccelerateUserCreationUpdate.Store to toggle the optimized
incremental update behavior for user/privilege changes); place the comment
adjacent to the registration block so future readers of the
TiDBAccelerateUserCreationUpdate and the SetGlobal setter can immediately see
the operational contract and concurrency/compatibility implication.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 7fe7cb0f-9ef4-4764-a3c1-e2212a56efb9

📥 Commits

Reviewing files that changed from the base of the PR and between a0647dd and c87733c.

📒 Files selected for processing (29)
  • br/pkg/restore/snap_client/pipeline_items.go
  • br/pkg/restore/snap_client/systable_restore.go
  • br/pkg/restore/snap_client/systable_restore_test.go
  • pkg/domain/BUILD.bazel
  • pkg/domain/domain.go
  • pkg/executor/grant.go
  • pkg/executor/infoschema_reader_test.go
  • pkg/executor/revoke.go
  • pkg/executor/show.go
  • pkg/executor/simple.go
  • pkg/executor/test/passwordtest/password_management_test.go
  • pkg/extension/auth_test.go
  • pkg/privilege/privilege.go
  • pkg/privilege/privileges/BUILD.bazel
  • pkg/privilege/privileges/cache.go
  • pkg/privilege/privileges/cache_test.go
  • pkg/privilege/privileges/privileges.go
  • pkg/privilege/privileges/privileges_test.go
  • pkg/privilege/privileges/tidb_auth_token_test.go
  • pkg/server/tests/commontest/tidb_test.go
  • pkg/session/bootstrap.go
  • pkg/session/bootstraptest/bootstrap_upgrade_test.go
  • pkg/session/session.go
  • pkg/sessionctx/variable/sysvar.go
  • pkg/sessionctx/variable/tidb_vars.go
  • tests/integrationtest/r/explain.result
  • tests/integrationtest/r/privilege/privileges.result
  • tests/integrationtest/t/explain.test
  • tests/integrationtest/t/privilege/privileges.test
🚧 Files skipped from review as they are similar to previous changes (8)
  • pkg/extension/auth_test.go
  • pkg/domain/BUILD.bazel
  • tests/integrationtest/t/explain.test
  • br/pkg/restore/snap_client/pipeline_items.go
  • pkg/privilege/privileges/BUILD.bazel
  • pkg/server/tests/commontest/tidb_test.go
  • pkg/executor/revoke.go
  • pkg/privilege/privileges/privileges_test.go

@fzzf678

fzzf678 commented Mar 11, 2026

Copy link
Copy Markdown
Contributor

/retest

@fzzf678

fzzf678 commented Mar 11, 2026

Copy link
Copy Markdown
Contributor

/retest

2 similar comments
@fzzf678

fzzf678 commented Mar 11, 2026

Copy link
Copy Markdown
Contributor

/retest

@fzzf678

fzzf678 commented Mar 11, 2026

Copy link
Copy Markdown
Contributor

/retest

@ti-chi-bot

ti-chi-bot Bot commented Mar 11, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wjhuang2016, yudongusa, YuJuncen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot Bot added the approved label Mar 11, 2026
@fzzf678

fzzf678 commented Mar 11, 2026

Copy link
Copy Markdown
Contributor

/retest

@fzzf678 fzzf678 removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 11, 2026
@fzzf678

fzzf678 commented Mar 11, 2026

Copy link
Copy Markdown
Contributor

/retest

1 similar comment
@fzzf678

fzzf678 commented Mar 11, 2026

Copy link
Copy Markdown
Contributor

/retest

@fzzf678

fzzf678 commented Mar 11, 2026

Copy link
Copy Markdown
Contributor

/retest

8 similar comments
@fzzf678

fzzf678 commented Mar 11, 2026

Copy link
Copy Markdown
Contributor

/retest

@fzzf678

fzzf678 commented Mar 11, 2026

Copy link
Copy Markdown
Contributor

/retest

@fzzf678

fzzf678 commented Mar 11, 2026

Copy link
Copy Markdown
Contributor

/retest

@fzzf678

fzzf678 commented Mar 11, 2026

Copy link
Copy Markdown
Contributor

/retest

@fzzf678

fzzf678 commented Mar 11, 2026

Copy link
Copy Markdown
Contributor

/retest

@hawkingrei

Copy link
Copy Markdown
Contributor

/retest

@fzzf678

fzzf678 commented Mar 11, 2026

Copy link
Copy Markdown
Contributor

/retest

@fzzf678

fzzf678 commented Mar 11, 2026

Copy link
Copy Markdown
Contributor

/retest

@tiprow

tiprow Bot commented Mar 11, 2026

Copy link
Copy Markdown

@ti-chi-bot: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
fast_test_tiprow_for_release 3c1f93a link true /test fast_test_tiprow_for_release

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved cherry-pick-approved Cherry pick PR approved by release team. lgtm release-note-none Denotes a PR that doesn't merit a release note. sig/planner SIG: Planner size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. type/cherry-pick-for-release-8.5 This PR is cherry-picked to release-8.5 from a source PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants