Skip to content

Conversation

@fzzf678
Copy link

@fzzf678 fzzf678 commented Jan 22, 2026

Fixes #1852.

KVStore.Close() previously closed the PD client before closing RegionCache. RegionCache has background tasks (store list refresh / store re-resolve) that may still call PD APIs (GetStore/GetAllStores). When PD client is closed first, those background calls fail and emit noisy logs like:

  • loadStore from PD failed: rpc error: code = Canceled desc = grpc: the client connection is closing

This PR reorders KVStore.Close() to close RegionCache (and other dependents) before closing the PD clients, avoiding the shutdown window error.

Summary by CodeRabbit

  • Chores

    • Improved shutdown sequencing and resource cleanup so caches are closed before dependent clients, reducing shutdown races.
  • Public API

    • Added a lightweight introspection method to report background-runner state (intended for tests/debugging).
  • Tests

    • Added an automated test and runtime check to verify shutdown ordering and ensure the region cache is closed before client shutdown.

✏️ Tip: You can customize this high-level summary in your review settings.

@ti-chi-bot ti-chi-bot bot added contribution This PR is from a community contributor. dco-signoff: no Indicates the PR's author has not signed dco. first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. labels Jan 22, 2026
@ti-chi-bot
Copy link

ti-chi-bot bot commented Jan 22, 2026

Welcome @fzzf678!

It looks like this is your first PR to tikv/client-go 🎉.

I'm the bot to help you request reviewers, add labels and more, See available commands.

We want to make sure your contribution gets all the attention it needs!



Thank you, and welcome to tikv/client-go. 😃

@ti-chi-bot ti-chi-bot bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jan 22, 2026
@coderabbitai
Copy link

coderabbitai bot commented Jan 22, 2026

📝 Walkthrough

Walkthrough

Reorders KVStore shutdown: txn latches closed earlier, region cache closed before PD clients (with a failpoint-checked runtime panic if not), PD HTTP then PD gRPC clients closed, then TiKV clients. Adds RegionCache.IsBackgroundRunnerClosed and a test validating close ordering.

Changes

Cohort / File(s) Summary
KVStore shutdown & test
tikv/kv.go, tikv/kv_test.go
Reordered KVStore.Close() to close txn latches early, call regionCache.Close() before closing PD clients, close pdHttpClient then pdClient, then TiKV clients. Added failpoint check in Close and test TestKVStoreCloseCheckRegionCacheClosedBeforePDClose.
RegionCache helper
internal/locate/region_cache.go
Added func (c *RegionCache) IsBackgroundRunnerClosed() bool to expose whether the RegionCache background runner has been canceled (for tests/debugging).

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant KV as KVStore
  participant TL as TxnLatches
  participant RC as RegionCache
  participant PHTTP as PD_HTTP_Client
  participant PGRPC as PD_gRPC_Client
  participant TK as TiKV_Client

  rect rgba(200,230,255,0.5)
    KV->>TL: Close() (if present)
    TL-->>KV: closed
  end

  rect rgba(200,255,200,0.5)
    KV->>RC: Close() (cancel background runner)
    RC-->>KV: background goroutines stopped
  end

  rect rgba(255,230,200,0.5)
    KV->>PHTTP: Close()
    PHTTP-->>KV: closed
    KV->>PGRPC: Close()
    PGRPC-->>KV: closed
  end

  rect rgba(240,200,255,0.5)
    KV->>TK: Close()
    TK-->>KV: closed
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I hopped through shutdown, neat and quick,
Latches first, then cache—no trick,
PD doors closed in proper line,
TiKV settles, quiet and fine,
A carrot nod to order, slick.

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'tikv: close region cache before PD client' directly describes the main change—reordering shutdown sequence to close RegionCache before PD clients.
Linked Issues check ✅ Passed The PR implements all requirements from #1852: reorders KVStore.Close() to close RegionCache before PD clients, adds runtime validation, and includes test coverage for the fix.
Out of Scope Changes check ✅ Passed All changes directly address the linked issue: kv.go reorders shutdown sequence, region_cache.go adds debugging method, kv_test.go adds validation test. No unrelated changes present.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Signed-off-by: fzzf678 <daery_7@163.com>
@fzzf678 fzzf678 force-pushed the fix_kvstore_close_order branch from 58eae61 to 3b25743 Compare January 22, 2026 06:56
@ti-chi-bot ti-chi-bot bot added dco-signoff: yes Indicates the PR's author has signed the dco. dco-signoff: no Indicates the PR's author has not signed dco. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed dco-signoff: no Indicates the PR's author has not signed dco. dco-signoff: yes Indicates the PR's author has signed the dco. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 22, 2026
tikv/kv.go Outdated
}
s.regionCache.Close()
if _, err := util.EvalFailpoint("checkRegionCacheClosedBeforePDClose"); err == nil && retErr == nil && !s.regionCache.IsBackgroundRunnerClosed() {
retErr = errors.New("region cache is not closed before closing pd client")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we panic here? seems easier to understand and the panic in failpoint does not affect the production code

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update in 62192c0

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jan 26, 2026
@ti-chi-bot
Copy link

ti-chi-bot bot commented Jan 26, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-01-26 08:07:35.120988485 +0000 UTC m=+999682.734945341: ☑️ agreed by lcwangchao.

@ti-chi-bot
Copy link

ti-chi-bot bot commented Jan 26, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: lcwangchao
Once this PR has been reviewed and has the lgtm label, please assign iosmanthus for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@lcwangchao
Copy link
Contributor

lcwangchao commented Jan 26, 2026

Please repair the DCO with git commit --amend -s

@fzzf678 fzzf678 force-pushed the fix_kvstore_close_order branch from 62192c0 to 17efe31 Compare January 26, 2026 08:34
tikv: add regression test for KVStore.Close order

Revert "tikv: add regression test for KVStore.Close order"

This reverts commit 40b5491714d16650a9aa73e8d5a8e2313df9656c.

Signed-off-by: fzzf678 <daery_7@163.com>
Signed-off-by: fzzf678 <daery_7@163.com>
@fzzf678 fzzf678 force-pushed the fix_kvstore_close_order branch from 17efe31 to a45885f Compare January 26, 2026 08:52
@ti-chi-bot ti-chi-bot bot added dco-signoff: yes Indicates the PR's author has signed the dco. and removed dco-signoff: no Indicates the PR's author has not signed dco. labels Jan 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

contribution This PR is from a community contributor. dco-signoff: yes Indicates the PR's author has signed the dco. first-time-contributor Indicates that the PR was contributed by an external member and is a first-time contributor. needs-1-more-lgtm Indicates a PR needs 1 more LGTM. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

KVStore.Close closes pdClient before RegionCache, may causing spurious loadStore from PD failed (grpc: the client connection is closing) during shutdown

2 participants