kvserver: add UseClearRange option to GCRequest_GCKey #61455
Note that we also have a more far-reaching proposal on the table, which is moving GC to the storage layer: #57260. Using ClearRange in "today's GC" might be useful as an interim step. We discussed the perf impact of GC recently within KV and were unsure how "bad" GC is at the moment. That is, we had no good intuition of whether it is currently responsible for performance blips, etc., in the wild. We do have the …
I've never really understood how #57260 interacts with mvcc stats. Is the idea that we still read the data to evaluate the stats impact but then lay down some tombstone which will apply lazily to reclaim space?
The stats are indeed the "hard" part there. In a world without stats, I think the solution would simply be that we determine a GCThreshold and bump that, and we say that versions covered by the GCThreshold cease to be part of the replicated state, i.e. they get omitted by snapshots and it's Pebble's job to remove them from any replica they linger on over time. But of course then a) we have to recompute stats when sending snapshots, and b) pebble has to somehow emit stats updates to us when it compacts, and the stats are non-local (you can't determine the stats contribution of a single KV pair; you need to know whether it covers something, or is itself live, etc.). #57260 essentially says "do something entirely new" or "recompute stats after GC". I'm not sure what that "entirely new" would be, exactly. Maybe it is what you're saying.
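To make the non-locality concrete, here is a minimal sketch (hypothetical types, ignoring deletion tombstones; not the actual MVCCStats code): the contribution of a version can only be classified with knowledge of its next-newer neighbor, which is exactly the context a compaction looking at one KV pair at a time lacks.

```go
// Hypothetical illustration only -- not the real enginepb.MVCCStats logic.
package main

import "fmt"

type version struct {
	ts       int64 // commit timestamp of this version
	valBytes int64 // size of the encoded value
}

// contribution classifies versions[i] of a single key, with versions sorted
// newest-first. Whether the bytes are "live" or "garbage" (and since when,
// for gc_bytes_age accounting) depends on versions[i-1], i.e. on a
// *different* KV pair.
func contribution(versions []version, i int) (liveBytes, gcBytes, gcSince int64) {
	if i == 0 {
		return versions[i].valBytes, 0, 0 // newest version is live
	}
	// Shadowed by the next-newer version: garbage since that version's timestamp.
	return 0, versions[i].valBytes, versions[i-1].ts
}

func main() {
	vs := []version{{ts: 30, valBytes: 10}, {ts: 20, valBytes: 12}, {ts: 10, valBytes: 8}}
	for i := range vs {
		l, g, since := contribution(vs, i)
		fmt.Printf("v@%d: live=%d garbage=%d garbageSince=%d\n", vs[i].ts, l, g, since)
	}
}
```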
cockroachdb/pebble#1295 should probably be a prerequisite.
#51230 uses a ClearRange to remove many versions of a key. However, we also need a variant that uses ClearRange to remove a long run of keys with no visible versions that all fell below the GC threshold at the same time, typically because of a DeleteRange or something, especially with MVCC range tombstones (#70414).
cc @cockroachdb/replication
This seems hard in that it'll run afoul of all the efforts to make MVCC GC latchless. We may need a new primitive to ensure that new writes don't accidentally get caught under this.
Yeah, that bit's annoying. However, we'll need this to keep e.g. schema GC reasonably performant once it's moved to MVCC. Let's keep this issue specifically about removing many versions of a single key (which we can do latchlessly), and discuss over on #70414.
90830: gc: use clear range operations to remove multiple garbage keys across keys and versions r=erikgrinaker a=aliher1911

Summary: GC will track consecutive keys and issue GCClearRange requests to remove multiple keys at once when it can find more than a configured number of them. This covers multiple versions of the same key as well as multiple consecutive keys.

On the GC side, it counts consecutive keys while building "points batches", i.e. batches containing MVCC keys to delete. Those batches are capped to fit into the resulting raft entries once the deletion is executed, so GC may accumulate more than a single batch in memory. Once the necessary number of consecutive keys is found, a GCClearRange is issued and the pending batches are discarded. If non-GC-able data is found, the pending batches are sent as point deletions and clear-range tracking is restarted. To protect GC from consuming too much memory, for example when it collects a range with very large keys, it uses a configurable memory limit on how many key bytes can be held in pending batches. If this limit is reached before GC finds the minimum desired number of keys, the oldest batch is sent and the starting point for the clear range is adjusted.

On the evaluation side, GC requests for ranges obtain write latches over the removed range to prevent other requests from writing into the middle of it. However, if only a single key is touched, i.e. multiple versions are deleted, no latches are obtained, just as when deleting individual versions, since all the removed data is below the GC threshold.

GC exposes a metric `queue.gc.info.numclearconsecutivekeys` that shows how many times GCClearRange was used.

GC adds a new cluster setting `kv.gc.clear_range_min_keys` that sets the minimum number of keys eligible for GCClearRange.

----

batcheval: handle ClearSubRange request fields in GC requests

Handle ClearSubRange GC request fields to use GCClearRange storage functions for ranges of keys that contain no visible data, instead of removing each key individually. This commit only enables processing of the request field; the actual GC and storage functionality is added in previous commits.

Release note: None

----

gc: add version gate to GC clear range requests

ClearRange fields of GCRequests are ignored by earlier versions, so this feature needs to be version gated to let GC work when talking to a previous version if GC is run from a different node.

Release note: None

----

gc: add metrics for clear range requests in GC

This commit adds metrics for the clear range functionality in GC: queue.gc.info.numclearconsecutivekeys is incremented every time GC sends a ClearRange request for a run of consecutive keys to a replica.

Release note (ops change): Metric queue.gc.info.numclearconsecutivekeys added. It exposes the number of clear range requests issued by MVCC garbage collection to remove consecutive keys.

----

gc: use delete_range_keys for consecutive ranges of keys

When deleting large chunks of keys, usually produced by range tombstones, GC would use point deletions for all versions found within the replica key range. That kind of operation generates unnecessary load on storage because each individual version must be deleted independently. This patch adds the ability for GC to find ranges of consecutive keys which are not interleaved with any newer data and to create delete_range_key GC requests to remove them.

Release note: None
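A rough sketch of the batching logic described in the summary above. All names here are hypothetical and the memory-cap flush is simplified to per-key granularity, so treat this as an illustration of the decision flow rather than the actual pkg/kv/kvserver/gc implementation.

```go
// Package gcsketch illustrates the clear-range batching decision; it does
// not mirror the real gc package.
package gcsketch

type pointKey struct {
	key  []byte
	size int64
}

type planner struct {
	minKeys    int   // analogue of kv.gc.clear_range_min_keys
	maxPending int64 // cap on memory held by buffered point deletions

	runStart    []byte     // first key of the current consecutive GC-able run
	runLen      int        // number of consecutive GC-able keys seen
	pending     []pointKey // point keys buffered while the run is still short
	pendingSize int64

	emitClearRange func(start, end []byte) // issue a clear-range style GC request
	emitPoints     func(keys []pointKey)   // issue ordinary per-key GC requests
}

// gcableKey is called, in key order, for each key whose versions all fall
// below the GC threshold.
func (p *planner) gcableKey(key []byte, size int64) {
	if p.runLen == 0 {
		p.runStart = append([]byte(nil), key...)
	}
	p.runLen++
	p.pending = append(p.pending, pointKey{key: key, size: size})
	p.pendingSize += size

	// Memory cap reached before the run is long enough: flush the oldest
	// buffered keys as point deletions and advance the clear-range start.
	for p.pendingSize > p.maxPending && len(p.pending) > 1 {
		oldest := p.pending[0]
		p.emitPoints([]pointKey{oldest})
		p.pendingSize -= oldest.size
		p.pending = p.pending[1:]
		p.runStart = p.pending[0].key
		p.runLen--
	}
}

// liveKey is called when a key with data above the GC threshold interrupts
// the run; end is that key, bounding any clear range exclusively.
func (p *planner) liveKey(end []byte) {
	if p.runLen >= p.minKeys {
		p.emitClearRange(p.runStart, end) // one range deletion covers the run
	} else if len(p.pending) > 0 {
		p.emitPoints(p.pending) // run too short, fall back to point deletions
	}
	p.pending, p.pendingSize, p.runLen, p.runStart = nil, 0, 0, nil
}
```

The real change works at batch granularity and populates actual GC request fields, but the shape of the decision is the same: buffer point deletions until either a long enough run justifies a single clear range, or live data (or the memory cap) forces a fallback.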
----

rditer: helper to visit multiple replica key spans

Add an IterateMVCCReplicaKeySpans helper to complement IterateReplicaKeySpans. Existing tests are moved from ReplicaMVCCDataIterator to IterateMVCCReplicaKeySpans, as the former is being sunsetted and only a single call site remains.

Release note: None

----

storage: add gc operation that uses clear range

Deleting multiple versions of the same key, or contiguous ranges of point keys, with individual pebble tombstones can be inefficient when a range has lots of garbage. This PR adds a storage GC operation that uses pebble range tombstones to remove the data.

Release note: None

----

gc: add delete range parameters to gc request

This commit adds clear_sub_range_keys parameters to the GC request. clear_sub_range_keys is used by the GC queue to remove chunks of consecutive keys where no new data was written above the GC threshold, so that storage can optimize the deletions with range tombstones. To support the new type of keys, the GCer interface is also updated to pass the provided keys down to the request.

Release note: None

Fixes: #84560, #61455

93732: dev: refactor `doctor` r=healthy-pod a=rickystewart

Create a new interactive `dev doctor` experience, so you don't have to copy-paste commands from `dev doctor` and it can solve problems for you. Now `dev doctor` is interactive by default and asks for user input to confirm the actions it's going to take. You can also set `--autofix` to automatically agree to some default actions without being prompted. Interactive doctor can be disabled with `--interactive=false`.

Epic: none

Release note: None

93814: workload: fix TPC-H generator to generate correct values for p_name and ps_supplycost r=rytaft a=rytaft

**workload: fix TPC-H generator to generate correct values for p_name**

The TPC-H generator creates values for `p_name` in the `part` table by selecting 5 elements from a list of words called `randPartNames`. It is supposed to select 5 random unique elements from the full list, but prior to this commit it only selected a permutation of the first 5 elements. This commit fixes the logic and adds a unit test.

Epic: None

Release note: None

**workload: fix TPC-H generator value of ps_supplycost**

Prior to this commit, the TPC-H generator was generating a value between 0 and 10 for `ps_supplycost` in the `partsupp` table. However, the spec calls for a random value in the range `[1.00 .. 1,000.00]`. This commit fixes the oversight.

Epic: None

Release note: None
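For the `p_name` fix, the corrected behaviour amounts to drawing 5 distinct words from the whole `randPartNames` list rather than permuting only its first 5 entries, and the `ps_supplycost` fix widens the range to `[1.00 .. 1,000.00]`. A hedged sketch (the word list is abbreviated and the helpers are illustrative, not the actual workload code):

```go
// Illustrative sketch of the corrected generation logic; not the real
// pkg/workload/tpch code.
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// Abbreviated stand-in for the full randPartNames word list.
var randPartNames = []string{
	"almond", "antique", "aquamarine", "azure", "beige", "bisque", "black",
	"blanched", "blue", "blush", "brown", "burlywood", "burnished",
}

// partName picks 5 distinct random words from the full list. Permuting the
// whole list (rather than only its first 5 elements) is what the fix
// amounts to.
func partName(rng *rand.Rand) string {
	perm := rng.Perm(len(randPartNames))
	words := make([]string, 5)
	for i := 0; i < 5; i++ {
		words[i] = randPartNames[perm[i]]
	}
	return strings.Join(words, " ")
}

// supplyCost returns a value in [1.00, 1000.00] as required by the spec,
// instead of the previous [0, 10) range.
func supplyCost(rng *rand.Rand) float64 {
	cents := 100 + rng.Int63n(100000-100+1) // 100..100000 cents
	return float64(cents) / 100
}

func main() {
	rng := rand.New(rand.NewSource(1))
	fmt.Println(partName(rng), supplyCost(rng))
}
```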
93852: build-and-publish-patched-go: upgrade from 1.19.1 to 1.19.4 r=ajwerner a=ajwerner

This picks up a mix of security and compiler fixes. The compiler fixes I'm interested in relate to generic data structures, but the other fixes seem worthwhile too.

* [x] Adjust the Pebble tests to run in the new version.
* [x] Adjust the version in the Docker image ([source](./builder/Dockerfile)).
* [x] Adjust the version in the TeamCity agent image ([setup script](./packer/teamcity-agent.sh)).
* [x] Rebuild and push the Docker image (following the [Basic Process](#basic-process)).
* [x] Update `build/teamcity/internal/release/build-and-publish-patched-go/impl.sh` with the new version and adjust the SHA256 sums as necessary.
* [x] Run the `Internal / Release / Build and Publish Patched Go` build configuration in TeamCity with your latest version of the script above. This will print out the new URLs and SHA256 sums for the patched Go that you built above.
* [x] Bump the version in `WORKSPACE` under `go_download_sdk`. You may need to bump [rules_go](https://github.com/bazelbuild/rules_go/releases). Also edit the filenames listed in `sdks` and update all the hashes to match what you built in the step above.
* [x] Run `./dev generate bazel` to refresh `distdir_files.bzl`, then `bazel fetch @distdir//:archives` to ensure you've updated all hashes to the correct value.
* [x] Bump the version in `builder.sh` accordingly ([source](./builder.sh#L6)).
* [x] Bump the version in `go-version-check.sh` ([source](./go-version-check.sh)), unless bumping to a new patch release.
* [x] Bump the go version in `go.mod`.
* [x] Bump the default installed version of Go in `bootstrap-debian.sh` ([source](./bootstrap/bootstrap-debian.sh)).
* [x] Replace other mentions of the older version of go (grep for `golang:<old_version>` and `go<old_version>`).
* [ ] Update the `builder.dockerImage` parameter in the TeamCity [`Cockroach`](https://teamcity.cockroachdb.com/admin/editProject.html?projectId=Cockroach&tab=projectParams) and [`Internal`](https://teamcity.cockroachdb.com/admin/editProject.html?projectId=Internal&tab=projectParams) projects.
* [ ] Ask the Developer Infrastructure team to deploy new TeamCity agent images according to [packer/README.md](./packer/README.md).

Epic: None

Release note: None

93914: ui: refresh nodes on tenants r=matthewtodd a=matthewtodd

Part of #89949

Now that we can show meaningful region information for tenants, we need to actually trigger the fetching of that information.

Release note: None

Co-authored-by: Oleg Afanasyev <oleg@cockroachlabs.com>
Co-authored-by: Ricky Stewart <rickybstewart@gmail.com>
Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Co-authored-by: Matthew Todd <todd@cockroachlabs.com>
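The "storage: add gc operation that uses clear range" commit in the merge above boils down to replacing many per-key deletions with a single Pebble range tombstone. A minimal, self-contained sketch against Pebble's public Go API, using illustrative keys rather than real MVCC-encoded ones and an in-memory store:

```go
// Sketch only: one range tombstone over ["a", "d") instead of six point
// deletions. Keys and setup are illustrative, not CockroachDB's storage code.
package main

import (
	"fmt"

	"github.com/cockroachdb/pebble"
	"github.com/cockroachdb/pebble/vfs"
)

func main() {
	db, err := pebble.Open("", &pebble.Options{FS: vfs.NewMem()})
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// Pretend these are many garbage MVCC versions below the GC threshold.
	for _, k := range []string{"a/1", "a/2", "b/1", "c/1", "c/2", "c/3"} {
		if err := db.Set([]byte(k), []byte("v"), pebble.NoSync); err != nil {
			panic(err)
		}
	}

	// A single range tombstone covers the whole span.
	if err := db.DeleteRange([]byte("a"), []byte("d"), pebble.Sync); err != nil {
		panic(err)
	}

	if _, _, err := db.Get([]byte("c/2")); err == pebble.ErrNotFound {
		fmt.Println("c/2 removed by range tombstone")
	}
}
```

Since a single DeleteRange writes one tombstone, the write cost of the GC operation no longer scales with the number of garbage versions it covers.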
Fixed by #90830.
There is an open PR by @ajwerner that would need dedicated attention to land. It (dramatically) improves the ease with which we can GC large version histories of a key.
The reason the PR is held up is uncertainty about the possible negative impact of many range deletion tombstones. This is a concern that needs to be investigated.
This issue tracks the future of that PR, a job previously done by (now closed) #50194.
#51230
Jira issue: CRDB-3015