Description
Richard Loveland (rmloveland) commented:
In the DELETE docs we have a section on preserving DELETE performance over time which needs to be updated. It provides the user with the option to take one of the following approaches:
- At each iteration, update the WHERE clause to filter only the rows that have not yet been marked for deletion. For an example, see Batch-delete on an indexed filter.
- At each iteration, first use a SELECT statement to return primary key values on rows that are not yet deleted. Rows marked for deletion will not be returned. Then, use a nested DELETE loop over a smaller batch size, filtering on the primary key values. For an example, see Batch delete on a non-indexed column.
- To iteratively delete rows in constant time, using a simple DELETE loop, you can alter your zone configuration and change gc.ttlseconds to a low value like 5 minutes (i.e., 300), and then run your DELETE statement once per GC interval.
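For reference while discussing the rewrite, here is a minimal sketch of the loop shape the second approach describes: select a batch of primary keys that still match the filter, then delete them in smaller chunks. It only illustrates the structure of the loop and does not show whether DELETE performance actually holds up. Assumptions: the `events` table, its `id` and `ts` columns, and the batch sizes are hypothetical, and Python's built-in `sqlite3` is used purely as a stand-in so the sketch runs anywhere; it has no tombstones or GC TTL, so it cannot reproduce the slowdown itself.

```python
import sqlite3


def batch_delete(conn, cutoff, select_batch=1000, delete_batch=100):
    """Delete rows with ts < cutoff in primary-key batches; return rows deleted.

    Hypothetical schema: events(id INTEGER PRIMARY KEY, ts INTEGER).
    """
    deleted = 0
    while True:
        # Step 1: fetch primary keys of rows that still match the filter.
        # Rows deleted in earlier iterations are no longer returned.
        rows = conn.execute(
            "SELECT id FROM events WHERE ts < ? ORDER BY id LIMIT ?",
            (cutoff, select_batch),
        ).fetchall()
        if not rows:
            return deleted
        ids = [row[0] for row in rows]

        # Step 2: nested loop that deletes smaller chunks by primary key.
        for start in range(0, len(ids), delete_batch):
            chunk = ids[start:start + delete_batch]
            placeholders = ",".join("?" * len(chunk))
            cur = conn.execute(
                f"DELETE FROM events WHERE id IN ({placeholders})", chunk
            )
            conn.commit()  # one short transaction per chunk
            deleted += cur.rowcount


if __name__ == "__main__":
    # Stand-in data set: 2500 rows, of which those with ts < 2000 are deleted.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, ts INTEGER)")
    conn.executemany(
        "INSERT INTO events (id, ts) VALUES (?, ?)",
        [(i, i) for i in range(2500)],
    )
    conn.commit()
    print(batch_delete(conn, cutoff=2000))
```

Against a real cluster the same loop would go through a PostgreSQL-compatible driver with that driver's placeholder syntax; those details are left out here because they depend on the client in use.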
However, users who have run into this problem (reported via a support issue) have apparently tried these approaches (one of them? all of them?) without success.
Therefore we need to update this page to make it clearer. In particular, given that this is still a known behavior of CockroachDB when scanning over tombstones according to cockroachdb/cockroach#17229, perhaps none of these actions will actually help? Also it's possible that providing the "choose your own adventure" of three choices is less helpful than it could be, since users will end up having to try all three.
Estimated scope of work:
- Seek out an updated recommendation from Eng folks on what users should do here; ideally, it will be one recommendation, since users cannot reasonably be expected to try all three and see what happens, especially in production
- Determine if, in fact, this is just a known limitation and we should make weaker claims in this doc about how to "fix it", since maybe it cannot be fixed via something we can say in docs
See also:
- Added "Why are my deletes getting slower?" FAQ #4490
- Explain why iterative deletes can get slower over time #4865
Jira Issue: DOC-1128