
kv: add throttling for background GC operations based on store health #57248

Closed
sumeerbhola opened this issue Nov 30, 2020 · 3 comments
Labels
A-admission-control
A-kv-replication (Relating to Raft, consensus, and coordination)
C-enhancement (Solution expected to add code/behavior + preserve backward-compat; pg compat issues are an exception)

Comments

@sumeerbhola
Collaborator

sumeerbhola commented Nov 30, 2020

There is no throttling of GC based on store health, and we've seen situations where removal of the protected timestamp due to cancellation of stuck backups has caused a GC spike and overloaded the LSM stores of a few nodes. This required significant manual intervention to restore the cluster to a healthy state, and caused customer unhappiness.

We should throttle the proposals generated by the gcQueue based on the health of all the replica stores of a range. There is a concern that too much throttling could itself tip the stores into a different form of unhealthiness, with too many versions of a key. I think it is ok to set the default throttling to allow for moderate overload, as is done for ingestDelayL0Threshold (which is used when adding sstables).
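To make the idea concrete, here is a minimal Go sketch of this kind of throttling, under stated assumptions: `StoreHealth`, `gcThrottleDelay`, and every constant below are hypothetical illustrations rather than actual CockroachDB code, and the threshold only mirrors the spirit of ingestDelayL0Threshold by tolerating moderate overload before delaying GC proposals.

```go
package main

import (
	"fmt"
	"time"
)

// StoreHealth is a hypothetical per-replica health summary, e.g. derived
// from the shape of that store's LSM.
type StoreHealth struct {
	L0FileCount int
}

// gcThrottleDelay returns how long the gcQueue should wait before issuing
// its next GC proposal, based on the least healthy store among the range's
// replicas. Below the threshold there is no delay (moderate overload is
// tolerated); past it, the delay grows linearly up to a cap so GC is
// slowed but never starved.
func gcThrottleDelay(replicas []StoreHealth) time.Duration {
	const (
		l0Threshold = 20                     // illustrative: allow moderate overload
		perFileStep = 100 * time.Millisecond // illustrative: backoff per excess L0 file
		maxDelay    = 5 * time.Second        // cap so garbage still gets collected
	)
	worst := 0
	for _, h := range replicas {
		if h.L0FileCount > worst {
			worst = h.L0FileCount
		}
	}
	if worst <= l0Threshold {
		return 0
	}
	delay := time.Duration(worst-l0Threshold) * perFileStep
	if delay > maxDelay {
		delay = maxDelay
	}
	return delay
}

func main() {
	replicas := []StoreHealth{{L0FileCount: 5}, {L0FileCount: 35}, {L0FileCount: 12}}
	fmt.Println("delay before next GC proposal:", gcThrottleDelay(replicas))
}
```

Capping the delay is one way to keep throttling from tipping stores into the other form of unhealthiness mentioned above (too many versions accumulating).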

This issue relates to #57247 , which also needs a store health signal for all replicas.

Jira issue: CRDB-2846

sumeerbhola added the C-enhancement and A-kv-replication labels on Nov 30, 2020
@nvanbenschoten
Member

How much of this would be needed if we get to a point where GC is performed through compaction as opposed to point operations? I ask because making decisions "based on the health of all the replica stores of a range" seems hard to get right, especially as our primary defense mechanism. We not only need to formalize what "health" means in this context, but we also need to deal with the staleness of this information. And then we need to worry about potential fallout due to starved GC, as you mention.

I ask all this because I'm interested in how much of this we can push off if we instead make another pass at dramatically reducing the cost of MVCC GC through a change like #42514. Do you have thoughts about that?

@sumeerbhola
Collaborator Author

How much of this would be needed if we get to a point where GC is performed through compaction as opposed to point operations?

Maybe not needed then, though we will still need a fallback GC path that works as it does now (though it probably won't find much garbage) and will need to run at each replica (so it can throttle itself based on local information). But I don't know when that will be worked on. It seems complicated, and there are a number of technical details still to be worked out, as outlined in #57260.

I ask because making decisions "based on the health of all the replica stores of a range" seems hard to get right, especially as our primary defense mechanism. We not only need to formalize what "health" means in this context, but we also need to deal with the staleness of this information.

I was expecting something very simple wrt health -- say bytes in L0 and Lbase relative to the rest of the LSM (we would also converge this signal with whatever we currently use for import).
Regarding staleness, what is the staleness for our current gossip? My understanding is that these LSMs get unhealthy over the timescale of many minutes (or longer).
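As a rough illustration of that kind of signal (not the signal CockroachDB actually gossips), the score below is simply the fraction of total LSM bytes sitting in L0 and Lbase; `lsmLevelBytes` and `storeHealthScore` are hypothetical names introduced only for this sketch.

```go
package main

import "fmt"

// lsmLevelBytes holds hypothetical per-level byte counts for a store's LSM;
// index 0 is L0 and the last index is the bottommost level.
type lsmLevelBytes []int64

// storeHealthScore returns the fraction of total LSM bytes in L0 and Lbase,
// where Lbase is approximated as the first non-empty level below L0. A high
// fraction suggests the store is falling behind on compactions; the same
// signal could be converged with whatever is used to pace imports.
func storeHealthScore(levels lsmLevelBytes) float64 {
	var total int64
	for _, b := range levels {
		total += b
	}
	if total == 0 {
		return 0
	}
	upper := levels[0]
	for i := 1; i < len(levels); i++ {
		if levels[i] > 0 {
			upper += levels[i] // Lbase: first non-empty level below L0
			break
		}
	}
	return float64(upper) / float64(total)
}

func main() {
	// Healthy store: almost all bytes live in the lower levels.
	healthy := lsmLevelBytes{10 << 20, 50 << 20, 500 << 20, 5 << 30}
	// Overloaded store: a large backlog stuck in L0 and Lbase.
	overloaded := lsmLevelBytes{2 << 30, 1 << 30, 500 << 20, 5 << 30}
	fmt.Printf("healthy score:    %.2f\n", storeHealthScore(healthy))
	fmt.Printf("overloaded score: %.2f\n", storeHealthScore(overloaded))
}
```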

@sumeerbhola
Collaborator Author

We now have replication admission control, which applies to elastic traffic (which this is), and better compaction heuristics for *DELs etc., so I'm closing this as done.
