default tikv_gc_life_time is too short #8896

Closed
morgo opened this issue Dec 31, 2018 · 6 comments
Labels
type/enhancement The issue or PR belongs to an enhancement.

Comments

@morgo
Contributor

morgo commented Dec 31, 2018

Feature Request

Is your feature request related to a problem? Please describe:

The default tikv_gc_life_time is 10 minutes, which means that the data read through a tidb_snapshot risks being garbage-collected, causing an error.

Since tidb_snapshot is used for backup consistency, any backup that takes longer than 10 minutes to run is at risk of failing.

Edit: Here is the error from running a mydumper backup on an 80GB database (compressed to ~20GB) in TiDB:

morgo@ryzen:/mnt/evo970/tmp$ mydumper

** (mydumper:12087): CRITICAL **: Could not read data from ontime.ontime: GC life time is shorter than transaction duration, transaction starts at 2019-01-01 09:24:16.831 -0700 MST, GC safe point is 2019-01-01 09:33:31.381 -0700 MST

Describe the feature you'd like:

I think a more reasonable default (while still on the conservative end) would be 4 hours.

Describe alternatives you've considered:

24 hours would be even better, as it is the example used here: https://pingcap.com/docs/op-guide/gc/

Teachability, Documentation, Adoption, Migration Strategy:

A simple change to the default. Very straightforward.

@morgo morgo added the type/enhancement The issue or PR belongs to an enhancement. label Dec 31, 2018
@morgo
Contributor Author

morgo commented Dec 31, 2018

PTAL @kennytm

@shenli
Member

shenli commented Jan 1, 2019

If there is too much garbage in TiKV, read performance will slow down. 24 hours would be too long for OLTP scenarios. @zhangjinpeng1987 what do you think about this?

@gregwebs
Contributor

gregwebs commented Jan 1, 2019

Is there a way to GC only up to the oldest active snapshot?

You would still need an additional setting to time out a backup that is accidentally taking too long. So you could have a tikv_gc_life_time of 10 minutes and a tikv_gc_delay_for_transaction of 4 hours. Normal GC would keep 10 minutes of history, but it would extend to 4 hours for a backup.

With these solutions there is a potential problem if the backup process gets killed before completion: GC may kick in before you restart the backup. So you may need a more explicit way to lock a snapshot in time.

It may also be possible to run GC on versions not in use by the open tidb_snapshot. That is, if you have a key with 5 stored revisions, and tidb_snapshot uses revision 5, you could still GC revisions 2-4.

@morgo
Contributor Author

morgo commented Jan 1, 2019

I've updated the description to include the mydumper error message.

@zhangjinpeng87
Contributor

zhangjinpeng87 commented Jan 11, 2019

@morgo You can enlarge gc_life_time before running a backup and change it back after the backup has finished. If we enlarge the default gc_life_time, there is a risk of accumulating too many old versions, which may slow down some queries.

@morgo
Contributor Author

morgo commented Jan 23, 2019

This will be fixed via #9161

I am going to close this issue, since if backup locks are implemented, the default tikv_gc_life_time is no longer too short.


4 participants