-
Notifications
You must be signed in to change notification settings - Fork 5.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
store/tikv: Continue GC when error happens during UnsafeDestroyRange #10743
store/tikv: Continue GC when error happens during UnsafeDestroyRange #10743
Conversation
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
Codecov Report
@@ Coverage Diff @@
## master #10743 +/- ##
================================================
- Coverage 79.5906% 79.5682% -0.0225%
================================================
Files 415 415
Lines 88131 88142 +11
================================================
- Hits 70144 70133 -11
- Misses 12801 12824 +23
+ Partials 5186 5185 -1 |
Codecov Report
@@ Coverage Diff @@
## master #10743 +/- ##
================================================
+ Coverage 81.0069% 81.0748% +0.0678%
================================================
Files 419 419
Lines 88859 88776 -83
================================================
- Hits 71982 71975 -7
+ Misses 11648 11585 -63
+ Partials 5229 5216 -13 |
There is no unit test in this PR. |
metrics/gc_worker.go
Outdated
@@ -67,4 +67,12 @@ var ( | |||
Name: "gc_region_too_many_locks", | |||
Help: "Counter of gc scan lock request more than once in the same region.", | |||
}) | |||
|
|||
GCUnsafeDestroyRangeFailuresCounter = prometheus.NewCounter( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to register it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
/run-all-tests |
It seems some tests failed. |
/run-unit-test |
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
f9c50d4
to
2f89696
Compare
/run-all-tests |
/run-all-tests |
/run-all-tests |
/run-mybatis-test |
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
…MyonKeminta/tidb into misono/fix-unsafe-destroy-range
/run-all-tests |
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
/run-unit-test |
store/tikv/gcworker/gc_worker.go
Outdated
wg.Wait() | ||
|
||
return errors.Trace(err) | ||
if err != nil { | ||
return errors.New("[gc worker] destroy range failed on some stores") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer to user errors.Annotatef
to keep the original error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These errors were already logged so I thought it's ok to drop them... And there may be multiple regions returned from above, so I think collect them into an []string
and use errors.Errorf
. It's that ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error log will be scattered throughout the file in the current implementation. I think we should collect all error messages in a GC job and log them after the job finishes.
@disksing PTAL again thanks! |
Oh, sorry, please wait a minute |
store/tikv/gcworker/gc_worker.go
Outdated
} | ||
|
||
se := createSession(w.store) | ||
err = util.CompleteDeleteRange(se, r) | ||
se.Close() | ||
if err != nil { | ||
return errors.Trace(err) | ||
logutil.Logger(ctx).Info("[gc worker] failed to mark delete range task done", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to change the GCUnsafeDestroyRangeFailuresCounter
to the type CounterVec
and add a label (e.g: "send" and "save") to indicate the stage of failure occurred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
…MyonKeminta/tidb into misono/fix-unsafe-destroy-range
@lonng PTAL again. Is that ok now? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…fix-unsafe-destroy-range
/run-all-tests |
@disksing PTAL thanks! |
LGTM |
@disksing Could you please give me a approve? |
Signed-off-by: MyonKeminta MyonKeminta@users.noreply.github.com
What problem does this PR solve?
This PR makes GC continues working when sending some UnsafeDestroyRange requests returns error.
Also, a metrics is added to monitor failures of UnsafeDestroyRanges
What is changed and how it works?
(see above)
Check List
Tests
Related changes
tidb-ansible
repository