Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

store/tikv: Continue GC when error happens during UnsafeDestroyRange #10743

Merged

Conversation

MyonKeminta
Copy link
Contributor

Signed-off-by: MyonKeminta MyonKeminta@users.noreply.github.com

What problem does this PR solve?

This PR makes GC continues working when sending some UnsafeDestroyRange requests returns error.

Also, a metrics is added to monitor failures of UnsafeDestroyRanges

What is changed and how it works?

(see above)

Check List

Tests

  • Unit test

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the tidb-ansible repository

Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
@codecov
Copy link

codecov bot commented Jun 9, 2019

Codecov Report

Merging #10743 into master will decrease coverage by 0.0224%.
The diff coverage is 0%.

@@               Coverage Diff                @@
##             master     #10743        +/-   ##
================================================
- Coverage   79.5906%   79.5682%   -0.0225%     
================================================
  Files           415        415                
  Lines         88131      88142        +11     
================================================
- Hits          70144      70133        -11     
- Misses        12801      12824        +23     
+ Partials       5186       5185         -1

@codecov
Copy link

codecov bot commented Jun 9, 2019

Codecov Report

Merging #10743 into master will increase coverage by 0.0678%.
The diff coverage is 69.5652%.

@@               Coverage Diff                @@
##             master     #10743        +/-   ##
================================================
+ Coverage   81.0069%   81.0748%   +0.0678%     
================================================
  Files           419        419                
  Lines         88859      88776        -83     
================================================
- Hits          71982      71975         -7     
+ Misses        11648      11585        -63     
+ Partials       5229       5216        -13

@shenli
Copy link
Member

shenli commented Jun 10, 2019

There is no unit test in this PR.

@@ -67,4 +67,12 @@ var (
Name: "gc_region_too_many_locks",
Help: "Counter of gc scan lock request more than once in the same region.",
})

GCUnsafeDestroyRangeFailuresCounter = prometheus.NewCounter(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to register it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
@MyonKeminta
Copy link
Contributor Author

@shenli @jackysp PTAL again

Copy link
Contributor

@zhangjinpeng87 zhangjinpeng87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

MyonKeminta and others added 2 commits June 12, 2019 18:03
@MyonKeminta
Copy link
Contributor Author

/run-all-tests

@MyonKeminta MyonKeminta changed the title store/tikv: Continue working when error happens during UnsafeDestroyRange store/tikv: Continue GC when error happens during UnsafeDestroyRange Jun 13, 2019
@MyonKeminta
Copy link
Contributor Author

It seems some tests failed.

@MyonKeminta
Copy link
Contributor Author

/run-unit-test

Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
@MyonKeminta MyonKeminta force-pushed the misono/fix-unsafe-destroy-range branch from f9c50d4 to 2f89696 Compare June 13, 2019 11:53
@MyonKeminta
Copy link
Contributor Author

/run-all-tests

@MyonKeminta
Copy link
Contributor Author

/run-all-tests

@MyonKeminta
Copy link
Contributor Author

/run-all-tests
@zhangjinpeng1987 @disksing PTAL

@MyonKeminta
Copy link
Contributor Author

/run-mybatis-test
/run-integration-ddl-test

Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
@MyonKeminta
Copy link
Contributor Author

/run-all-tests

disksing
disksing previously approved these changes Jun 17, 2019
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
@MyonKeminta
Copy link
Contributor Author

/run-unit-test

wg.Wait()

return errors.Trace(err)
if err != nil {
return errors.New("[gc worker] destroy range failed on some stores")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer to user errors.Annotatef to keep the original error message.

Copy link
Contributor Author

@MyonKeminta MyonKeminta Jun 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These errors were already logged so I thought it's ok to drop them... And there may be multiple regions returned from above, so I think collect them into an []string and use errors.Errorf. It's that ok?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error log will be scattered throughout the file in the current implementation. I think we should collect all error messages in a GC job and log them after the job finishes.

@MyonKeminta
Copy link
Contributor Author

@disksing PTAL again thanks!

@MyonKeminta
Copy link
Contributor Author

Oh, sorry, please wait a minute

}

se := createSession(w.store)
err = util.CompleteDeleteRange(se, r)
se.Close()
if err != nil {
return errors.Trace(err)
logutil.Logger(ctx).Info("[gc worker] failed to mark delete range task done",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to change the GCUnsafeDestroyRangeFailuresCounter to the type CounterVec and add a label (e.g: "send" and "save") to indicate the stage of failure occurred.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea

Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
@MyonKeminta
Copy link
Contributor Author

MyonKeminta commented Jun 18, 2019

@lonng PTAL again. Is that ok now?

Copy link
Contributor

@lonng lonng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MyonKeminta
Copy link
Contributor Author

/run-all-tests

@MyonKeminta
Copy link
Contributor Author

@disksing PTAL thanks!

@disksing
Copy link
Contributor

LGTM

@MyonKeminta MyonKeminta added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Jun 20, 2019
@MyonKeminta
Copy link
Contributor Author

@disksing Could you please give me a approve?

@MyonKeminta MyonKeminta merged commit 697ba85 into pingcap:master Jun 21, 2019
@MyonKeminta MyonKeminta deleted the misono/fix-unsafe-destroy-range branch June 21, 2019 06:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/GC status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants