util/memory: warn potential deadlock for Consume in remove (#16987) #18394

Merged: 6 commits merged into pingcap:release-3.1 on Jul 20, 2020

Conversation

ti-srebot
Contributor

cherry-pick #16987 to release-3.1


What problem does this PR solve?

close #16944

Problem Summary:

What is changed and how it works?

What's Changed:
Add a new function consumeNegative(), which does essentially the same thing as Consume()
but accepts only negative values, so that no exceed action can be triggered.
The exceed check is therefore removed from it.
Replace the original Consume() call in remove() with consumeNegative(). If an exceed action is triggered in Consume() while it is called by remove(), a deadlock occurs (one double-lock case and two conflicting lock-order cases).

How it Works:
An exceed action triggered in Consume() while it is called by remove() causes a deadlock.
Besides the double lock described in #16944,
I also found two conflicting lock-order cases.
Both involve Tracker.mu.Lock().
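
The double lock can be illustrated with a small sketch. It assumes the double lock is a second Lock() on Tracker.mu from the same goroutine (for example remove() reaching toString() through the exceed action). `relockWouldBlock` and the local mutex are hypothetical names, and TryLock (Go 1.18+) is used so the sketch reports the problem instead of hanging:

```go
package main

import (
	"fmt"
	"sync"
)

// relockWouldBlock reports whether a second Lock() on a mutex the caller
// already holds would block. sync.Mutex is not reentrant, so it would.
func relockWouldBlock() bool {
	var mu sync.Mutex // hypothetical stand-in for Tracker.mu
	mu.Lock()         // outer call, e.g. remove()
	defer mu.Unlock()

	// Inner call on the same goroutine, e.g. toString().
	if mu.TryLock() {
		mu.Unlock()
		return false
	}
	return true
}

func main() {
	fmt.Println(relockWouldBlock())
}
```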

Case 1:

One mutex is LogOnExceed.mutex.Lock():

type LogOnExceed struct {
    mutex   sync.Mutex // For synchronization.
    acted   bool
    ConnID  uint64
    logHook func(uint64)
}

The other mutex is Tracker.mu.Lock():

type Tracker struct {
    mu struct {
        sync.Mutex
        // The children memory trackers. If the Tracker is the Global Tracker, like executor.GlobalDiskUsageTracker,
        // we wouldn't maintain its children in order to avoid mutex contention.
        children []*Tracker
    }
    // ... (remaining fields omitted in this excerpt)
}

  1. LogOnExceed.mutex.Lock() -> Tracker.mu.Lock()
    Action() takes LogOnExceed.mutex.Lock() and then calls t.String():
    func (a *LogOnExceed) Action(t *Tracker) {
        a.mutex.Lock()
        defer a.mutex.Unlock()
        if !a.acted {
            a.acted = true
            if a.logHook == nil {
                logutil.BgLogger().Warn("memory exceeds quota",
                    zap.Error(errMemExceedThreshold.GenWithStackByArgs(t.label, t.BytesConsumed(), t.bytesLimit, t.String())))

    Then t.String() calls t.toString(), which takes Tracker.mu.Lock():

    tidb/util/memory/tracker.go

    Lines 258 to 265 in 2daee41

    func (t *Tracker) toString(indent string, buffer *bytes.Buffer) {
        fmt.Fprintf(buffer, "%s\"%s\"{\n", indent, t.label)
        if t.bytesLimit > 0 {
            fmt.Fprintf(buffer, "%s \"quota\": %s\n", indent, t.BytesToString(t.bytesLimit))
        }
        fmt.Fprintf(buffer, "%s \"consumed\": %s\n", indent, t.BytesToString(t.BytesConsumed()))
        t.mu.Lock()
  2. Tracker.mu.Lock() -> LogOnExceed.mutex.Lock()
    remove() takes Tracker.mu.Lock() and then calls t.Consume():

    tidb/util/memory/tracker.go

    Lines 155 to 163 in 2daee41

    func (t *Tracker) remove(oldChild *Tracker) {
        t.mu.Lock()
        defer t.mu.Unlock()
        for i, child := range t.mu.children {
            if child != oldChild {
                continue
            }
            t.Consume(-oldChild.BytesConsumed())

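    For reference, String() (used in step 1) calls toString():

    tidb/util/memory/tracker.go

    Lines 252 to 254 in 2daee41

    func (t *Tracker) String() string {
        buffer := bytes.NewBufferString("\n")
        t.toString("", buffer)

    Consume() selects rootExceed (partial excerpt):

        var rootExceed *Tracker
        for tracker := t
            rootExceed = tracker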

There is a path that leads to rootExceed == t.
If that is the case, Action(rootExceed) is equal to Action(t).

rootExceed.actionMu.actionOnExceed.Action(rootExceed)

Then Action() calls LogOnExceed.mutex.Lock().

func (a *LogOnExceed) Action(t *Tracker) {
    a.mutex.Lock()
    defer a.mutex.Unlock()
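
The conflicting lock order in Case 1 can be sketched as a fixed interleaving of the two paths. This is an illustration under stated assumptions, not code from the repository: `wouldDeadlock`, `logMu`, and `trackerMu` are hypothetical names, and TryLock (Go 1.18+) stands in for the blocking Lock() calls so the program terminates:

```go
package main

import (
	"fmt"
	"sync"
)

// wouldDeadlock simulates the moment where each path holds its first lock
// and is about to request the lock held by the other path.
func wouldDeadlock() bool {
	var logMu sync.Mutex     // stand-in for LogOnExceed.mutex
	var trackerMu sync.Mutex // stand-in for Tracker.mu

	// Path 1 (Action): holds LogOnExceed.mutex, about to call t.String().
	logMu.Lock()
	// Path 2 (remove): holds Tracker.mu, about to call Consume() -> Action().
	trackerMu.Lock()

	// Each path now needs the lock the other one holds.
	path1Blocked := !trackerMu.TryLock() // toString() wants Tracker.mu
	path2Blocked := !logMu.TryLock()     // Action() wants LogOnExceed.mutex

	trackerMu.Unlock()
	logMu.Unlock()
	return path1Blocked && path2Blocked
}

func main() {
	fmt.Println(wouldDeadlock())
}
```

With real Lock() calls on two goroutines, both paths would wait forever once they reach this state.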

Case 2:

Similar to Case 1.

  1. Tracker.actionMu.Lock() -> Tracker.mu.Lock()
    Consume() calls rootExceed.actionMu.Lock() and Action().
    Action() calls String(). Then toString(). Then t.mu.Lock().
  2. Tracker.mu.Lock() -> Tracker.actionMu.Lock()
    remove() calls t.mu.Lock() and t.Consume().
    Then t.Consume() calls rootExceed.actionMu.Lock().

#16944 suggests adding a new version of Consume() that skips the exceed check.
I agree, because it makes explicit to developers that the input to Consume() in remove() must not be able to trigger an exceed action. Otherwise, a deadlock may happen.
This helps prevent both the double lock reported in #16944 and the conflicting lock orders described in this PR.
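
It may not be obvious that a negative input can trigger the exceed action at all. The sketch below assumes a simplified Consume() that checks the quota after every update. The `tracker` type and the `actionCalls` counter are hypothetical and exist only to make the effect observable:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// tracker is a simplified, hypothetical stand-in for memory.Tracker.
type tracker struct {
	parent        *tracker
	bytesLimit    int64
	bytesConsumed int64
	actionCalls   int64 // counts exceed actions, for illustration only
}

// Consume updates the usage up the tree and fires the exceed action on the
// topmost tracker that is at or over its quota after the update.
func (t *tracker) Consume(bytes int64) {
	var rootExceed *tracker
	for cur := t; cur != nil; cur = cur.parent {
		if atomic.AddInt64(&cur.bytesConsumed, bytes) >= cur.bytesLimit && cur.bytesLimit > 0 {
			rootExceed = cur
		}
	}
	if rootExceed != nil {
		// The real code would take the action lock and call Action() here.
		atomic.AddInt64(&rootExceed.actionCalls, 1)
	}
}

func main() {
	tr := &tracker{bytesLimit: 10, bytesConsumed: 100}
	tr.Consume(-40) // usage drops to 60, which is still over the quota of 10
	fmt.Println(tr.bytesConsumed, tr.actionCalls)
}
```

Releasing memory from a tracker that is still over quota fires the action, which is why a variant without the check is safer to call while holding Tracker.mu.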

Related changes

  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test
  • Integration test

Side effects

No

Release note

  • util/memory: warn potential deadlock for Consume in remove

Signed-off-by: ti-srebot <ti-srebot@pingcap.com>
@ti-srebot
Contributor Author

/run-all-tests

@ti-srebot
Contributor Author

@SunRunAway, @XuHuaiyu, @Yisaer, @wshwsh12, PTAL.

@wshwsh12 wshwsh12 left a comment

LGTM

@ti-srebot ti-srebot added the status/LGT1 Indicates that a PR has LGTM 1. label Jul 15, 2020
@ti-srebot
Contributor Author

@wshwsh12, @SunRunAway, @XuHuaiyu, @Yisaer, PTAL.

@Yisaer Yisaer left a comment

LGTM

@ti-srebot
Contributor Author

@Yisaer, thanks for your review. However, LGTM is restricted to Reviewers or higher roles. See the corresponding SIG page for more information. Related SIGs: execution (slack).

@SunRunAway SunRunAway left a comment

/merge

@ti-srebot ti-srebot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Jul 20, 2020
@SunRunAway
Contributor

/merge

@ti-srebot ti-srebot added the status/can-merge Indicates a PR has been approved by a committer. label Jul 20, 2020
@ti-srebot
Contributor Author

Your auto merge job has been accepted, waiting for:

  • 17584
  • 16729
  • 15828

@SunRunAway
Contributor

/merge

@ti-srebot
Contributor Author

Your auto merge job has been accepted, waiting for:

  • 15828

@ti-srebot
Contributor Author

/run-all-tests

@ti-srebot
Contributor Author

@ti-srebot merge failed.

@ti-srebot
Contributor Author

/run-all-tests

@ti-srebot
Contributor Author

@ti-srebot merge failed.

@SunRunAway
Contributor

/merge

@ti-srebot
Contributor Author

Your auto merge job has been accepted, waiting for:

  • 17584
  • 16729
  • 15828

@ti-srebot
Contributor Author

/run-all-tests

@ti-srebot ti-srebot merged commit 2c2e308 into pingcap:release-3.1 Jul 20, 2020
@SunRunAway SunRunAway deleted the release-3.1-a9177fe846bf branch July 20, 2020 09:10
@jackysp jackysp removed the contribution This PR is from a community contributor. label Jul 27, 2020
Labels

  • sig/execution
  • status/can-merge
  • status/LGT2
  • type/3.1-cherry-pick